AnalyticsDojo

Introduction to Python - Numpy

rpi.analyticsdojo.com

Overview of Numpy

  • Numpy is a package that provides additional functionality often useful working with arrays for data science.
  • Typically Numpy is imported as np.
  • np.array() will cast a list (or other collection) as a numpy array.
  • You can slice an array in the same way yo can slice a list.

```import numpy as np a = np.array([0, 1, 2, 3, 4, 5, 6]) print(‘A is of type:’, type(a)) print(‘Print the entire array:’, a) print(‘Print the first value:’, a[0]) print(‘Print the first three value:’, a[0:3]) print(‘Print from second value till end of list:’, a[2:]) print(‘Print the last value of a numpy array:’, a[-1]) print(‘Print up till the 2nd to last value:’, a[:-2])

</div>

</div>



## Arrays and Functions
- A really powerful aspect of arrays is the capaiblity to do calculations over arrays.
- Numpy has a number of functions possible listed [here](http://docs.scipy.org/doc/numpy/reference/routines.math.html).
- Often it is possible to do calculations directly or via np functions, as shown below. 



<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
b1=10*a
b2=np.multiply(10,a)
c1=a+b1
c2=np.add(a,b1) #This is an alternate way of adding 
d=np.log(a)
e=np.sqrt(a)
f=a**2  #This squares the value. 

np.square([-1j, 1])
print('Print the entire array a:', a)
print('Print the entire array b1:', b1)
print('Print the entire array b2:', b2)
print('Print the entire array b3:', c1)
print('Print the entire array c2:', c2)
print('Print the entire array d:', d)
print('Print the entire array e:', e)
print('Print the entire array f:', f)

Creating and Manipulating Numpy Arrays

  • The arrange function will generate an array.
  • Reshape changes the structure of the array to n rows and m columns. a=a.reshape(n, m) -ones will create an array with all ones and zeros with all zeros.
  • Reshaping can get it in the appropriate structure, but make sure that the size fits the appropriate dimensions.

```import numpy as np a = np.arange(15) print(a) a2 = np.arange( 0, 15, 1 ) #Alternate specification with np.arrange(start, end, step) print(a2) a=a.reshape(3, 5) print(a) b= np.ones(shape=(3, 5), dtype=float) print(b) c= np.zeros(shape=(3, 5), dtype=int) print(c) d= np.full((3, 5), 4, dtype=int) print(d) e= np.arange( 0, 1.5, .1 ).reshape(3,5) #String together creations and reshaping. Also can use decimals. print(e)

</div>

</div>



<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```e= np.arange( 0, 1.5, .1 ).reshape(3,5) 

Generating Random Numpy Data

  • This is often useful, and we will be using it to demonstrate some initial techniques.
  • Often you want random but repeatable results, so that for example a test could have a consistent average on a random array. For this we need to set a seed. You only have to do this once.
np.random.seed([2335])
a = np.random.uniform(50, 150, 10)  #Between 50-150, generate 10 variables from uniform
b = np.random.standard_normal(10)   #With mean 0 and standard deviation 1 
print(a)
print(b)



Combining Numpy Arrays

  • concatenate will string a list of numpy arrays together np.concatenate([a,b])
  • vstack will stack numpy arrays
  • Defaults: start =0, end =last and step is 1.
  • To print the entire array, leave start/stop/step blanka[::]

```a = np.arange(5) b=np.concatenate([a,a]) c=np.vstack([a,a]) d=np.hstack([c,c]) print(‘a:’,a,’\nb:’,b,’\nc:’,c,’\nd:’,d)

</div>

</div>



## Slicing Single Dimension Numpy Arrays
- Slicing arrays includes  three numbers `a[start:stop:step]` but not all are required.
- Defaults: start =0, end =last and step is 1.
- To print the entire array, leave start/stop/step blank`a[::]`




<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```e= np.arange( 0, 15, 1 ) 
print(e)
#[start:end:step]


print("This is the start, end, and step:",e[2:9:3]) 
print("Print every other:",e[::2]) 
print("Print starting at 2 and ending at 9, default step 1:",e[2:9]) 
print("Print all:",e[::])
print("Print all:",e[:]) 
print("Print all:",e) 

Numpy Arrays From External Datasets

  • We can take a list from an external dataset and change it to an numpy array.

```#First let’s download some data. !wget https://raw.githubusercontent.com/rpi-techfundamentals/spring2019-materials/master/input/iris.csv

</div>

</div>



<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```import csv
csv_file_object = csv.reader(open('iris.csv', newline=''), delimiter=',')

data=[]
header = next(csv_file_object) #
for row in csv_file_object:  
    data.append(row)  # add each row to the 
data = np.array(data)
print(data)

Slicing 2 Dimensional Numpy Arrays

  • We can slice arrays with array[row, column] were row and column each include the (start:stop:step) like in arrays
  • We can sepecify the type with the .astype(np.float_)
  • For a full list of Numpy types, see documentation
  • If we create a one dimensional array from 2 dimensional numpy array, it will also be a numpy array of same type.

```#We can slice the array several different ways and generate new variables.

irisdata=data[0::,0:4:].astype(np.float_) #This will select only the first 4 columns and change the type to float irisdata=data[:,0:4].astype(np.float_) iristype=data[0::,4:5:] # This will select only the type. print(irisdata,’\n’,iristype)

</div>

</div>



<div markdown="1" class="cell code_cell">
<div class="input_area" markdown="1">
```#This can be used to select column 1 and assign to new variable. 
#This will sum up column 1
newvariable=irisdata[::,0:1:]

#This will sum up column 0
final=irisdata[::,0:1:].sum()

type(newvariable)
#print(newvariable)
print(final)

```#This will take the mean of column 1 print(‘mean:’, irisdata[::,0:1:].mean())

```

CREDITS

Copyright AnalyticsDojo 2016 This work is licensed under the Creative Commons Attribution 4.0 International license agreement.