Intro to NumPy

#100daysofpython, #100daysofdatascience, #100daysofdataanalysis

Effective data-driven science and computation require an understanding of how data is stored and manipulated. This section outlines how NumPy improves the way Python handles arrays of data.

Before we dive deeper into NumPy, let's look at the difference between Python lists and NumPy arrays. The standard Python implementation is written in C. This means that every Python object is simply a cleverly disguised C structure, which contains not only its value but other information as well. For example, when we define an integer in Python, such as y = 1, y is not just a "raw" integer. It is actually a pointer to a compound C structure, which contains several values.

Because of Python's dynamic typing, we can even create heterogeneous lists, which makes Python more flexible. But this flexibility comes at a cost: to allow these flexible types, each item in the list must carry its own type info, reference count, and other information. That is, each item is a complete Python object. It can be much more efficient to store data in a fixed-type array, which is what a NumPy array is.

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object like the Python integer we saw earlier. The advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.
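To make the storage difference concrete, here is a small sketch (not from the handbook) that compares a list of 1,000 integers with the equivalent NumPy array. The variable names are just for illustration, and the list size is a rough estimate that assumes the standard CPython implementation on a 64-bit machine.

```python
import sys
import numpy as np

# A Python list may mix types; each element is a full Python object.
mixed = [True, "2", 3.0, 4]
element_types = [type(item).__name__ for item in mixed]
print(element_types)  # ['bool', 'str', 'float', 'int']

# A NumPy array stores one fixed-size type in a single contiguous buffer.
values = list(range(1000))
arr = np.array(values, dtype=np.int64)
print(arr.dtype, arr.itemsize, arr.nbytes)  # int64 8 8000

# Rough size of the list: the block of pointers plus every int object
# those pointers refer to (an estimate, CPython-specific).
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes > arr.nbytes)  # True
```

The exact byte counts for the list vary between Python versions, but the array's data buffer is always itemsize times the number of elements.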

Now that we have seen how Python lists and NumPy arrays differ and how each is laid out in memory, let's continue our intro to NumPy.

Creating Arrays from Python Lists

We can create arrays from Python lists using np.array:

    In[1]: import numpy as np
    In[2]: # integer array:
           np.array([1, 4, 2, 5, 3])
    Out[2]: array([1, 4, 2, 5, 3])

Remember that, unlike Python lists, NumPy arrays are constrained to hold elements of a single type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

    In[9]: np.array([3.14, 4, 2, 3])
    Out[9]: array([ 3.14, 4. , 2. , 3. ])
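A quick way to see the upcast is to inspect the dtype attribute of each array. The sketch below is only an illustration; the default integer type depends on the platform, so it is not shown as a fixed value.

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])
mixed = np.array([3.14, 4, 2, 3])

# The all-integer list gets the platform's default integer type
print(ints.dtype)

# One float in the list is enough to upcast every element
print(mixed.dtype)  # float64

# np.result_type reports the type NumPy picks when combining two dtypes
combined = np.result_type(np.int64, np.float64)
print(combined)  # float64
```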

If we want to explicitly set the data type of the resulting array, we can use the 'dtype' keyword:

    In[10]: np.array([1, 2, 3, 4], dtype='float32')
    Out[10]: array([ 1.,  2.,  3.,  4.], dtype=float32)
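Choosing a dtype also fixes how many bytes each element takes, and converting to a narrower type can lose information. The following sketch goes slightly beyond the example above by using the astype method to convert an existing array; the sample values are made up for illustration.

```python
import numpy as np

as_float32 = np.array([1, 2, 3, 4], dtype="float32")
print(as_float32.dtype, as_float32.itemsize)  # float32 4

# Converting floats to integers drops the fractional part
# (truncation toward zero, not rounding)
floats = np.array([1.9, -1.9, 2.5])
as_int = floats.astype(int)
print(as_int)  # [ 1 -1  2]
```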

Finally, unlike Python lists, NumPy arrays can be explicitly multidimensional. Here is one way of initializing a multidimensional array using a list of lists:

    In[11]: # nested lists result in multidimensional arrays 
            np.array([range(i, i + 3) for i in [2, 4, 6]])
    Out[11]: array([[2, 3, 4],
                    [4, 5, 6],
                    [6, 7, 8]])

The inner lists are treated as rows of the resulting two-dimensional array.
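As a small follow-up sketch, the ndim, shape, and size attributes confirm the layout, and indexing shows that each inner sequence became a row. The name grid is just an illustrative choice.

```python
import numpy as np

grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(grid.ndim)   # 2
print(grid.shape)  # (3, 3)
print(grid.size)   # 9

# The first inner sequence is the first row
print(grid[0])     # [2 3 4]

# The first column collects the first item of every row
print(grid[:, 0])  # [2 4 6]
```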

Creating Arrays from Scratch

Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy. Here are several examples:

    In[12]: # Create a length-10 integer array filled with zeros
            np.zeros(10, dtype=int)
    Out[12]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
    In[13]: # Create a 3x5 floating-point array filled with 1s
            np.ones((3, 5), dtype=float)
    Out[13]: array([[ 1.,  1.,  1.,  1.,  1.],
                    [ 1.,  1.,  1.,  1.,  1.],
                    [ 1.,  1.,  1.,  1.,  1.]])
    In[14]: # Create a 3x5 array filled with 3.14
            np.full((3, 5), 3.14)
    Out[14]: array([[ 3.14,  3.14,  3.14,  3.14,  3.14],
                    [ 3.14,  3.14,  3.14,  3.14,  3.14],
                    [ 3.14,  3.14,  3.14,  3.14,  3.14]])
    In[15]: # Create an array filled with a linear sequence
            # Starting at 0, ending at 20, stepping by 2
            # (this is similar to the built-in range() function)
            np.arange(0, 20, 2)
    Out[15]: array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
    In[16]: # Create an array of five values evenly spaced between 0 and 1
            np.linspace(0, 1, 5)
    Out[16]: array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
    In[17]: # Create a 3x3 array of uniformly distributed
            # random values between 0 and 1
            np.random.random((3, 3))
    Out[17]: array([[ 0.99844933,  0.52183819,  0.22421193],
                    [ 0.08007488,  0.45429293,  0.20941444],
                    [ 0.14360941,  0.96910973,  0.946117  ]])
    In[18]: # Create a 3x3 array of normally distributed random values
            # with mean 0 and standard deviation 1
            np.random.normal(0, 1, (3, 3))
    Out[18]: array([[ 1.51772646,  0.39614948, -0.10634696],
                    [ 0.25671348,  0.00732722,  0.37783601],
                    [ 0.68446945,  0.15926039, -0.70744073]])
    In[19]: # Create a 3x3 array of random integers in the interval [0, 10)
            np.random.randint(0, 10, (3, 3))
    Out[19]: array([[2, 3, 4],
                    [5, 7, 8],
                    [0, 5, 0]])
    In[20]: # Create a 3x3 identity matrix
            np.eye(3)
    Out[20]: array([[ 1.,  0.,  0.],
                    [ 0.,  1.,  0.],
                    [ 0.,  0.,  1.]])
    In[21]: # Create an uninitialized array of three values (float64 by default)
            # The values will be whatever happens to already exist at that
            # memory location
            np.empty(3)
    Out[21]: array([ 1.,  1.,  1.])
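A few details of these routines are easy to miss, so here is a short sketch that checks them. It uses np.random.default_rng, the generator interface NumPy recommends for new code, rather than the np.random functions shown above; the seed 42 is an arbitrary choice for illustration.

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default
last_arange = np.arange(0, 20, 2)[-1]
last_linspace = np.linspace(0, 1, 5)[-1]
print(last_arange)    # 18
print(last_linspace)  # 1.0

# Two generators created with the same seed produce the same draws,
# which makes random examples repeatable
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
sample_a = rng_a.integers(0, 10, size=(3, 3))
sample_b = rng_b.integers(0, 10, size=(3, 3))
print(np.array_equal(sample_a, sample_b))  # True

# np.empty only reserves memory; its contents are arbitrary,
# so prefer np.zeros when the starting values matter
buffer = np.empty(3)
print(buffer.shape, buffer.dtype)  # (3,) float64
```

Because the random outputs in this article were produced without a seed, running the same cells will give different numbers each time.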

We have now covered the basics of NumPy: creating and initializing arrays. We also looked at the difference between Python lists and NumPy arrays.

In the next article, we will continue our discussion of NumPy, learn more about its data types, and apply that knowledge in some operations.

NB: I started learning data analysis from https://jovian.com/learn/data-analysis-with-python-zero-to-pandas and I wrote this article using Python Data Science Handbook by Jake VanderPlas.

Are you interested in learning Data Science? Or would you be interested in follow-up articles that discuss this topic and related ones in more depth? Let me know in the comments!

Let's connect for follow-up and upcoming articles:

Twitter: https://twitter.com/MohammadAdde

Github: github.com/mohaomar495

Jovian Account: https://jovian.com/mohaomar495