Notes from my data science course presented by the University of San Diego

In [36]:
%matplotlib notebook

Data Science

Data Science is done in five steps:

  1. Acquire
  2. Prepare
  3. Analyse
  4. Report
  5. Act

Jupyter Notebooks

Jupyter notebooks can be shared and executed on the receiver's device, but they can also be exported to other formats, for example:

  • HTML
  • PDF
  • Python files
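The exports listed above are normally done with Jupyter's nbconvert tool (`jupyter nbconvert --to FORMAT notebook.ipynb`). Below is a minimal sketch that drives it from Python. The notebook name is an assumption (taken from the directory listing in the Unix section), and PDF export additionally needs a LaTeX installation:

```python
import os
import shutil
import subprocess

# Assumed file name; point this at the notebook you want to export.
NOTEBOOK = "Notes Data Science.ipynb"

def export_commands(notebook):
    """Build one `jupyter nbconvert` command per export format."""
    # "script" writes a plain .py file for a Python notebook;
    # "pdf" also requires LaTeX to be installed.
    return [["jupyter", "nbconvert", "--to", fmt, notebook]
            for fmt in ("html", "pdf", "script")]

# Only run the export when jupyter is installed and the notebook exists.
if shutil.which("jupyter") and os.path.exists(NOTEBOOK):
    for cmd in export_commands(NOTEBOOK):
        subprocess.run(cmd, check=True)
```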

Markdown

With Markdown cells you can display formatted text instead of code, and this text can have different fonts and colors.

This formatting can also be done with HTML, XML and LaTeX, for example this LaTeX equation (Euler's identity):

$\mathsf{e}^{\pi\sqrt{-1}} + 1 = 0$
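As a quick numeric sanity check of the identity (my own addition, not from the course), Python's standard cmath module evaluates e^(pi*sqrt(-1)) + 1, writing sqrt(-1) as Python's imaginary unit 1j. The result is zero up to floating-point rounding:

```python
import cmath

# e^(i*pi) + 1, with sqrt(-1) written as the imaginary unit 1j
value = cmath.exp(1j * cmath.pi) + 1
print(abs(value) < 1e-12)  # True: only floating-point rounding error remains
```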

Unix

Jupyter can also run Unix shell commands, but each command has to start with an exclamation mark (!).

In [2]:
!ls -la
total 3464
drwxr-xr-x   8 timangevare  staff      256 16 mrt 17:33 .
drwx------+ 29 timangevare  staff      928 16 mrt 17:33 ..
-rw-r--r--@  1 timangevare  staff     6148 16 mrt 17:33 .DS_Store
drwxr-xr-x   2 timangevare  staff       64 16 mrt 17:32 .ipynb_checkpoints
-rw-r--r--   1 timangevare  staff  1751610 16 mrt 17:32 Notes Data Science.ipynb
drwx------@  9 timangevare  staff      288 11 mrt 20:20 Week-1-Intro-new
drwx------@  7 timangevare  staff      224 16 mrt 16:41 Week-3-Numpy
-rw-r--r--@  1 timangevare  staff     9031 16 mrt 17:21 transcript.txt
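Outside a notebook, where the ! prefix is not available, the same command can be run from plain Python with the standard subprocess module. This is a minimal sketch. Inside IPython you can also capture the output of a shell command directly with `files = !ls -la`.

```python
import subprocess

# Run `ls -la` and capture its output as text instead of printing it directly.
result = subprocess.run(["ls", "-la"], capture_output=True, text=True, check=True)
print(result.stdout)
```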
In [3]:
filename = "./transcript.txt"
!head -n 3 $filename
- After this brief overview of Jupyter Notebooks,
we will now start with the notebooks further.
Before we show you the great Python library is,
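For comparison, here is a pure-Python sketch of what `head -n 3` does. It is my own illustration and not part of the course. itertools.islice stops reading after n lines, so large files are not loaded completely:

```python
import os
from itertools import islice

def head(lines, n=3):
    """Return the first n lines of an iterable, like `head -n n`."""
    return list(islice(lines, n))

filename = "./transcript.txt"  # the transcript used in the cells above
if os.path.exists(filename):
    with open(filename) as f:
        print("".join(head(f, 3)), end="")
```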
In [4]:
!tail -n 5 $filename
all of that plotted nicely on a graph.
As I mentioned before we will learn more
about matplotlib in week five.
Next we will discuss the numpy library in Python.
That should be fun.
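A similar sketch for `tail -n 5`, again my own illustration. A collections.deque with maxlen keeps only the last n lines in memory while the file is read:

```python
import os
from collections import deque

def tail(lines, n=5):
    """Return the last n lines of an iterable, like `tail -n n`."""
    return list(deque(lines, maxlen=n))

filename = "./transcript.txt"
if os.path.exists(filename):
    with open(filename) as f:
        print("".join(tail(f, 5)), end="")
```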
In [5]:
!grep -i "Unix" $filename
I would like to quickly overview how to use UNIX commands
to use UNIX shell commands,
and more interactive way to use UNIX commands.
like we execute them on a UNIX shell.
called UNIX in the same directory with this notebook,
you have a cold folder called .unix
So here we say filename ./unix/shakespeare.txt
To display this variable we can use the UNIX way
and use the echo command in UNIX
then we don't need the UNIX, like dollar sign.
for that UNIX variable resolution.
As you would remember in our UNIX exercises
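The case-insensitive search done by `grep -i` can be sketched in Python with the re module and the re.IGNORECASE flag. This is an illustration only; grep itself supports many more options:

```python
import os
import re

def grep_i(lines, pattern):
    """Return the lines that match pattern, ignoring case, like `grep -i`."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if regex.search(line)]

filename = "./transcript.txt"
if os.path.exists(filename):
    with open(filename) as f:
        print("".join(grep_i(f, "Unix")), end="")
```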

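Piping the grep output into wc counts what grep printed: 12 matching lines, 93 words and 516 bytes. The first number is the number of lines that mention Unix, which can be lower than the number of mentions if a line contains the word more than once.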

In [48]:
!grep -i "Unix" $filename | wc
      12      93     516
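The three numbers that wc prints (lines, words, bytes) can be reproduced in Python. This is a minimal sketch that also separates matching lines from actual mentions of the word. Counting substrings with str.count is my own choice here, not something from the course:

```python
import os

def wc(text):
    """Return (lines, words, bytes) for a string, like running plain `wc`."""
    lines = text.count("\n")            # wc counts newline characters
    words = len(text.split())           # runs of non-whitespace characters
    nbytes = len(text.encode("utf-8"))  # the third column of wc is bytes
    return lines, words, nbytes

print(wc("a b\nc\n"))  # (2, 3, 6)

filename = "./transcript.txt"
if os.path.exists(filename):
    with open(filename) as f:
        text = f.read()
    # Lines mentioning Unix (what `grep -i | wc` counts) vs. total mentions
    matching = "".join(line for line in text.splitlines(keepends=True)
                       if "unix" in line.lower())
    print(wc(matching)[0], "lines,", text.lower().count("unix"), "mentions")
```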

NumPy

NumPy provides arrays that are much faster than Python lists and offer much more functionality. NumPy arrays can also have more dimensions (also called ranks): a rank-1 array is a vector and a rank-2 array is a matrix.

In [22]:
import numpy as np
array = np.array([[1,2,3],[4,5,6],[7,8,9]])  # a 3x3 (rank-2) array
arrays = np.array(array[:2])  # a copy of the first two rows
print(arrays)
[[1 2 3]
 [4 5 6]]
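NumPy stores the rank of an array in the ndim attribute and the size of each dimension in shape. The sketch below is my own example. It also shows what wrapping a slice in np.array(...) does, as the cell above does: a plain slice is a view that shares memory with the original array, while np.array makes an independent copy.

```python
import numpy as np

vector = np.array([1, 2, 3])               # rank 1
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # rank 2
print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)

view = matrix[:1]            # a slice shares memory with matrix
copy = np.array(matrix[:1])  # np.array(...) copies the data by default
matrix[0, 0] = 100
print(view[0, 0], copy[0, 0])  # 100 1
```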

Creating Arrays

In [13]:
arr1 = np.zeros((6,5))  # a 6x5 array of 0.0; np.ones((6,5)) fills it with 1.0 instead
print(arr1)
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
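Besides zeros and ones, NumPy has several other constructors. This is a short sampler of standard functions. The shapes and values were chosen only for illustration:

```python
import numpy as np

ones = np.ones((2, 3))              # every element is 1.0 (float64 by default)
sevens = np.full((2, 3), 7)         # every element is 7
identity = np.eye(3)                # 3x3 identity matrix
counting = np.arange(0, 10, 2)      # like range(0, 10, 2): [0 2 4 6 8]
ints = np.zeros((2, 2), dtype=int)  # zeros, but with an integer dtype
print(counting)  # [0 2 4 6 8]
```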
In [16]:
arr2 = np.random.random((6,5))
print(arr2)
[[0.22681179 0.80256064 0.65232021 0.43919747 0.62605754]
 [0.07238571 0.59569228 0.42069122 0.48777407 0.27276291]
 [0.32724961 0.29193102 0.68179495 0.31590184 0.74428156]
 [0.87179496 0.16747188 0.58682239 0.05352499 0.52805762]
 [0.14175259 0.86513941 0.0270391  0.99381094 0.55864138]
 [0.10105638 0.41065515 0.23668832 0.70418584 0.78887039]]
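Because np.random.random produces different numbers on every run, the output above cannot be reproduced exactly. One option, swapped in here and not used in the course, is NumPy's newer Generator API with a fixed seed. The seed value 42 is arbitrary:

```python
import numpy as np

# A seeded Generator gives the same "random" numbers on every run.
rng = np.random.default_rng(seed=42)
values = rng.random((6, 5))  # floats in [0.0, 1.0), like np.random.random
same = np.random.default_rng(seed=42).random((6, 5))
print(np.array_equal(values, same))  # True

rolls = rng.integers(1, 7, size=10)  # ten dice rolls; the upper bound 7 is excluded
```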

Accessing Arrays

In [18]:
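new_row = arr2[2,:]  # row with index 2 (the third row), all columns
new_col = arr2[:,3]  # column with index 3 (the fourth column), all rows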

print(new_row)
print(new_col)
[0.32724961 0.29193102 0.68179495 0.31590184 0.74428156]
[0.43919747 0.48777407 0.31590184 0.05352499 0.99381094 0.70418584]
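Slices can also select a block of rows and columns at once, and negative indices count from the end. This sketch uses an array with predictable values in place of arr2 so the results are easy to check:

```python
import numpy as np

grid = np.arange(30).reshape(6, 5)  # stand-in for arr2: row r holds 5r .. 5r+4
block = grid[1:3, 2:4]              # rows 1-2, columns 2-3 -> shape (2, 2)
last_row = grid[-1, :]              # -1 selects the last row
element = grid[2, 3]                # a single value: row 2, column 3
print(block)
# [[ 7  8]
#  [12 13]]
```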

Filtering

In [27]:
randlist = np.random.random(20)
mask = randlist > 0.5  # boolean array: True where the value exceeds 0.5
print(mask)
[False  True False False  True False  True False False False False False
  True False False False False  True False  True]
In [28]:
print(randlist[randlist > 0.5])
[0.79634184 0.53676016 0.85493646 0.52113814 0.73513268 0.66523979]
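Boolean masks can also be counted and combined. In this sketch with made-up values, & and | combine conditions element by element. Each comparison needs its own parentheses, because Python's `and` raises a ValueError on arrays with more than one element. np.where acts as an elementwise if/else.

```python
import numpy as np

values = np.array([0.1, 0.6, 0.3, 0.9, 0.55])
mask = values > 0.5
print(mask.sum())  # 3, because True counts as 1

# Both conditions must hold: keeps 0.6, 0.3 and 0.55
in_range = values[(values > 0.2) & (values < 0.7)]

# np.where(condition, a, b) picks from a where True and from b where False
labels = np.where(values > 0.5, "high", "low")
```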

Statistics

In [29]:
print(randlist.mean())
0.3824599816961265
In [32]:
print(randlist.max())  # randlist.min() returns the smallest value instead
0.8549364551664078
In [34]:
print(np.median(randlist))  # arrays have no .median() method, so call np.median
0.364468951452435
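On arrays with more than one dimension, these functions accept an axis argument: axis=0 aggregates down the columns and axis=1 aggregates across the rows. The small example array is made up for illustration:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(data.mean())        # 3.5, the mean over all elements
print(data.mean(axis=0))  # [2.5 3.5 4.5], one mean per column
print(data.max(axis=1))   # [3 6], one maximum per row
print(np.median(data))    # 3.5
```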

Sorting

In [43]:
list2 = np.array([3,4,2,5,9,99,12,52])
list2.sort()  # sorts in place; np.sort(list2) would return a sorted copy instead
print(list2)
[ 2  3  4  5  9 12 52 99]
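When the original order must be kept, np.sort returns a sorted copy. Reversing that copy gives descending order, and np.argsort returns the indices that would sort the array. This is a sketch on the same numbers as above:

```python
import numpy as np

values = np.array([3, 4, 2, 5, 9, 99, 12, 52])
ascending = np.sort(values)         # sorted copy; values itself is unchanged
descending = np.sort(values)[::-1]  # reverse the sorted copy
order = np.argsort(values)          # indices that would sort values
print(ascending)  # [ 2  3  4  5  9 12 52 99]
print(order)      # [2 0 1 3 4 6 7 5]
```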

Broadcasting

In [11]:
grid = np.arange(16).reshape(4,4)  # 0..15 arranged as a 4x4 array
print(grid.shape)
shifted = grid + 2  # the scalar 2 is broadcast to every element
print(shifted)
(4, 4)
[[ 2  3  4  5]
 [ 6  7  8  9]
 [10 11 12 13]
 [14 15 16 17]]
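Broadcasting works with arrays as well as scalars, as long as their shapes are compatible. Comparing from the last dimension, sizes must be equal or 1. In this sketch, a shape (4,) row is stretched down every row and a shape (4, 1) column is stretched across every column. A shape (3,) array cannot be matched to 4 columns, so NumPy raises a ValueError.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)
row = np.array([10, 20, 30, 40])              # shape (4,): added to every row
col = np.array([[100], [200], [300], [400]])  # shape (4, 1): added to every column

by_row = grid + row
by_col = grid + col
print(by_row[0])     # [10 21 32 43]
print(by_col[:, 0])  # [100 204 308 412]

try:
    grid + np.array([1, 2, 3])  # shapes (4, 4) and (3,) are incompatible
except ValueError as err:
    print("broadcast failed:", err)
```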

Satellite Imaging

In [25]:
import matplotlib.pyplot as plt
# Read the satellite image into a NumPy array of pixels (height x width x color channels)
photo_data = plt.imread('./Week-3-Numpy/wifire/sd-3layers.jpg')
In [26]:
plt.figure(figsize=(15,15))
plt.imshow(photo_data)
Out[26]:
<matplotlib.image.AxesImage at 0x1c1b2da250>
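Because photo_data is an ordinary NumPy array, the indexing, filtering and statistics above all apply to it. The WIFIRE image file is specific to the course, so this sketch uses a small synthetic RGB array as a hypothetical stand-in. It checks the shape, reads one color channel, and uses a mask to black out strongly blue pixels:

```python
import numpy as np

# Hypothetical stand-in for photo_data: a 4x6 RGB image with 8-bit pixels
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, :3, 0] = 255  # left half pure red
image[:, 3:, 2] = 255  # right half pure blue
print(image.shape, image.dtype)  # (4, 6, 3) uint8

red_channel = image[:, :, 0]  # one 2-D layer per color channel
print(red_channel.mean())     # 127.5: half of the pixels are fully red

blue_mask = image[:, :, 2] > 200  # True for strongly blue pixels
image[blue_mask] = 0              # set all three channels of those pixels to 0
```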