Python Collections

Collection Types:

1) List is a collection which is ordered and changeable. Allows duplicate members

2) Tuple is a collection which is ordered and unchangeable. Allows duplicate members

3) Set is a collection which is unordered and unindexed. No duplicate members

4) Dictionary is a collection of key-value pairs which is changeable and indexed by key. It keeps insertion order (since Python 3.7). No duplicate keys.
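A minimal sketch of how the four types treat duplicates. The variable names and values are illustrative assumptions, not part of the original examples.

```python
# Illustrative values only; names are assumptions for this sketch.
fruits_list = ["apple", "banana", "apple"]    # ordered, changeable, keeps duplicates
fruits_tuple = ("apple", "banana", "apple")   # ordered, unchangeable, keeps duplicates
fruits_set = {"apple", "banana", "apple"}     # duplicates collapse to one member
fruit_prices = {"apple": 1, "banana": 2, "apple": 3}  # repeated key: last value wins

print(len(fruits_list))   # 3
print(len(fruits_tuple))  # 3
print(len(fruits_set))    # 2
print(fruit_prices)       # {'apple': 3, 'banana': 2}
```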

1) List

list1 = ["apple", "grapes", "banana"] #named list1 because "list" would shadow the built-in type
print(list1)

['apple', 'grapes', 'banana']

print(list1[1]) #access the list items by referring to the index number

grapes

print(list1[-1]) #Negative indexing means beginning from the end, -1 refers to the last item

banana

list2 = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"] 
print(list2[:4]) #By leaving out the start value, the range will start at the first item

['apple', 'banana', 'cherry', 'orange']

print(list2[2:])

['cherry', 'orange', 'kiwi', 'melon', 'mango']

print(list2[-4:-1]) #range from the fourth-last item up to, but not including, the last item

['orange', 'kiwi', 'melon']

list3 = ["A", "B", "C"]
list3[1] = "D" #change the value of a specific item by referring to the index number
print(list3)

['A', 'D', 'C']

# For loop

list4 = ["apple", "banana", "cherry"]
for x in list4:
  print(x)

apple
banana
cherry

#To determine if a specified item is present in a list

if "apple" in list4:
  print("Yes")

Yes

#To determine how many items a list has

print(len(list4))

3

List Methods:

append() : Adds an element at the end of the list
clear() : Removes all the elements from the list
copy() : Returns a copy of the list
count() : Returns the number of elements with the specified value
extend() : Add the elements of a list (or any iterable), to the end of the current list
index() : Returns the index of the first element with the specified value
insert() : Adds an element at the specified position
pop() : Removes the element at the specified position
remove() : Removes the item with the specified value
reverse() : Reverses the order of the list
sort() : Sorts the list
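The cells below cover append(), insert(), remove() and pop(). As a complement, here is a short sketch of count(), index(), sort() and reverse() from the table above. The list `numbers` is an assumption made up for this example.

```python
numbers = [3, 1, 2, 3]   # hypothetical data for this sketch

print(numbers.count(3))  # 2, the value 3 appears twice
print(numbers.index(3))  # 0, index of the first 3

numbers.sort()           # sorts in place, returns None
print(numbers)           # [1, 2, 3, 3]

numbers.reverse()        # reverses in place
print(numbers)           # [3, 3, 2, 1]

last = numbers.pop()     # removes and returns the last item
print(last, numbers)     # 1 [3, 3, 2]
```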

#append() method to append an item

list4.append("orange")
print(list4)

['apple', 'banana', 'cherry', 'orange']

#Insert an item at the second position (index 1)

list4.insert(1, "orange")
print(list4)

['apple', 'orange', 'banana', 'cherry', 'orange']

#The remove() method removes the specified item

list4.remove("banana")
print(list4)

['apple', 'orange', 'cherry', 'orange']

#pop() method removes the item at the specified index,
#or the last item if no index is specified

list4.pop()
print(list4)

['apple', 'orange', 'cherry']

#The del keyword removes the item at the specified index

del list4[0]
print(list4)

['orange', 'cherry']

#The del keyword can also delete the list completely;
#using the name afterwards raises a NameError
del list4
print(list4)

NameError: name 'list4' is not defined

#The clear() method empties the list

list5 = ["apple", "banana", "cherry"]
list5.clear()
print(list5)

[]

#the copy() method to make a copy of a list 
list5 = ["apple", "banana", "cherry"]
mylist = list5.copy()
print(mylist)

['apple', 'banana', 'cherry']

#Join Two Lists

list1 = ["a", "b" , "c"]
list2 = [1, 2, 3]

list3 = list1 + list2
print(list3)

['a', 'b', 'c', 1, 2, 3]

#Append list2 into list1

list1 = ["a", "b" , "c"]
list2 = [1, 2, 3]

for x in list2:
  list1.append(x)

print(list1)

['a', 'b', 'c', 1, 2, 3]

#the extend() method to add list2 at the end of list1

list1 = ["a", "b" , "c"]
list2 = [1, 2, 3]

list1.extend(list2)
print(list1)

['a', 'b', 'c', 1, 2, 3]

2) Tuple

A tuple is a collection which is ordered and unchangeable.

tuple1 = ("apple", "banana", "cherry")
print(tuple1)

('apple', 'banana', 'cherry')

#access tuple item

print(tuple1[1])

banana

#Negative indexing means beginning from the end, -1 refers to the last item

print(tuple1[-1])

cherry

#Range : Return the third, fourth, and fifth item

tuple2 = ("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
print(tuple2[2:5])

('cherry', 'orange', 'kiwi')

#Specify negative indexes if you want to start the search from the end of the tuple

print(tuple2[-4:-1])

('orange', 'kiwi', 'melon')

#loop through the tuple items by using a for loop

tuple3 = ("apple", "banana", "cherry")
for x in tuple3:
  print(x)

apple
banana
cherry

#Check if Item Exists

if "apple" in tuple3:
  print("Yes")

Yes

#Print the number of items in the tuple

print(len(tuple3))

3

# join two or more tuples you can use the + operator

tuple1 = ("a", "b" , "c")
tuple2 = (1, 2, 3)

tuple3 = tuple1 + tuple2
print(tuple3)

('a', 'b', 'c', 1, 2, 3)

#Using the tuple() constructor to make a tuple

thistuple = tuple(("apple", "banana", "cherry")) # note the double round-brackets
print(thistuple)

('apple', 'banana', 'cherry')
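"Unchangeable" means that item assignment on a tuple raises a TypeError. A minimal sketch follows; the name `fruits` and the list round trip are illustrative assumptions, shown as one common workaround rather than the only one.

```python
fruits = ("apple", "banana", "cherry")  # hypothetical tuple for this sketch

try:
    fruits[1] = "kiwi"                  # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# Workaround: convert to a list, change it, and build a new tuple.
as_list = list(fruits)
as_list[1] = "kiwi"
fruits = tuple(as_list)
print(fruits)  # ('apple', 'kiwi', 'cherry')
```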

3) Set

A set is a collection which is unordered and unindexed. Sets are written with curly brackets.

set1 = {"apple", "banana", "cherry"}
print(set1)

{'banana', 'apple', 'cherry'}

#Access items, Loop through the set, and print the values

for x in set1:
  print(x)

banana
apple
cherry

if "apple" in set1:
  print("Yes")

Yes

Set methods:

add() : Adds an element to the set
clear() : Removes all the elements from the set
copy() : Returns a copy of the set
difference() : Returns a set containing the difference between two or more sets
difference_update() : Removes the items in this set that are also included in another, specified set
discard() : Removes the specified item, without raising an error if it is missing
intersection() : Returns a set that is the intersection of two or more sets
intersection_update() : Removes the items in this set that are not present in other, specified set(s)
isdisjoint() : Returns True if two sets have no items in common
issubset() : Returns whether another set contains this set or not
issuperset() : Returns whether this set contains another set or not
pop() : Removes and returns an arbitrary element from the set
remove() : Removes the specified element, raising a KeyError if it is missing
symmetric_difference() : Returns a set with the symmetric difference of two sets
symmetric_difference_update() : Updates this set with the symmetric difference of this set and another
union() : Returns a set containing the union of sets
update() : Updates the set with the union of this set and others
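Several of the set methods listed above are not exercised in the cells that follow. Here is a short sketch of the comparison methods, using two made-up sets `a` and `b` (assumptions for illustration). Results are passed through sorted() only to make the printed order predictable.

```python
a = {1, 2, 3, 4}   # hypothetical sets for this sketch
b = {3, 4, 5}

print(sorted(a.intersection(b)))          # [3, 4]
print(sorted(a.difference(b)))            # [1, 2]
print(sorted(a.symmetric_difference(b)))  # [1, 2, 5]
print(a.isdisjoint({7, 8}))               # True, nothing in common
print({3, 4}.issubset(a))                 # True

a.discard(99)          # missing item: no error
try:
    a.remove(99)       # missing item: KeyError
except KeyError:
    print("remove() raised KeyError")
```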

# Adding new items 


set1.add("orange")
print(set1)

{'banana', 'apple', 'cherry', 'orange'}

#Add multiple items to a set, using the update() method

set1.update(["orange", "mango", "grapes"])

print(set1)

{'banana', 'cherry', 'orange', 'apple', 'grapes', 'mango'}

# length of the set


print(len(set1))

6

# remove item 

set1.remove("banana")

print(set1)

{'cherry', 'orange', 'apple', 'grapes', 'mango'}

#Remove an arbitrary item by using the pop() method (sets are unordered, so there is no "last" item)

set2 = {"apple", "banana", "cherry"}

x = set2.pop()

print(x)
print(set2)

banana
{'apple', 'cherry'}

#clear() method empties the set


thisset = {"apple", "banana", "cherry"}

thisset.clear()

print(thisset)

set()

#del keyword will delete the set completely

thisset = {"apple", "banana", "cherry"}

del thisset

print(thisset)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-67-b8e1fa6a22f4> in <module>
      5 del thisset
      6 
----> 7 print(thisset)

NameError: name 'thisset' is not defined

#use the union() method that returns a new set containing all items from both sets, 
#or the update() method that inserts all the items from one set into another

set1 = {"a", "b" , "c"}
set2 = {1, 2, 3}

set3 = set1.union(set2)
print(set3)

{'b', 1, 2, 3, 'a', 'c'}

#update() method inserts the items in set2 into set1

set1 = {"a", "b" , "c"}
set2 = {1, 2, 3}

set1.update(set2)
print(set1)

{'b', 1, 2, 3, 'a', 'c'}

4) Dictionary

A dictionary is a collection of key-value pairs which is changeable and indexed by key. Since Python 3.7 it preserves insertion order.

#named thisdict because "dict" would shadow the built-in type
thisdict = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964}

#access the items of a dictionary by referring to its key name, inside square brackets

thisdict["model"]

'Mustang'

Dict methods:

clear() : Removes all the elements from the dictionary
copy() : Returns a copy of the dictionary
fromkeys() : Returns a dictionary with the specified keys and value
get() : Returns the value of the specified key
items() : Returns a view containing a tuple for each key-value pair
keys() : Returns a view of the dictionary's keys
pop() : Removes the element with the specified key
popitem() : Removes the last inserted key-value pair
setdefault() : Returns the value of the specified key. If the key does not exist: insert the key, with the specified value
update() : Updates the dictionary with the specified key-value pairs
values() : Returns a view of all the values in the dictionary

#use get() to get the same result

thisdict.get("model")

'Mustang'
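The difference between square brackets and get() shows up when the key is missing: brackets raise a KeyError, while get() returns None or a default. A minimal sketch, which also touches setdefault(), update() and fromkeys(); the dictionary `car` is an assumption for this example.

```python
car = {"brand": "Ford", "model": "Mustang", "year": 1964}  # hypothetical data

print(car.get("color"))             # None, no KeyError for a missing key
print(car.get("color", "unknown"))  # unknown

car.setdefault("color", "red")      # inserts the key only if it is absent
print(car["color"])                 # red

car.update({"year": 2018, "doors": 2})  # overwrite one key, add another
print(car)
# {'brand': 'Ford', 'model': 'Mustang', 'year': 2018, 'color': 'red', 'doors': 2}

blank = dict.fromkeys(["brand", "model"], "n/a")
print(blank)                        # {'brand': 'n/a', 'model': 'n/a'}
```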

#change the value of a specific item by referring to its key name

dict1 = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
dict1["year"] = 2018

print(dict1)

{'brand': 'Ford', 'model': 'Mustang', 'year': 2018}

#loop through a dictionary by using a for loop

for x in dict1:
  print(x)

brand
model
year

#Print all values in the dictionary, one by one

for x in dict1:
  print(dict1[x])

Ford
Mustang
2018

#use the values() method to return values of a dictionary

for x in dict1.values():
  print(x)

Ford
Mustang
2018

#Loop through both keys and values, by using the items() method

for x, y in dict1.items():
  print(x, y)

brand Ford
model Mustang
year 2018

#Check if a key is present in the dictionary

if "model" in dict1:
  print("Yes")

Yes

print(len(dict1))

3

#adding items

thisdict = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
thisdict["color"] = "red"
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964, 'color': 'red'}

#pop() method removes the item with the specified key name

thisdict = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
thisdict.pop("model")
print(thisdict)

{'brand': 'Ford', 'year': 1964}

# popitem() method removes the last inserted item

thisdict = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
thisdict.popitem()
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang'}

#del keyword removes the item with the specified key name

thisdict = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
del thisdict["model"]
print(thisdict)

{'brand': 'Ford', 'year': 1964}

#A dictionary can also contain other dictionaries; these are called nested dictionaries

myfamily = {
  "child1" : {
    "name" : "Emil",
    "year" : 2004
  },
  "child2" : {
    "name" : "Tobias",
    "year" : 2007
  },
  "child3" : {
    "name" : "Linus",
    "year" : 2011
  }
}

#Create three dictionaries, then create one dictionary that will contain the other three dictionaries

child1 = {
  "name" : "Emil",
  "year" : 2004
}
child2 = {
  "name" : "Tobias",
  "year" : 2007
}
child3 = {
  "name" : "Linus",
  "year" : 2011
}

myfamily = {
  "child1" : child1,
  "child2" : child2,
  "child3" : child3
}
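Values in a nested dictionary are reached by chaining keys. A short sketch using the same family data as above:

```python
myfamily = {
    "child1": {"name": "Emil", "year": 2004},
    "child2": {"name": "Tobias", "year": 2007},
    "child3": {"name": "Linus", "year": 2011},
}

# Chain the outer key and the inner key.
print(myfamily["child2"]["name"])  # Tobias

# Loop over the outer dictionary and read fields of each inner one.
for key, child in myfamily.items():
    print(key, child["name"], child["year"])

youngest = max(myfamily.values(), key=lambda child: child["year"])
print(youngest["name"])            # Linus
```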

Python Conditions

If statement

a = 100
b = 200
if b > a:
  print("b is greater than a")

b is greater than a

#simplified:

a = 100
b = 200
if a < b: print("a is less than b")

a is less than b

a = 20
b = 20
if b > a:
  print("b is greater than a")
elif a == b:
  print("a and b are equal")

a and b are equal

a = 200
b = 100
if b > a:
  print("b is greater than a")
elif a == b:
  print("a and b are equal")
else:
  print("a is greater than b")

a is greater than b

# simplified:

a = 100
b = 300
print("A") if a > b else print("B")

B

AND and OR Statement

a = 200
b = 33
c = 500
if a > b and c > a:
  print("Both conditions are True")

Both conditions are True

a = 200
b = 33
c = 500
if a > b or a > c:
  print("At least one of the conditions is True")

At least one of the conditions is True

Nested If

x = 41

if x > 10:
  print("Above ten,")
  if x > 20:
    print("and also above 20!")
  else:
    print("but not above 20.")

Pass

#if statements cannot be empty, but if you for some reason have an if statement 
#with no content, put in the pass statement to avoid getting an error

a = 33
b = 200

if b > a:
  pass

The while Loop

i = 1
while i < 6:
  print(i)
  i += 1

Break Statement

i = 1
while i < 6:
  print(i)
  if i == 3:
    break
  i += 1

1
2
3

# with Continue

i = 0
while i < 6:
  i += 1
  if i == 3:
    continue
  print(i)

Else statement

i = 1
while i < 6:
  print(i)
  i += 1
else:
  print("i is no longer less than 6")

1
2
3
4
5
i is no longer less than 6

For Loops

# For loop for List

fruits = ["apple", "banana", "cherry"]
for x in fruits:
  print(x)

apple
banana
cherry

# strings

for x in "banana":
  print(x)

b
a
n
a
n
a

#break statement

fruits = ["apple", "banana", "cherry"]
for x in fruits:
  print(x)
  if x == "banana":
    break

apple
banana

fruits = ["apple", "banana", "cherry"]
for x in fruits:
  if x == "banana":
    break
  print(x)

apple

#continue

fruits = ["apple", "banana", "cherry"]
for x in fruits:
  if x == "banana":
    continue
  print(x)

apple
cherry

# Range

for x in range(6):
  print(x)

for x in range(2, 6):
  print(x)

for x in range(2, 30, 3):
  print(x)

for x in range(6):
  print(x)
else:
  print("Finally finished!")

0
1
2
3
4
5
Finally finished!
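The first three range() calls above are shown without output. Wrapping each in list() makes the start, stop and step arguments visible. The last part is a small sketch, under the assumption that it helps to see it, of the else block being skipped when the loop ends with break.

```python
print(list(range(6)))         # [0, 1, 2, 3, 4, 5]
print(list(range(2, 6)))      # [2, 3, 4, 5]
print(list(range(2, 30, 3)))  # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]

# The else block runs only if the loop was not stopped by break.
else_ran = False
for x in range(6):
    if x == 3:
        break
else:
    else_ran = True
print(else_ran)               # False
```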

adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]

for x in adj:
  for y in fruits:
    print(x, y)

red apple
red banana
red cherry
big apple
big banana
big cherry
tasty apple
tasty banana
tasty cherry

for x in [0, 1, 2]:
  pass

Creating a Function

def my_function():
  print("Hello")


my_function()

Hello

def my_function(*kids):
  print("The youngest child is " + kids[2])

my_function("Emil", "Tobias", "Linus")

The youngest child is Linus

def my_function(child3, child2, child1):
  print("The youngest child is " + child3)

my_function(child1 = "Emil", child2 = "Tobias", child3 = "Linus")

The youngest child is Linus

#Passing a List as an Argument

def my_function(food):
  for x in food:
    print(x)

fruits = ["apple", "banana", "cherry"]

my_function(fruits)

apple
banana
cherry

#return value

def my_function(x):
  return 5 * x

print(my_function(3))

15

#Recursion Example

def tri_recursion(k):
  if(k > 0):
    result = k + tri_recursion(k - 1)
    print(result)
  else:
    result = 0
  return result

print("\n\nRecursion Example Results")
tri_recursion(6)


Recursion Example Results
1
3
6
10
15
21

21
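The values printed by the recursion are the triangular numbers 1, 3, 6, 10, 15, 21. As a sanity check, this sketch compares a print-free version of the function against the closed form k(k+1)/2. The closed-form comparison is an addition for illustration, not part of the original example.

```python
def tri_recursion(k):
    # Same recursion as above, without the print calls.
    if k > 0:
        return k + tri_recursion(k - 1)
    return 0

for k in range(7):
    # Closed form for the k-th triangular number.
    assert tri_recursion(k) == k * (k + 1) // 2

print(tri_recursion(6))  # 21
```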

lambda function

x = lambda a, b, c : a + b + c
print(x(5, 6, 2))

13

def myfunc(n):
  return lambda a : a * n

mydoubler = myfunc(2)

print(mydoubler(11))

22

def myfunc(n):
  return lambda a : a * n

mydoubler = myfunc(2)
mytripler = myfunc(3)

print(mydoubler(11))
print(mytripler(11))

22
33

Open a File on the Server

Reading files

#f = open("demofile.txt", "r")
#print(f.read())

#f = open("D:\\myfiles\\welcome.txt", "r")
#print(f.read())

#Read one line of the file

#f = open("demofile.txt", "r")
#print(f.readline())

#Loop through the file line by line

#f = open("demofile.txt", "r")
#for x in f:
#  print(x)

#Close the file when you are finished with it

#f = open("demofile.txt", "r")
#print(f.readline())
#f.close()
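An alternative to calling close() by hand is the with statement, which closes the file when the block ends, even if an error occurs. A minimal runnable sketch: it writes its own demofile.txt into a temporary directory (an assumption, so that the example does not depend on files on your machine).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demofile.txt")  # throwaway file for this sketch

    with open(path, "w") as f:
        f.write("Hello\nWelcome\n")

    with open(path, "r") as f:
        first_line = f.readline()  # reads up to and including the newline
        rest = f.read()            # reads whatever is left

print(f.closed)          # True, the with block closed the file
print(repr(first_line))  # 'Hello\n'
print(repr(rest))        # 'Welcome\n'
```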

Writing files:

#Open the file "demofile2.txt" and append content to the file

#f = open("demofile2.txt", "a")
#f.write("Now the file has more content!")
#f.close()

#open and read the file after the appending:
#f = open("demofile2.txt", "r")
#print(f.read())

#Open the file "demofile3.txt" and overwrite the content

#f = open("demofile3.txt", "w")
#f.write("Woops! I have deleted the content!")
#f.close()

#open and read the file after the overwriting:
#f = open("demofile3.txt", "r")
#print(f.read())

#Create a file called "myfile.txt"

#f = open("myfile.txt", "x")

#Remove the file "demofile.txt"

#import os
#os.remove("demofile.txt")

#Check if file exists, then delete it:

#import os
#if os.path.exists("demofile.txt"):
#  os.remove("demofile.txt")
#else:
#  print("The file does not exist")

#Try to open and write to a file that is not writable:

#f = None
#try:
#  f = open("demofile.txt")
#  f.write("Lorem ipsum")
#except OSError:
#  print("Something went wrong when writing to the file")
#finally:
#  if f is not None:
#    f.close()

#Raise an error and stop the program if x is lower than 0:

#x = -1

#if x < 0:
#  raise Exception("Sorry, no numbers below zero")

#Raise a TypeError if x is not an integer:

#x = "hello"

#if not isinstance(x, int):
#  raise TypeError("Only integers are allowed")
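The two raise examples above can be combined with try/except so that the program reports the error instead of stopping. A sketch under stated assumptions: the helper name `check_non_negative` is hypothetical, and ValueError is used in place of the generic Exception for the negative case.

```python
def check_non_negative(x):
    # Hypothetical helper for this sketch.
    if not isinstance(x, int):
        raise TypeError("Only integers are allowed")
    if x < 0:
        raise ValueError("Sorry, no numbers below zero")
    return x

results = []
for value in (5, -1, "hello"):
    try:
        results.append(check_non_negative(value))
    except (TypeError, ValueError) as err:
        # Record the exception type instead of stopping the program.
        results.append(type(err).__name__)

print(results)  # [5, 'ValueError', 'TypeError']
```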

NumPy

import numpy as np

simple_list = [1,2,3]

np.array(simple_list)

array([1, 2, 3])

list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]

np.array(list_of_lists)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.arange(0,21,5)

array([ 0,  5, 10, 15, 20])

np.zeros(50)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

np.ones((4,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

np.linspace(0,20,10)

array([ 0.        ,  2.22222222,  4.44444444,  6.66666667,  8.88888889,
       11.11111111, 13.33333333, 15.55555556, 17.77777778, 20.        ])

np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

np.random.rand(3,2)

array([[0.24202235, 0.57396416],
       [0.0400231 , 0.38224147],
       [0.30024483, 0.20187655]])

np.random.randint(5,20,10)

array([10, 14, 18, 11,  9, 15, 16, 19, 13,  9])

np.arange(30)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

np.random.randint(0,100,20)

array([81, 69, 90, 47,  3, 97, 31,  9, 58, 77, 92, 64, 73, 37, 65, 66,  9,
       21, 25, 73])

sample_array = np.arange(30)
sample_array.reshape(5,6)

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])

rand_array = np.random.randint(0,100,20)
rand_array.argmin()

12

sample_array.shape

(30,)

sample_array.reshape(1,30)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

sample_array.reshape(30,1)

array([[ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12],
       [13],
       [14],
       [15],
       [16],
       [17],
       [18],
       [19],
       [20],
       [21],
       [22],
       [23],
       [24],
       [25],
       [26],
       [27],
       [28],
       [29]])

sample_array.dtype

dtype('int32')

a = np.random.randn(2,3)
a.T

array([[-1.866579  , -0.77167212],
       [-0.24050824, -1.86954729],
       [ 1.09606272,  0.5064306 ]])

sample_array = np.arange(10,21)

sample_array

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

sample_array[[2,5]]

array([12, 15])

sample_array[1:2] = 100

sample_array

array([ 10, 100,  12,  13,  14,  15,  16,  17,  18,  19,  20])

sample_array = np.arange(10,21)

sample_array[0:7]

array([10, 11, 12, 13, 14, 15, 16])

sample_array = np.arange(10,21)
                        
sample_array

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

subset_sample_array = sample_array[0:7]

subset_sample_array

array([10, 11, 12, 13, 14, 15, 16])

subset_sample_array[:]=1001

subset_sample_array

array([1001, 1001, 1001, 1001, 1001, 1001, 1001])

sample_array

array([1001, 1001, 1001, 1001, 1001, 1001, 1001,   17,   18,   19,   20])

copy_sample_array = sample_array.copy()

copy_sample_array

array([1001, 1001, 1001, 1001, 1001, 1001, 1001,   17,   18,   19,   20])

copy_sample_array[:]=10
copy_sample_array

array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10])

sample_array

array([1001, 1001, 1001, 1001, 1001, 1001, 1001,   17,   18,   19,   20])
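The cells above show that a slice is a view on the original array, while copy() creates independent data. A minimal sketch that makes this explicit with np.shares_memory; the variable names are assumptions for illustration.

```python
import numpy as np

base = np.arange(10, 21)
view = base[0:7]                # slicing returns a view on the same memory
independent = base[0:7].copy()  # copy() allocates new memory

print(np.shares_memory(base, view))         # True
print(np.shares_memory(base, independent))  # False

view[:] = 1001                  # writes through to base
print(base[0], independent[0])  # 1001 10
```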

sample_matrix = np.array(([50,20,1,23], [24,23,21,32], [76,54,32,12], [98,6,4,3]))

sample_matrix

array([[50, 20,  1, 23],
       [24, 23, 21, 32],
       [76, 54, 32, 12],
       [98,  6,  4,  3]])

sample_matrix[0][3]

23

sample_matrix[0,3]

23

sample_matrix[3,:]

array([98,  6,  4,  3])

sample_matrix[3]

array([98,  6,  4,  3])

sample_matrix = np.array(([50,20,1,23,34], [24,23,21,32,34], [76,54,32,12,98], [98,6,4,3,67], [12,23,34,56,67]))

sample_matrix

array([[50, 20,  1, 23, 34],
       [24, 23, 21, 32, 34],
       [76, 54, 32, 12, 98],
       [98,  6,  4,  3, 67],
       [12, 23, 34, 56, 67]])

sample_matrix[:,[1,3]]

array([[20, 23],
       [23, 32],
       [54, 12],
       [ 6,  3],
       [23, 56]])

sample_matrix[:,(3,1)]

array([[23, 20],
       [32, 23],
       [12, 54],
       [ 3,  6],
       [56, 23]])

sample_array=np.arange(1,31)

sample_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])

mask = sample_array < 10   # named mask because "bool" would shadow the built-in type

sample_array[mask]

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

sample_array[sample_array <10]

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

a=11

sample_array[sample_array < a]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
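Boolean masks can be combined with & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A short sketch with a made-up array `values`:

```python
import numpy as np

values = np.arange(1, 31)                 # 1 .. 30, as in the cells above

mask = (values > 5) & (values % 2 == 0)   # even numbers greater than 5
print(values[mask].tolist())              # [6, 8, 10, ..., 30]
print(np.count_nonzero(mask))             # 13

either = values[(values < 3) | (values > 28)]
print(either.tolist())                    # [1, 2, 29, 30]
```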

sample_array + sample_array

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
       36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60])

sample_array / sample_array

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

10/sample_array

array([10.        ,  5.        ,  3.33333333,  2.5       ,  2.        ,
        1.66666667,  1.42857143,  1.25      ,  1.11111111,  1.        ,
        0.90909091,  0.83333333,  0.76923077,  0.71428571,  0.66666667,
        0.625     ,  0.58823529,  0.55555556,  0.52631579,  0.5       ,
        0.47619048,  0.45454545,  0.43478261,  0.41666667,  0.4       ,
        0.38461538,  0.37037037,  0.35714286,  0.34482759,  0.33333333])

sample_array + 1

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
       19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])

np.var(sample_array)

74.91666666666667
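By default np.var divides by n (population variance). Passing ddof=1 divides by n - 1 instead (sample variance). The sketch below recomputes the value above by hand for comparison; the names are assumptions for illustration.

```python
import numpy as np

values = np.arange(1, 31)

pop_var = np.var(values)             # divides by n
sample_var = np.var(values, ddof=1)  # divides by n - 1

# Manual population variance: mean of squared deviations from the mean.
manual = ((values - values.mean()) ** 2).sum() / len(values)

print(pop_var, sample_var)           # roughly 74.92 and 77.5
print(np.isclose(pop_var, manual))   # True
```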

array = np.random.randn(6,6)

array

array([[-1.2513939 ,  0.63036933,  1.34352857,  0.69169362,  0.01026876,
         0.59189891],
       [-1.17904234, -0.12504466,  0.31374784,  0.09035803, -0.61388114,
         1.1150514 ],
       [ 1.06328715,  0.46405969,  0.00697848, -2.29704625,  0.96100601,
         0.83872649],
       [ 0.3548689 , -0.20216495, -1.17393345,  0.04961487, -0.67034172,
         0.55421924],
       [-2.2873708 , -1.24865618, -0.5852612 , -1.14245419,  0.63155215,
        -0.86846749],
       [-0.19474274,  0.26641693, -1.72485259,  1.13081737, -0.48967084,
        -0.56814362]])

np.std(array)

0.9421603403314502

np.mean(array)

-0.15316678701199804

sports = np.array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'])

np.unique(sports)

array(['Cric', 'cric', 'fball', 'fooseball', 'golf'], dtype='<U9')

sample_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])

simple_array = np.arange(0,20)

simple_array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

np.save('sample_array', sample_array)

np.savez('2_arrays.npz', a=sample_array, b=simple_array)

np.load('sample_array.npy')

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])

archive = np.load('2_arrays.npz')

archive['b']

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

np.savetxt('text_file.txt', sample_array,delimiter=',')

np.loadtxt('text_file.txt', delimiter=',')

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
       14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26.,
       27., 28., 29., 30.])
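Note that the array came back from loadtxt as floats. If integers are wanted, one option is to pass fmt="%d" when saving and dtype=int when loading. A sketch that writes to a temporary directory (an assumption, so no files are left behind):

```python
import os
import tempfile

import numpy as np

values = np.arange(1, 31)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "text_file.txt")  # throwaway file for this sketch
    np.savetxt(path, values, fmt="%d", delimiter=",")   # write plain integers
    loaded = np.loadtxt(path, dtype=int, delimiter=",")  # parse them back as ints

print(loaded.dtype.kind)               # i (an integer dtype)
print(np.array_equal(values, loaded))  # True
```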

data = {'prodID': ['101', '102', '103', '104', '104'],
        'prodname': ['X', 'Y', 'Z', 'X', 'W'],
        'profit': ['2738', '2727', '3497', '7347', '3743']}

Pandas

import pandas as pd

score = [10, 15, 20, 25]

pd.Series(data=score, index = ['a','b','c','d'])

a    10
b    15
c    20
d    25
dtype: int64

demo_matrix = np.array(([13,35,74,48], [23,37,37,38], [73,39,93,39]))

demo_matrix

array([[13, 35, 74, 48],
       [23, 37, 37, 38],
       [73, 39, 93, 39]])

demo_matrix[2,3]

39

np.arange(0,22,6)

array([ 0,  6, 12, 18])

demo_array=np.arange(0,10)

demo_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

demo_array <3

array([ True,  True,  True, False, False, False, False, False, False,
       False])

demo_array[demo_array <6]

array([0, 1, 2, 3, 4, 5])

np.max(demo_array)

9

s1 = pd.Series(['a', 'b'])

s2 = pd.Series(['c', 'd'])

pd.concat([s1+s2])

0    ac
1    bd
dtype: object

Creating a Series using Pandas

You can convert a list, a NumPy array, or a dictionary to a Series in the following manner:

labels = ['w','x','y','z']
list = [10,20,30,40]
array = np.array([10,20,30,40])
dict = {'w':10,'x':20,'y':30,'z':40}

pd.Series(data=list)

0    10
1    20
2    30
3    40
dtype: int64

pd.Series(data=list,index=labels)

w    10
x    20
y    30
z    40
dtype: int64

pd.Series(list,labels)

w    10
x    20
y    30
z    40
dtype: int64

pd.Series(array)

0    10
1    20
2    30
3    40
dtype: int32

pd.Series(array,labels)

w    10
x    20
y    30
z    40
dtype: int32

pd.Series(dict)

w    10
x    20
y    30
z    40
dtype: int64

Using an Index

The following two Series show how the index is used: when Series are added, values are aligned by index label, not by position.

sports1 = pd.Series([1,2,3,4],index = ['Cricket', 'Football','Basketball', 'Golf'])

sports1

Cricket       1
Football      2
Basketball    3
Golf          4
dtype: int64

sports2 = pd.Series([1,2,5,4],index = ['Cricket', 'Football','Baseball', 'Golf'])

sports2

Cricket     1
Football    2
Baseball    5
Golf        4
dtype: int64

sports1 + sports2

Baseball      NaN
Basketball    NaN
Cricket       2.0
Football      4.0
Golf          8.0
dtype: float64
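The NaN values appear because Basketball and Baseball each exist in only one of the two Series. If a missing label should count as 0 instead, Series.add accepts a fill_value. A minimal sketch with the same data:

```python
import pandas as pd

sports1 = pd.Series([1, 2, 3, 4], index=["Cricket", "Football", "Basketball", "Golf"])
sports2 = pd.Series([1, 2, 5, 4], index=["Cricket", "Football", "Baseball", "Golf"])

# Labels missing from one Series are treated as 0 before adding.
total = sports1.add(sports2, fill_value=0)
print(total)
```

With fill_value=0, Baseball becomes 5.0 and Basketball becomes 3.0, while the shared labels are summed as before.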

DataFrames

The DataFrame concept in pandas is similar to data frames in the R programming language. A DataFrame is a collection of Series that share the same index.

from numpy.random import randn
np.random.seed(1)

dataframe = pd.DataFrame(randn(10,5),index='A B C D E F G H I J'.split(),columns='Score1 Score2 Score3 Score4 Score5'.split())

dataframe

Selection and Indexing

Ways in which we can grab data from a DataFrame

dataframe['Score3']

A   -0.528172
B   -0.761207
C   -0.322417
D   -0.877858
E    0.901591
F   -0.935769
G   -0.687173
H    0.234416
I   -0.747158
J    2.100255
Name: Score3, dtype: float64

# Pass a list of column names in any order necessary
dataframe[['Score2','Score1']]

#Each DataFrame column is a Series
type(dataframe['Score1'])

pandas.core.series.Series

Adding a new column to the DataFrame

dataframe['Score6'] = dataframe['Score1'] + dataframe['Score2']

dataframe

Removing Columns from DataFrame

# Use axis=0 for dropping rows and axis=1 for dropping columns
    
dataframe.drop('Score6',axis=1)

# the column is not dropped from dataframe itself unless inplace=True is passed
dataframe

dataframe.drop('Score6',axis=1,inplace=True)
dataframe

Dropping rows using axis=0

# A row is likewise dropped from dataframe itself only if inplace=True is passed

dataframe.drop('A',axis=0)

Selecting Rows

dataframe.loc['F']

Score1   -0.683728
Score2   -0.122890
Score3   -0.935769
Score4   -0.267888
Score5    0.530355
Name: F, dtype: float64

To select by integer position instead of by label, use iloc instead of loc

dataframe.iloc[2]

Score1    1.462108
Score2   -2.060141
Score3   -0.322417
Score4   -0.384054
Score5    1.133769
Name: C, dtype: float64

Selecting a subset of rows and columns using loc

dataframe.loc['A','Score1']

1.6243453636632417

dataframe.loc[['A','B'],['Score1','Score2']]
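One difference between loc and iloc that is easy to miss: a label slice with loc includes both endpoints, while a position slice with iloc excludes the stop position. A sketch on a small made-up frame `scores` (an assumption, so the numbers are predictable):

```python
import pandas as pd

scores = pd.DataFrame({"Score1": [10, 20, 30], "Score2": [1, 2, 3]},
                      index=["A", "B", "C"])  # hypothetical data

print(scores.loc["B", "Score2"])  # 2, selected by label
print(scores.iloc[1, 1])          # 2, same cell selected by position

by_label = scores.loc["A":"B"]    # rows A and B (end label included)
by_position = scores.iloc[0:1]    # row A only (stop position excluded)
print(len(by_label), len(by_position))  # 2 1
```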

Conditional Selection

Similar to NumPy, we can make conditional selections using bracket notation

dataframe>0.5

dataframe[dataframe>0.5]

dataframe[dataframe['Score1']>0.5]

dataframe[dataframe['Score1']>0.5]['Score2']

A   -0.611756
C   -2.060141
Name: Score2, dtype: float64

dataframe[dataframe['Score1']>0.5][['Score2','Score3']]
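The chained form above filters the rows first and then picks columns. The same selection can be written as a single loc call, and several conditions can be combined with & or |. A sketch on a small made-up frame (the values are assumptions, not the random data above):

```python
import pandas as pd

scores = pd.DataFrame({"Score1": [0.9, -0.2, 1.4, 0.1],
                       "Score2": [-0.6, 0.3, -2.0, 0.8],
                       "Score3": [-0.5, -0.7, -0.3, -0.8]},
                      index=list("ABCD"))  # hypothetical data

# Row condition and column list in one step.
selected = scores.loc[scores["Score1"] > 0.5, ["Score2", "Score3"]]
print(list(selected.index))  # ['A', 'C']

# Two conditions: each one needs its own parentheses.
both = scores[(scores["Score1"] > 0.5) & (scores["Score2"] < -1)]
print(list(both.index))      # ['C']
```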

Some more features of indexing include:

resetting the index
setting a different value
index hierarchy

# Reset to default index value instead of A to J
dataframe.reset_index()

# Setting new index value
newindex = 'IND JP CAN GE IT PL FY IU RT IP'.split()

dataframe['Countries'] = newindex
dataframe

dataframe.set_index('Countries')

# Once again, pass inplace=True to change dataframe itself
dataframe

dataframe.set_index('Countries',inplace=True)

dataframe

Missing Data

Methods to deal with missing data in Pandas

dataframe = pd.DataFrame({'Cricket':[1,2,np.nan,4,6,7,2,np.nan],
                  'Baseball':[5,np.nan,np.nan,5,7,2,4,5],
                  'Tennis':[1,2,3,4,5,6,7,8]})

dataframe

dataframe.dropna()

# Use axis=1 for dropping columns with nan values

dataframe.dropna(axis=1)

dataframe.dropna(thresh=2)

dataframe.fillna(value=0)

dataframe['Baseball'].fillna(value=dataframe['Baseball'].mean())

0    5.000000
1    4.666667
2    4.666667
3    5.000000
4    7.000000
5    2.000000
6    4.000000
7    5.000000
Name: Baseball, dtype: float64
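The calls above are shown without output, so here is a sketch that counts what each one keeps. It rebuilds the same frame under the name `df` (an assumption, to stay self-contained). thresh=2 keeps rows that have at least two non-NaN values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Cricket": [1, 2, np.nan, 4, 6, 7, 2, np.nan],
                   "Baseball": [5, np.nan, np.nan, 5, 7, 2, 4, 5],
                   "Tennis": [1, 2, 3, 4, 5, 6, 7, 8]})

print(df.isnull().sum().to_dict())      # {'Cricket': 2, 'Baseball': 2, 'Tennis': 0}
print(len(df.dropna()))                 # 5 rows have no NaN at all
print(list(df.dropna(axis=1).columns))  # ['Tennis'] is the only complete column
print(len(df.dropna(thresh=2)))         # 7, only row 2 has fewer than two values
```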

Groupby

The groupby method is used to group rows together and perform aggregate functions

dat = {'CustID':['1001','1001','1002','1002','1003','1003'],
       'CustName':['UIPat','DatRob','Goog','Chrysler','Ford','GM'],
       'Profitinlakhs':[2005,3245,1245,8765,5463,3547]}

dataframe = pd.DataFrame(dat)

dataframe

We can now use the .groupby() method to group rows together based on a column name. For example let's group based on CustID. This will create a DataFrameGroupBy object:

dataframe.groupby('CustID') #This object can be saved as a variable

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FDFCEFE9C8>

CustID_grouped = dataframe.groupby("CustID") #Now we can aggregate using the variable

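CustID_grouped.mean(numeric_only=True) #numeric_only skips the text column CustName, which recent pandas versions refuse to average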

Alternatively, call groupby() and the aggregation in one chain. The same pattern works for each aggregation:

dataframe.groupby('CustID').mean(numeric_only=True)

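CustID_grouped['Profitinlakhs'].std() #select the numeric column; a standard deviation is not defined for the text column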

CustID_grouped.min()

CustID_grouped.max()

CustID_grouped.count()

CustID_grouped.describe()

CustID_grouped.describe().transpose()

CustID_grouped.describe().transpose()['1001']

Profitinlakhs  count       2.000000
               mean     2625.000000
               std       876.812409
               min      2005.000000
               25%      2315.000000
               50%      2625.000000
               75%      2935.000000
               max      3245.000000
Name: 1001, dtype: float64
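Several aggregations can also be requested in one call with agg(). A minimal sketch on the same data; selecting the Profitinlakhs column first is a choice made here so that only numeric values are aggregated.

```python
import pandas as pd

dat = {"CustID": ["1001", "1001", "1002", "1002", "1003", "1003"],
       "CustName": ["UIPat", "DatRob", "Goog", "Chrysler", "Ford", "GM"],
       "Profitinlakhs": [2005, 3245, 1245, 8765, 5463, 3547]}
df = pd.DataFrame(dat)

# One row per CustID, one column per aggregation.
summary = df.groupby("CustID")["Profitinlakhs"].agg(["mean", "min", "max", "count"])
print(summary)
```

For CustID 1001 this gives a mean of 2625, matching the describe() output above.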

Ways of combining DataFrames:

Merging
Joining
Concatenating

dafa1 = pd.DataFrame({'CustID': ['101', '102', '103', '104'],
                        'Sales': [13456, 45321, 54385, 53212],
                        'Priority': ['CAT0', 'CAT1', 'CAT2', 'CAT3'],
                        'Prime': ['yes', 'no', 'no', 'yes']},
                        index=[0, 1, 2, 3])

dafa2 = pd.DataFrame({'CustID': ['101', '103', '104', '105'],
                        'Sales': [13456, 54385, 53212, 4534],
                        'Payback': ['CAT4', 'CAT5', 'CAT6', 'CAT7'],
                        'Imp': ['yes', 'no', 'no', 'no']},
                         index=[4, 5, 6, 7]) 

dafa3 = pd.DataFrame({'CustID': ['101', '104', '105', '106'],
                        'Sales': [13456, 53212, 4534, 3241],
                        'Pol': ['CAT8', 'CAT9', 'CAT10', 'CAT11'],
                        'Level': ['yes', 'no', 'no', 'yes']},
                        index=[8, 9, 10, 11])

Concatenation

Concatenation joins DataFrames either by rows or by columns (axis=0 or axis=1).

The DataFrames should share the same labels along the other axis; where they do not, pandas fills the missing cells with NaN.

pd.concat([dafa1,dafa2],sort=True) #sort=True returns the columns in sorted order

Because dafa1 and dafa2 do not share all of their columns, older pandas versions emit a FutureWarning when sort is not given. Pass sort=True to keep the sorted-column behaviour and silence the warning, or sort=False to accept the newer default of leaving the columns unsorted.

pd.concat([dafa1,dafa2,dafa3],axis=1)
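A small sketch of both concatenation directions may help here. The frames top, bottom and extra are made-up illustrations (cut-down versions of dafa1), not part of the original notebook.

```python
import pandas as pd

# Hypothetical small frames, trimmed from dafa1 for illustration
top = pd.DataFrame({'CustID': ['101', '102'], 'Sales': [13456, 45321]})
bottom = pd.DataFrame({'CustID': ['103', '104'], 'Sales': [54385, 53212]})

# axis=0 (the default) stacks rows; ignore_index=True renumbers them 0..n-1
stacked = pd.concat([top, bottom], ignore_index=True)
print(stacked)

# axis=1 places frames side by side, aligning rows on their index labels
extra = pd.DataFrame({'Prime': ['yes', 'no']})
side_by_side = pd.concat([top, extra], axis=1)
print(side_by_side)
```

In the axis=1 example above with dafa1, dafa2 and dafa3, the indexes (0-3, 4-7, 8-11) do not overlap at all, which is why that result is mostly NaN.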

Merging

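Just as SQL joins tables, the pd.merge() function in pandas lets us merge DataFrames on a shared key column. The how argument chooses the join type ('inner', 'outer', 'left' or 'right') and on names the key column.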

pd.merge(dafa1,dafa2,how='outer',on='CustID')
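To see how the how argument changes the result, here is a minimal sketch using cut-down versions of dafa1 and dafa2 (only the key and one extra column each, which is a simplification for illustration). It also shows .join(), the index-based counterpart listed under "Joining" above.

```python
import pandas as pd

# Trimmed copies of dafa1 and dafa2: the key column plus one extra column each
dafa1 = pd.DataFrame({'CustID': ['101', '102', '103', '104'],
                      'Priority': ['CAT0', 'CAT1', 'CAT2', 'CAT3']})
dafa2 = pd.DataFrame({'CustID': ['101', '103', '104', '105'],
                      'Payback': ['CAT4', 'CAT5', 'CAT6', 'CAT7']})

# Inner merge keeps only the CustID values present in both frames
inner = pd.merge(dafa1, dafa2, how='inner', on='CustID')

# Outer merge keeps every CustID; indicator=True adds a _merge column
# that records where each row came from
outer = pd.merge(dafa1, dafa2, how='outer', on='CustID', indicator=True)

# join() combines on the index instead of a column, so set the key as index first
joined = dafa1.set_index('CustID').join(dafa2.set_index('CustID'), how='left')

print(inner)
print(outer)
print(joined)
```

When both frames carry a non-key column with the same name (such as Sales in the full dafa1 and dafa2), merge keeps both and appends the suffixes _x and _y.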

Operations

Let us look at some useful everyday operations in pandas.

dataframe = pd.DataFrame({'custID':[1,2,3,4],'SaleType':['big','small','medium','big'],'SalesCode':['121','131','141','151']})
dataframe.head()

Info on Unique Values

dataframe['SaleType'].unique()

array(['big', 'small', 'medium'], dtype=object)

dataframe['SaleType'].nunique()

3

dataframe['SaleType'].value_counts()

big       2
small     1
medium    1
Name: SaleType, dtype: int64
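As a small extension of value_counts(), the sketch below asks for relative frequencies instead of raw counts by passing normalize=True. The DataFrame is rebuilt here only to keep the snippet self-contained.

```python
import pandas as pd

dataframe = pd.DataFrame({'custID': [1, 2, 3, 4],
                          'SaleType': ['big', 'small', 'medium', 'big'],
                          'SalesCode': ['121', '131', '141', '151']})

# normalize=True returns the share of each value rather than its count
shares = dataframe['SaleType'].value_counts(normalize=True)
print(shares)
```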

Selecting Data

#Select from DataFrame using criteria from multiple columns
newdataframe = dataframe[(dataframe['custID']!=3) & (dataframe['SaleType']=='big')]
newdataframe
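The selection above combines conditions with & (and). A brief sketch of the other common combinators follows: | (or), ~ (not) and isin(). Each condition needs its own parentheses because & and | bind more tightly than == and !=. The variable names are illustrative only.

```python
import pandas as pd

dataframe = pd.DataFrame({'custID': [1, 2, 3, 4],
                          'SaleType': ['big', 'small', 'medium', 'big'],
                          'SalesCode': ['121', '131', '141', '151']})

# | means "or": rows whose SaleType is big or small
big_or_small = dataframe[(dataframe['SaleType'] == 'big') |
                         (dataframe['SaleType'] == 'small')]

# isin() expresses the same test against a list of allowed values
same_with_isin = dataframe[dataframe['SaleType'].isin(['big', 'small'])]

# ~ negates a condition: every row that is not medium
not_medium = dataframe[~(dataframe['SaleType'] == 'medium')]

print(big_or_small)
```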

Applying Functions

def profit(a):
    return a*4

dataframe['custID'].apply(profit)

0     4
1     8
2    12
3    16
Name: custID, dtype: int64

dataframe['SaleType'].apply(len)

0    3
1    5
2    6
3    3
Name: SaleType, dtype: int64

dataframe['custID'].sum()

10
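apply() also accepts a lambda, and calling it on the whole DataFrame with axis=1 passes one row at a time. The sketch below shows both; the column name labels and the derived values are made up for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame({'custID': [1, 2, 3, 4],
                          'SaleType': ['big', 'small', 'medium', 'big'],
                          'SalesCode': ['121', '131', '141', '151']})

# A lambda avoids defining a named function for a one-off transformation
doubled_codes = dataframe['SalesCode'].apply(lambda code: int(code) * 2)

# axis=1 hands each row to the function, so several columns can be combined
labels = dataframe.apply(lambda row: f"{row['custID']}-{row['SaleType']}", axis=1)

print(doubled_codes.tolist())
print(labels.tolist())
```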

Permanently Removing a Column

dataframe

del dataframe['custID']
dataframe
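del changes the DataFrame in place. If the original should stay untouched, drop() is an alternative that returns a new DataFrame instead. A minimal sketch, with trimmed as an illustrative name:

```python
import pandas as pd

dataframe = pd.DataFrame({'custID': [1, 2, 3, 4],
                          'SaleType': ['big', 'small', 'medium', 'big'],
                          'SalesCode': ['121', '131', '141', '151']})

# drop() returns a copy without the column; dataframe itself keeps custID
trimmed = dataframe.drop(columns=['custID'])

print(trimmed.columns.tolist())
print(dataframe.columns.tolist())
```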

Get column and index names

dataframe.columns

Index(['SaleType', 'SalesCode'], dtype='object')

dataframe.index

RangeIndex(start=0, stop=4, step=1)

Sorting and Ordering a DataFrame

dataframe.sort_values(by='SaleType') #inplace=False by default
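sort_values() can also sort by several columns, each with its own direction. The sketch below sorts SaleType ascending and breaks ties by SalesCode descending; note that SalesCode holds strings in this tutorial, so it is compared as text.

```python
import pandas as pd

dataframe = pd.DataFrame({'SaleType': ['big', 'small', 'medium', 'big'],
                          'SalesCode': ['121', '131', '141', '151']})

# One ascending flag per column in `by`; the original dataframe is unchanged
ordered = dataframe.sort_values(by=['SaleType', 'SalesCode'],
                                ascending=[True, False])
print(ordered)
```

The original row labels are kept, so the index of the result shows where each row came from.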

Find Null Values or Check for Null Values

dataframe.isnull()

# Drop rows with NaN Values
dataframe.dropna()

Filling in NaN values with something else

dataframe = pd.DataFrame({'Sale1':[5,np.nan,10,np.nan],
                   'Sale2':[np.nan,121,np.nan,141],
                   'Sale3':['XUI','VYU','NMA','IUY']})
dataframe.head()

dataframe.fillna('Not nan')
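Filling every gap with the string 'Not nan' puts text into numeric columns. A gentler option, sketched below, is to pass fillna() a dictionary with one replacement per column. Using the column mean for Sale1 and 0 for Sale2 is just an example choice, not a recommendation from the original notebook.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Sale1': [5, np.nan, 10, np.nan],
                          'Sale2': [np.nan, 121, np.nan, 141],
                          'Sale3': ['XUI', 'VYU', 'NMA', 'IUY']})

# A dict maps each column name to its own fill value;
# mean() ignores NaN, so Sale1 is filled with (5 + 10) / 2
filled = dataframe.fillna({'Sale1': dataframe['Sale1'].mean(), 'Sale2': 0})
print(filled)
```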

Data Input and Output

pandas reads DataFrames from external sources with its pd.read_* functions and writes them back out with the matching to_* methods.

CSV Input

# dataframe = pd.read_csv('filename.csv')

CSV output

#If index=False, the CSV file does not store the index values

# dataframe.to_csv('filename.csv',index=False)

Excel Input

# pd.read_excel('filename.xlsx',sheet_name='Data1')

Excel Output

# dataframe.to_excel('Consumer2.xlsx',sheet_name='Sheet1')
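The calls above are commented out because they need real files. As a runnable sketch of the CSV round trip, the example below writes to an in-memory text buffer (io.StringIO) instead of a file on disk; the buffer and the two-row DataFrame are stand-ins for illustration.

```python
import io

import pandas as pd

dataframe = pd.DataFrame({'SaleType': ['big', 'small'],
                          'SalesCode': [121, 131]})

# Write CSV text into an in-memory buffer; index=False leaves the index out
buffer = io.StringIO()
dataframe.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Rewind the buffer and read the same text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

With a real file, the buffer is simply replaced by a path such as 'filename.csv'.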

	Score1	Score2	Score3	Score4	Score5
A	1.624345	-0.611756	-0.528172	-1.072969	0.865408
B	-2.301539	1.744812	-0.761207	0.319039	-0.249370
C	1.462108	-2.060141	-0.322417	-0.384054	1.133769
D	-1.099891	-0.172428	-0.877858	0.042214	0.582815
E	-1.100619	1.144724	0.901591	0.502494	0.900856
F	-0.683728	-0.122890	-0.935769	-0.267888	0.530355
G	-0.691661	-0.396754	-0.687173	-0.845206	-0.671246
H	-0.012665	-1.117310	0.234416	1.659802	0.742044
I	-0.191836	-0.887629	-0.747158	1.692455	0.050808
J	-0.636996	0.190915	2.100255	0.120159	0.617203

	Score2	Score1
A	-0.611756	1.624345
B	1.744812	-2.301539
C	-2.060141	1.462108
D	-0.172428	-1.099891
E	1.144724	-1.100619
F	-0.122890	-0.683728
G	-0.396754	-0.691661
H	-1.117310	-0.012665
I	-0.887629	-0.191836
J	0.190915	-0.636996

	Score1	Score2	Score3	Score4	Score5	Score6
A	1.624345	-0.611756	-0.528172	-1.072969	0.865408	1.012589
B	-2.301539	1.744812	-0.761207	0.319039	-0.249370	-0.556727
C	1.462108	-2.060141	-0.322417	-0.384054	1.133769	-0.598033
D	-1.099891	-0.172428	-0.877858	0.042214	0.582815	-1.272319
E	-1.100619	1.144724	0.901591	0.502494	0.900856	0.044105
F	-0.683728	-0.122890	-0.935769	-0.267888	0.530355	-0.806618
G	-0.691661	-0.396754	-0.687173	-0.845206	-0.671246	-1.088414
H	-0.012665	-1.117310	0.234416	1.659802	0.742044	-1.129975
I	-0.191836	-0.887629	-0.747158	1.692455	0.050808	-1.079465
J	-0.636996	0.190915	2.100255	0.120159	0.617203	-0.446080

	Score1	Score2	Score3	Score4	Score5
A	True	False	False	False	True
B	False	True	False	False	False
C	True	False	False	False	True
D	False	False	False	False	True
E	False	True	True	True	True
F	False	False	False	False	True
G	False	False	False	False	False
H	False	False	False	True	True
I	False	False	False	True	False
J	False	False	True	False	True

	Score1	Score2	Score3	Score4	Score5	Countries
Countries
IND	1.624345	-0.611756	-0.528172	-1.072969	0.865408	IND
JP	-2.301539	1.744812	-0.761207	0.319039	-0.249370	JP
CAN	1.462108	-2.060141	-0.322417	-0.384054	1.133769	CAN
GE	-1.099891	-0.172428	-0.877858	0.042214	0.582815	GE
IT	-1.100619	1.144724	0.901591	0.502494	0.900856	IT
PL	-0.683728	-0.122890	-0.935769	-0.267888	0.530355	PL
FY	-0.691661	-0.396754	-0.687173	-0.845206	-0.671246	FY
IU	-0.012665	-1.117310	0.234416	1.659802	0.742044	IU
RT	-0.191836	-0.887629	-0.747158	1.692455	0.050808	RT
IP	-0.636996	0.190915	2.100255	0.120159	0.617203	IP

	Cricket	Baseball	Tennis
0	1.0	5.0	1
1	2.0	NaN	2
2	NaN	NaN	3
3	4.0	5.0	4
4	6.0	7.0	5
5	7.0	2.0	6
6	2.0	4.0	7
7	NaN	5.0	8

	Cricket	Baseball	Tennis
0	1.0	5.0	1
1	2.0	0.0	2
2	0.0	0.0	3
3	4.0	5.0	4
4	6.0	7.0	5
5	7.0	2.0	6
6	2.0	4.0	7
7	0.0	5.0	8

	CustID	CustName	Profitinlakhs
0	1001	UIPat	2005
1	1001	DatRob	3245
2	1002	Goog	1245
3	1002	Chrysler	8765
4	1003	Ford	5463
5	1003	GM	3547

	Profitinlakhs
	count	mean	std	min	25%	50%	75%	max
CustID
1001	2.0	2625.0	876.812409	2005.0	2315.0	2625.0	2935.0	3245.0
1002	2.0	5005.0	5317.442995	1245.0	3125.0	5005.0	6885.0	8765.0
1003	2.0	4505.0	1354.816593	3547.0	4026.0	4505.0	4984.0	5463.0

	CustID	1001	1002	1003
Profitinlakhs	count	2.000000	2.000000	2.000000
	mean	2625.000000	5005.000000	4505.000000
	std	876.812409	5317.442995	1354.816593
	min	2005.000000	1245.000000	3547.000000
	25%	2315.000000	3125.000000	4026.000000
	50%	2625.000000	5005.000000	4505.000000
	75%	2935.000000	6885.000000	4984.000000
	max	3245.000000	8765.000000	5463.000000

	CustID	Imp	Payback	Prime	Priority	Sales
0	101	NaN	NaN	yes	CAT0	13456
1	102	NaN	NaN	no	CAT1	45321
2	103	NaN	NaN	no	CAT2	54385
3	104	NaN	NaN	yes	CAT3	53212
4	101	yes	CAT4	NaN	NaN	13456
5	103	no	CAT5	NaN	NaN	54385
6	104	no	CAT6	NaN	NaN	53212
7	105	no	CAT7	NaN	NaN	4534

	CustID	Sales	Priority	Prime	CustID	Sales	Payback	Imp	CustID	Sales	Pol	Level
0	101	13456.0	CAT0	yes	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	102	45321.0	CAT1	no	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	103	54385.0	CAT2	no	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	104	53212.0	CAT3	yes	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	NaN	NaN	NaN	NaN	101	13456.0	CAT4	yes	NaN	NaN	NaN	NaN
5	NaN	NaN	NaN	NaN	103	54385.0	CAT5	no	NaN	NaN	NaN	NaN
6	NaN	NaN	NaN	NaN	104	53212.0	CAT6	no	NaN	NaN	NaN	NaN
7	NaN	NaN	NaN	NaN	105	4534.0	CAT7	no	NaN	NaN	NaN	NaN
8	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	101	13456.0	CAT8	yes
9	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	104	53212.0	CAT9	no
10	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	105	4534.0	CAT10	no
11	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	106	3241.0	CAT11	yes

	custID	SaleType	SalesCode
0	1	big	121
1	2	small	131
2	3	medium	141
3	4	big	151

	custID	SaleType	SalesCode
0	1	big	121
3	4	big	151

	custID	SaleType	SalesCode
0	1	big	121
1	2	small	131
2	3	medium	141
3	4	big	151

	SaleType	SalesCode
0	big	121
1	small	131
2	medium	141
3	big	151
