greends-ipython

Introduction to Python 2024/2025

Masters in Data Science applied to agricultural and food sciences, environment, and forestry engineering.

Instructor: Manuel Campagnolo (mlc@isa.ulisboa.pt)

Teaching assistant: Dominic Welsh (djwelsh@edu.ulisboa.pt)

# Online resources for the course

- Required: CS50’s Introduction to Programming with Python: lectures (videos and notes), problem sets, shorts. The platform allows you to test your code for the proposed problems in the CS50 codespace (you need your own GitHub account to access the codespace).
- Python Programming course at PP.fi: same features as CS50, but to test your solutions to problems you are required to pass previous tests.
- Learn Python: lectures (videos) and interactive examples and exercises.
- Introduction to Python (VScode): interactive lectures and exercises.
- Basic concepts and features of the Python language and system: The Python Tutorial at python.org.
- Fenix webpage for the course (https://fenix.isa.ulisboa.pt/courses/intpy-283463546571610)
- Moodle (https://elearning.ulisboa.pt/course/view.php?id=9100)

#### Comparison of CS50P and PP.fi

| CS50P | Contents | PP.fi | Contents |
|---|---|---|---|
| Lecture 0 | Creating Code with Python; Functions; Bugs; Strings and Parameters; Formatting Strings; More on Strings; Integers or int; Readability Wins; Float Basics; More on Floats; Def; Returning Values | Part 1 | Intro; I/O; More about variables; Arithmetic operations; Conditional statements |
| Lecture 1 | Conditionals, if Statements, Control Flow; Modulo; Creating Our Own Parity Function; Pythonic; match | Part 2 | Programming terminology; More conditionals; Combining conditions; Simple loops |
| Lecture 2 | Loops; While Loops; For Loops; Improving with User Input; More About Lists; Length; Dictionaries, More on code modularity | Part 3 | Loops with conditions; Working with strings; More loops; Defining functions |
| | | Part 4 | The Visual Studio Code editor, Python interpreter and built-in debugging tool; More functions; Lists; Definite iteration; Print statement formatting; More strings and lists |
| | | Part 5 | More lists; References; Dictionary; Tuple |
| Lecture 3 | Exceptions, Runtime Errors, try, else, Creating a Function to Get an Integer, pass | Part 6 | Reading files; Writing files; Handling errors; Local and global variables |
| Lecture 4 | Libraries, Random, Statistics, Command-Line Arguments, slice, Packages, APIs, Making Your Own Libraries | Part 7 | Modules; Randomness; Times and dates; Data processing; Creating your own modules; More Python features |
| Lecture 5 | Unit Tests; assert; pytest; Testing Strings; Organizing Tests into Folders | | |
| Lecture 6 | File I/O; open; with; CSV; Binary Files and PIL | | |
| Lecture 7 | Regular Expressions; Case Sensitivity; Cleaning Up User Input; Extracting User Input | | |
| Lecture 8 | Object-Oriented Programming; Classes; raise; Decorators; Class Methods; Static Methods; Inheritance; Inheritance and Exceptions; Operator Overloading | Part 8 | Objects and methods; Classes and objects; Defining classes; Defining methods; More examples of classes |
| | | Part 9 | Objects and references; Objects as attributes; Encapsulation; Scope of methods; Class attributes; More examples with classes |
| | | Part 10 | Class hierarchies; Access modifiers; Object oriented programming techniques; Developing a larger application |
| Lecture 9 | set; Global Variables; Constants; Type Hints; Docstrings; argparse; Unpacking; args and kwargs; map; List Comprehensions; filter; Dictionary Comprehensions; enumerate; Generators and Iterators | Part 11 | List comprehensions; More comprehensions; Recursion; More recursion examples |
| | | Part 12 | Functions as arguments; Generators; Functional programming; Regular expressions |

# Class 1 (September 13, 2024): data types, variables, functions

Install Python and VS Code: https://code.visualstudio.com/docs/python/python-tutorial. Alternatively, you can code in the CS50 cloud environment (VS Code). Two steps: 1. log in to your GitHub account; 2. access your codespace at https://cs50.dev/. This environment allows you to automatically test your scripts for the CS50 problem sets.
Some useful commands for the command-line interface (CLI) in the terminal:
- `code filename.py` to create (or open) a file in the editor
- `ls` to list the files in the current folder
- `cp filename newfilename` to copy a file, e.g. `cp ../hello.py farewell.py` (`..` represents the parent folder)
- `mv filename newfilename` to rename or move a file, e.g. `mv farewell.py goodbye.py` or `mv farewell.py ..` (move it one folder up)
- `rm filename` to delete (remove) a file
- `mkdir foldername` to create a new folder
- `cd foldername` to change directory, e.g. `cd ..`
- `rmdir foldername` to delete an (empty) folder
- `clear` to clear the terminal window
The REPL (interactive Read-Eval-Print Loop) environment: see https://realpython.com/interacting-with-python/
All values in Python have a type. The five basic types are: integer, float, string, Boolean, and None.
- strings (str), variables, print (a function), parameters (e.g. end=), input, comments, formatted strings (f"..."), .strip(), .title() (methods)
- integers (int), operations for integers, casting (e.g. str to int)
- floating point values (float), round, formatting floats (e.g. f"{z:.2f}")
- True, False, and, or, not
Functions, def, return
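The items above can be tried together in a short script. This is a minimal sketch; the function name square and the sample values are our own choices, and the string passed to int() stands in for what input() would return.

```python
def square(n):
    """Return n multiplied by itself."""
    return n * n

# Every value has a type
for value in [42, 3.14, "hello", True, None]:
    print(repr(value), "->", type(value).__name__)

name = "  ada lovelace ".strip().title()  # methods: remove outer spaces, then capitalize each word
x = int("7")   # casting: input() always returns a str, so convert it before doing arithmetic
z = x / 3      # dividing two ints gives a float
print(f"hello, {name}", end="!\n")       # the end= parameter replaces the default newline
print(f"{x} squared is {square(x)}")
print(f"{z:.2f}", round(z, 2))           # formatted string vs rounded float
print(x > 5 and not z > 5)               # Boolean operators
```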
Suggested problems: CS50 Problem set 0

# Class 2 (September 20, 2024): conditionals, lists, dictionaries

Conditionals:

if, elif, else:

score = int(input("Score: "))  # score must be defined before it is tested
if score >= 70:
    print("Grade: C to A")
elif score >= 60:
    print("Grade: D")
else:
    print("Grade: F")

match:

species = 'virginica'
match species:  # match requires Python 3.10 or later
    case 'versicolor':
        label = 0
    case 'virginica':
        label = 1
    case _:  # any other value
        label = 2

Pythonic coding: define a main() function and the other functions the program needs, then call main() at the end of the script. The code should be modular.
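A minimal sketch of that structure, using a parity function as in CS50 Lecture 1 (the name is_even and the test values are our own choices). The `if __name__ == "__main__":` guard calls main() only when the file is run as a script, not when it is imported as a module.

```python
def main():
    for n in [3, 4]:
        print(n, "is", "even" if is_even(n) else "odd")

def is_even(n):
    """Return True if n is even, using the modulo operator."""
    return n % 2 == 0

if __name__ == "__main__":
    main()
```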
While loops, for loops, break (to leave a loop), and return (to leave a function, even from inside a loop)
Data type list []: methods append, extend
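The following sketch contrasts break and return and the two list methods; the function first_negative and the sample lists are hypothetical examples.

```python
def first_negative(numbers):
    """Return the first negative number in numbers, or None if there is none."""
    for x in numbers:
        if x < 0:
            return x   # return leaves the loop AND the function
    return None

total = 0
count = 0
while True:
    count += 1
    total += count
    if total > 10:
        break          # break leaves the loop only; execution continues below it

species = ["Lily"]
species.append("Rose")             # append adds ONE element
species.extend(["Tulip", "Iris"])  # extend adds each element of another list
print(count, total, species)
print(first_negative([3, -1, -5]), first_negative([1, 2]))
```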

Data type dictionary {}: methods .items(), .keys() and .values()

knights = {'gallahad': 'the pure', 'robin': 'the brave'}
for k, v in knights.items():
    print(k, v)
if 'gallahad' in knights:
    print('Go Gallahad')

Suggested problems: CS50 Problem set 1 and 2. See the assignment on Moodle: problems File extensions, Coke machine, Plates

# Class 3 (September 27, 2024): exercises, best practices

Exercises from CS50 Problem set 0, 1 and 2.

# Class 4 (October 4, 2024): handling exceptions

Handling exceptions in Python: raising and catching exceptions.

Example from (https://cs50.harvard.edu/python/2022/shorts/handling_exceptions/). Exercise: adapt the proposed code to be more modular, where the main function is something like the one below:

def main():
    spacecraft = input("Enter a spacecraft: ")
    au=get_au(spacecraft)
    m = convert(au)
    print(f"{m} m")
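One possible way to write the two helpers that this main() calls is sketched below. The dictionary of distances is an assumption made for illustration (the values are not taken from this course), and looking up an unknown name raises a KeyError, which get_au catches and turns into an exit message. The astronomical unit is defined as exactly 149,597,870,700 m.

```python
import sys

# Illustrative distances from Earth in astronomical units (assumed values, for the sketch only)
DISTANCES = {"Voyager 1": 163, "Voyager 2": 136, "Pioneer 10": 80, "New Horizons": 58, "Pioneer 11": 44}
METERS_PER_AU = 149_597_870_700  # definition of the astronomical unit in meters

def get_au(spacecraft):
    """Return the distance of spacecraft in AU; exit with a message if the name is unknown."""
    try:
        return DISTANCES[spacecraft]
    except KeyError:
        sys.exit(f"Unknown spacecraft: {spacecraft}")

def convert(au):
    """Convert a distance in AU to meters."""
    return au * METERS_PER_AU
```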

Exercises from CS50 Problem set 3.

For the fuel gauge problem (https://cs50.harvard.edu/python/2022/psets/3/fuel/), try to organize your code as follows. As suggested in the hints, you should catch ValueError and ZeroDivisionError exceptions in your code. In the code below, the user is asked for values of x and y until they satisfy the requirements: x and y must be entered as a string x/y, x has to be less than or equal to y, and y cannot be zero. The function get_string_of_integers_X_less_than_Y in the code below should take care of that.

def main():
    # asks user for input until the input is as expected
    x,y=get_string_of_integers_X_less_than_Y()
    # compute percentage from two integers
    p=compute_percentage(x,y)
    # print output 
    print_gauge(p)
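A minimal sketch of the functions called by this main(). The helpers parse_fraction and gauge are hypothetical names introduced here so that the parsing and the output can be tested without typing input; rejecting negative x is our own assumption. The gauge shows E at 1% or less, F at 99% or more, and otherwise the rounded percentage, as in the problem statement.

```python
def get_string_of_integers_X_less_than_Y():
    """Ask for a fraction x/y until it is valid; return x and y as ints."""
    while True:
        try:
            return parse_fraction(input("Fraction: "))
        except (ValueError, ZeroDivisionError):
            pass  # invalid input: ask again

def parse_fraction(text):
    """Hypothetical helper: turn 'x/y' into (x, y), raising on bad input."""
    x, y = text.split("/")   # ValueError unless there is exactly one "/"
    x, y = int(x), int(y)    # ValueError if x or y is not an integer
    if y == 0:
        raise ZeroDivisionError("y cannot be zero")
    if x < 0 or x > y:
        raise ValueError("x must satisfy 0 <= x <= y")
    return x, y

def compute_percentage(x, y):
    return round(x / y * 100)

def gauge(p):
    """Hypothetical helper: E (empty) at 1% or less, F (full) at 99% or more, else the percentage."""
    if p <= 1:
        return "E"
    if p >= 99:
        return "F"
    return f"{p}%"

def print_gauge(p):
    print(gauge(p))
```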

A few examples of code that can be helpful to solve problems in problem set 3:

Example of basic use of try-except to catch a ValueError:

try:
    x = int(input("What's x?"))
except ValueError:
    print("x is not an integer")
else:
    print(f"x is {x}")

Function for requesting an integer from the user until no exceptions are caught:

def get_int():
    while True:
        try:
            x = int(input("What's x?"))
        except ValueError:
            print("x is not an integer")
        else:
            break
    return x

We may want to exit the execution of our script if some exception is caught. This can be done with sys.exit(), which can also be used to print a message.

import sys # import module
try:
    x = int(input("What's x?"))
except ValueError:
    sys.exit("x is not an integer")

Example of code that catches CTRL-C or CTRL-D:

while True:
    try:
        x=int(input())
    except ValueError:
        print('x is not integer')
    except KeyboardInterrupt: #CTRL-C
        print('\n KeyboardInterrupt')
        break
    except EOFError: # CTRL-D
        print('\n EOFError')
        break
    else:
        print(x)

For a list of Python Built-in Exceptions, you can explore (https://www.w3schools.com/python/python_ref_exceptions.asp)
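The examples above only catch exceptions. Raising one is done with the keyword raise, typically inside a function that detects invalid input, leaving the caller to decide what to do. A minimal sketch, where the function to_kelvin is a hypothetical example:

```python
def to_kelvin(celsius):
    """Convert Celsius to Kelvin, raising ValueError below absolute zero."""
    if celsius < -273.15:
        raise ValueError(f"{celsius} °C is below absolute zero")
    return celsius + 273.15

try:
    to_kelvin(-300)
except ValueError as e:   # the message given to ValueError is available via str(e)
    message = str(e)
    print("Error:", message)
```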

# Class 5 (October 11, 2024): libraries, modules, APIs

Modules

You can store your own functions in modules (which are just Python scripts) and import them into your main code. Let’s imagine you created a file named mymodule.py in a given folder. In your main script, you can import the file if its folder belongs to the list of folders where the Python interpreter looks for modules. You can check that list by running the following lines of code in the Python interpreter:

>>>import sys
>>>sys.path

If the folder where mymodule.py was created does not belong to that list, you can add it with sys.path.append, which then allows you to import your module. To that end, you can include the following lines in your main script:

import sys
sys.path.append(r'path-to-folder') # folder where mymodule is
import mymodule

where path-to-folder is the path that you can easily copy in your IDE.

If your module includes a function named, say, get_integer, you can use it in your main script either by calling mymodule.get_integer(), or by importing the function with from mymodule import get_integer and then calling it as get_integer(), as in the following script.

import sys
sys.path.append(r'/workspaces/8834091/modules') # where file mymodule.py is
from mymodule import get_integer
def main():
    x=get_integer()
    print(x)
main()

Contents of mymodule.py:

import sys
def get_integer() -> int:
    while True:
        try:
            return(int(input('type a number:  ')))
        except ValueError:
            print('not an integer number: try again')
        except KeyboardInterrupt: #CTRL-C
            print('\n If you want to exit type CTRL-D')
        except EOFError: # CTRL-D
            sys.exit('\n exit as requested')

Often, you import a package that is available on the Python Package Index (PyPI, https://pypi.org/) and is installed with pip (https://pypi.org/project/pip/). Some modules, such as random, which provides a series of functions for sampling, shuffling, and drawing random numbers from a variety of probability distributions, are part of Python's standard library: they are always available and do not need to be installed. Third-party packages, such as requests (used below), can typically be installed in your terminal with

$ pip install requests

and then imported in your main script with import requests (or import random, for a standard-library module). If you want to know the folder where a module is located, you can get that information from its __file__ attribute, e.g. random.__file__.
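A few functions of the standard-library module random are shown below. This is a sketch; the seed value and the sample data are arbitrary choices, and fixing the seed only makes the results repeatable from one run to the next.

```python
import random

random.seed(2024)                         # fix the seed so results are reproducible
coin = random.choice(["heads", "tails"])  # one element chosen at random
die = random.randint(1, 6)                # integer between 1 and 6, both included
cards = ["jack", "queen", "king"]
random.shuffle(cards)                     # shuffles the list in place
lottery = random.sample(range(1, 50), 6)  # 6 distinct values from 1..49
height = random.gauss(170, 10)            # normal distribution with mean 170 and sd 10
print(coin, die, cards, lottery, round(height, 1))
print(random.__file__)                    # where the module's source file is located
```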

`sys.argv`

Previously, we used module sys, in particular the function sys.exit() and the list sys.path. Another useful attribute is sys.argv, a list that gives you access to what the user typed at the command line $, as in

import sys
print(len(sys.argv)) # returns the number of words in the command line after $python
print(sys.argv[1]) # returns the 2nd word, i.e., the first word after $python myscript.py

For instance, the following script named sum.py prints the sum of two numbers that were specified in the command line with $python sum.py 1.2 4.3:

import sys
try:
    x,y = float(sys.argv[1]), float(sys.argv[2])
    print('the sum is',x+y)
except IndexError:
    print('missing argument')
except ValueError:
    print('The arguments are not numbers')

APIs

Web APIs (application programming interfaces) allow your program to communicate with a remote server. For instance, requests is a package that allows your program to behave as a web browser would. Consider the following script myrequest.py, which lets you explore the iTunes database (https://performance-partners.apple.com/search-api):

import requests
import sys
try:
    response = requests.get("https://itunes.apple.com/search?entity=song&limit=1&term=" + sys.argv[1])
    print(response.json())
except IndexError:
    sys.exit('Missing argument')
except requests.RequestException:
    sys.exit('Request failed')
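response.json() turns the JSON body of the response into nested Python dictionaries and lists, which can then be navigated with keys and indices. The sketch below uses a made-up body so that it runs without network access; the field names resultCount, results, trackName and artistName are assumed to match the iTunes search results. With requests, passing the query as params={"entity": "song", "limit": 1, "term": ...} to requests.get also takes care of encoding spaces in the search term.

```python
import json

# Made-up response body with the assumed shape of iTunes search results
body = """{"resultCount": 2,
 "results": [
  {"artistName": "Artist A", "trackName": "Song One"},
  {"artistName": "Artist B", "trackName": "Song Two"}
 ]}"""

def track_names(data):
    """Return the trackName of each result, skipping results without one."""
    return [result["trackName"] for result in data.get("results", []) if "trackName" in result]

data = json.loads(body)            # with requests: data = response.json()
print(json.dumps(data, indent=2))  # pretty-print the nested structure
for name in track_names(data):
    print(name)
```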

You can easily adapt that code to access a different database. For instance if you want to explore the GBIF database (https://data-blog.gbif.org/post/gbif-api-beginners-guide/), you can just replace the main line of code in myrequest.py with

response=requests.get('https://api.gbif.org/v1/species/match?name='+ sys.argv[1])

and execute it with, say, $python myrequest.py Tracheophyta in the terminal.

There are many ways of accessing APIs from Python. The following example shows how you can access satellite imagery through the Google Earth Engine API and compute the mean daytime land surface temperature (LST) at some location from the MODIS MOD11A1 product. To be able to use the API, you need a Google account and an Earth Engine project associated with it.

# pip install earthengine-api
import ee
# Trigger the authentication flow.
ee.Authenticate()
# Initialize the library.
ee.Initialize(project='project-name') # e.g. 'ee-my-mlc-math-isa-utl'
# Import the MODIS land surface temperature collection.
lst = ee.ImageCollection('MODIS/061/MOD11A1')  # Collection 6.1; the older 'MODIS/006/MOD11A1' is deprecated
# Selection of appropriate bands and dates for LST.
lst = lst.select('LST_Day_1km', 'QC_Day').filterDate('2020-01-01', '2024-01-01')
# Define the urban location of interest as a point near Lyon, France.
u_lon = 4.8148
u_lat = 45.7758
u_poi = ee.Geometry.Point(u_lon, u_lat)
scale = 1000  # scale in meters
# Calculate and print the mean value of the LST collection at the point.
lst_urban_point = lst.mean().sample(u_poi, scale).first().get('LST_Day_1km').getInfo()
print('Average daytime LST at urban point:', round(lst_urban_point*0.02 -273.15, 2), '°C')

Problems

Solve problems from CS50P Problem set 4. In particular, for the problem Bitcoin Price Index, organize your code so that the main function is the following:

def main():
    x=read_command_line_input()
    price=get_bitcoin_price()
    print(f"${x*price:,.4f}")
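A minimal sketch of read_command_line_input. The optional argv parameter is our own addition so the function can be tested without a real command line; the price below is a made-up number standing in for get_bitcoin_price(), whose URL and JSON keys depend on the API given in the problem statement. The format spec ,.4f adds a comma as thousands separator and shows four decimals.

```python
import sys

def read_command_line_input(argv=None):
    """Return the single command-line argument as a float (argv defaults to sys.argv)."""
    if argv is None:
        argv = sys.argv
    if len(argv) != 2:
        sys.exit("Missing command-line argument")
    try:
        return float(argv[1])
    except ValueError:
        sys.exit("Command-line argument is not a number")

price = 1234.5  # made-up price, standing in for get_bitcoin_price()
x = read_command_line_input(["bitcoin.py", "2"])
print(f"${x*price:,.4f}")
```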

# Class 6 (October 18, 2024): virtual environments; file I/O

Virtual environments in Python

A virtual environment (https://docs.python.org/3/library/venv.html) is:

- Used to contain a specific Python interpreter and software libraries and binaries which are needed to support a project (library or application). These are by default isolated from software in other virtual environments and Python interpreters and libraries installed in the operating system.
- Contained in a directory, conventionally named .venv or venv in the project directory, or under a container directory for lots of virtual environments.
- Not checked into source control systems such as Git.
- Considered as disposable – it should be simple to delete and recreate it from scratch. You don’t place any project code in the environment.
- Not considered as movable or copyable – you just recreate the same environment in the target location.

In your system you have the base environment by default, and you can create one or more virtual environments. Below, we describe how to create a virtual environment and how to activate it, so your commands in the terminal are interpreted within that environment. That allows you to encapsulate in each virtual environment a given Python version and a set of Python packages with their given versions. Your data and script files remain in the usual working folders: they should not be moved to the folders where the virtual environment files are stored.

The following commands work in the CS50 codespace that runs Linux (check with $cat /etc/os-release in the terminal). Some need to be slightly adapted for Windows.

First, let’s check which packages, and which versions, are available in the base environment, and get extra information about the package requests (e.g. its dependencies):

$ pip list 
$ pip show requests

Next, let’s create a virtual environment. One can first create (with mkdir) a folder called, say, my_venvs, so that all virtual environments are created in that folder. This makes sense, since virtual environment folders are kept separate from the working folders that contain data and scripts. The virtual environment myvenv can then be created with:

my_venvs/ $ python3 -m venv myvenv # creates environment called myvenv with Python 3

In case one needs to delete the virtual environment, one just needs to delete the folder. This can be done with $ sudo rm -rf myvenv in the terminal (Linux). After the virtual environment has been created, one needs to activate it. In Linux, this is done by executing activate which lies in the bin folder of the virtual environment:

my_venvs/ $ source myvenv/bin/activate # note that activate needs to be sourced

As a result, the prompt shows (myvenv) my_venvs/ $ which indicates that myvenv is now activated. One can check the Python version with $python -V. To de-activate a virtual environment, the command is $ deactivate. With the environment activated, let’s try to install a few packages, specifying the versions. For instance, install the following packages.

(myvenv) my_venvs/ $ pip install random11==0.0.1
(myvenv) my_venvs/ $ pip install geopy==1.23.0
(myvenv) my_venvs/ $ pip install requests==2.25.0

Some of these packages depend on additional packages that are installed automatically. To list all installed packages within the environment myvenv, one can execute (myvenv) $ pip list as before. Compare the version of requests in myvenv with the version returned initially in the base environment: the one in myvenv is 2.25.0, while the one in the base environment is more recent. One can also check where requests is installed in myvenv with the command (myvenv) $ pip show requests.

Check the system path (where Python will look for installed packages) by executing print(sys.path): one can do this from the terminal with the command

(myvenv) my_venvs/ $ python -c 'import sys; print(sys.path)'

Notice that the folder in myvenv where the virtual environment packages are installed is listed, but the path to where base packages are stored is not.
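From inside a script, one can also check which interpreter is running and whether it belongs to a virtual environment: in a venv, sys.prefix points to the environment folder, while sys.base_prefix points to the base installation, so the two differ. A minimal sketch (the function name in_virtual_environment is our own):

```python
import sys

def in_virtual_environment():
    """True when running inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("In a virtual environment:", in_virtual_environment())
```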

To share a virtual environment (e.g. on GitHub), one shares a file, typically named requirements.txt, that allows a collaborator to re-create it. requirements.txt lists the names and versions of the installed packages, so that a clone of the environment can be created on another machine. It is created, still within myvenv (i.e. with myvenv activated), with the following command:

(myvenv) my_venvs/ $ pip freeze > requirements.txt

Note that the file requirements.txt is created in the folder that contains myvenv and not within myvenv itself: this makes sense, since one does not want to store scripts or data within myvenv but just packages and the Python version. Since requirements.txt is now available, one can create a copy of myvenv called, say, myvenv2. Firstly, one needs to de-activate myvenv. Then, the commands to be executed in the terminal are:

my_venvs/ $ python3 -m venv myvenv2 # create new virtual environment with the Python 3 interpreter called myvenv2
my_venvs/ $ source myvenv2/bin/activate # activate myvenv2
(myvenv2) my_venvs/ $ pip install -r requirements.txt # install packages and versions listed in requirements.txt

Exercise: go back to myvenv, add package (say, emoji==0.1.0), re-build requirements.txt, and create new environment myvenv3 and install the set of packages listed in the new requirements.txt.

File I/O

As discussed in (https://cs50.harvard.edu/python/2022/notes/6/), open is a function built into Python that allows you to open a file and use it in your program, either to read from it or to write to it. In the example below, "w" is the argument value that indicates that the file is opened in writing mode. Opening a file with "w" creates the file if it does not exist and otherwise erases its previous contents, so file.write(...) replaces whatever the file contained.

name='Bob'
file = open("names.txt", "w")
file.write(name)
file.close()

If instead the goal is to add new contents after the existing ones, "w" should be replaced by "a" (append). Each call to file.write(name) then adds the value of name to the end of the file.

Instead of explicitly opening and closing a file, it’s simpler to use the so-called context manager in Python, using the keyword with, which automatically closes the file:

with open("names.txt", "w") as f:
  f.write(name)
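The difference between "w" and "a", and what readlines returns, can be checked with a file in a temporary folder (created with the tempfile module so that no real file is overwritten; the names are sample values). Note that write does not add a newline, so "\n" is written explicitly to keep one name per line.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "names.txt")

with open(path, "w") as f:   # "w": create the file (or erase it) and write
    f.write("Alice\n")
with open(path, "w") as f:   # opening with "w" again deletes "Alice"
    f.write("Bob\n")
with open(path, "a") as f:   # "a": add to the end of the existing contents
    f.write("Carol\n")

with open(path, "r") as f:
    lines = f.readlines()    # one string per line, each ending with "\n"
names = [line.rstrip("\n") for line in lines]
print(names)
```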

If one wishes to read from a file, then the file has to be opened in reading mode, as in the following example. The method readlines reads all lines of the file and stores them in a list, where each element of the list is the contents of the corresponding line.

with open("names.txt", "r") as f:
    L = f.readlines()  # list of lines; each line keeps its final "\n"

However, it is also possible to read one line at a time, for instance to count the lines of a file:

with open('myfile.txt','r') as f:
    N=0
    for line in f:
        N+=1
print('number of lines', N)

As an alternative, this can be done with the method readline, called inside a loop to read the whole file. When the end of the file is reached, readline returns the empty string "", which is easy to test with a condition.
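A minimal sketch of such a loop; the file is created in a temporary folder with three sample lines so the example is self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird\n")

lines = []
with open(path, "r") as f:
    while True:
        line = f.readline()
        if line == "":           # empty string: end of file reached
            break                # (an empty line in the file would be "\n", not "")
        lines.append(line.rstrip("\n"))
print(lines)
```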

Python also lets you visit any position in a file. The initial position is 0 by default, but it can be changed with f.seek(n). Then, f.read(k), for instance, reads k characters (k bytes, for a binary file) from that position. Method f.tell() returns the current position in the file. For text files, it is safest to seek only to 0 or to positions previously returned by f.tell(); arbitrary offsets are reliable in binary mode.

A file can be of type text (human-readable) or binary. Binary files, such as images, are opened in binary mode, e.g. with open('myimage.png','rb') as f.
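The sketch below writes 16 sample bytes to a temporary file, then jumps to position 10 in binary mode and reads 3 bytes. In binary mode, read returns a bytes object (b"...") rather than a str.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"0123456789abcdef")

with open(path, "rb") as f:
    f.seek(10)          # move to byte 10 (counting from 0)
    chunk = f.read(3)   # read 3 bytes from there
    pos = f.tell()      # position after the read
print(chunk, pos)
```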

Exercise: Consider the file downloaded from INE (the Portuguese Institute of Statistics) about causes of fires by geographical location, rural_fires.csv. The source is INE: “Rural fires (No.) by Geographic localization (NUTS - 2013) and Cause of fire; Annual” for 2023. Write a script to read the file and exclude the lines that are not formatted as a table (header lines). The formatted lines should be written into a new file, say table_rural_fires.csv.

with open('rural_fires.csv', 'rb') as f:  # binary mode: each line is a bytes object
    with open('table_rural_fires.csv', 'wb') as fw:  # write the bytes back unchanged
        for line in f:
            if line.startswith((b'1', b'2', b'3')):  # keep lines starting with 1, 2 or 3 (table rows)
                fw.write(line)

Since the file contains non-ASCII characters (such as ç and ã), one might want to decode those characters correctly. Python provides the methods encode (str to bytes) and decode (bytes to str), as in the example below.

str_original = 'ção'
bytes_encoded = str_original.encode(encoding='utf-8')
print(type(bytes_encoded))
str_decoded = bytes_encoded.decode()
print(type(str_decoded))
print('Encoded bytes =', bytes_encoded)
print('Decoded String =', str_decoded)
print('str_original equals str_decoded =', str_original == str_decoded)
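Decoding with the wrong encoding does not always raise an error: it can silently produce garbled text. The sketch below shows this, and how to pass encoding= to open. The encoding actually used by rural_fires.csv is not stated here; UTF-8 and Latin-1 (or Windows cp1252) are common candidates, so it is an assumption to check on the real file.

```python
import os
import tempfile

data = "ção".encode("utf-8")                       # b'\xc3\xa7\xc3\xa3o'
wrong = data.decode("latin-1")                     # wrong codec: garbled text, no error
replaced = data.decode("ascii", errors="replace")  # undecodable bytes become '\ufffd'
right = data.decode("utf-8")
print(wrong, replaced, right)

# With files, the encoding is given to open(); sample header written in Latin-1
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", encoding="latin-1") as f:
    f.write("Região;Incêndios\n")
with open(path, "r", encoding="latin-1") as f:
    header = f.readline().rstrip("\n")
print(header)
```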

# Class 7 (October 25, 2024): tabular data; pandas

Create a Pandas DataFrame from scratch

Pandas dataframes have an intrinsic tabular structure represented by rows and columns, where each row and column has a label (name) and a position number inside the dataframe. The row labels, called the dataframe index, can be integer numbers or strings; the column labels, called column names, are usually strings. Labels are usually unique, although pandas does not require it. Use the following script to create a dataframe with random values. Notice the terminology for rows (index) and columns (columns).

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'), columns=list('ABCD'))
print(df)

Exercises:

- Print the column names of df with .columns.
- Create a Series that corresponds to column A with ['A'].
- Create a new dataframe that corresponds to columns A and C with [['A','C']].

Notice that .columns returns a pd.Index object, which provides extra functionality and performance compared to lists. To extract a list of names, one can use .columns.tolist(); .columns.values returns a NumPy array instead.
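A possible solution to the three exercises, sketched with a seeded NumPy generator (an assumption, so that the random values are reproducible) instead of np.random.randn:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=list("abcdef"), columns=list("ABCD"))

print(df.columns)            # a pd.Index
names = df.columns.tolist()  # plain Python list of column names
col_a = df["A"]              # one column name -> Series
df_ac = df[["A", "C"]]       # list of column names -> DataFrame
print(type(col_a).__name__, type(df_ac).__name__, df_ac.shape)
```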

Reading a csv file, selecting columns by name, selecting rows by condition

Consider the dataset that describes 517 fires in the Montesinho natural park in Portugal. For each incident, the weekday, month, coordinates, and burnt area are recorded, as well as several meteorological variables such as rain, temperature, humidity, and wind (https://www.kaggle.com/datasets/vikasukani/forest-firearea-datasets). For reference, a copy of the file, forestfires.csv, is available. The variables are:

- X - x-axis spatial coordinate within the Montesinho park map: 1 to 9
- Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9
- month - month of the year: “jan” to “dec”
- day - day of the week: “mon” to “sun”
- FFMC - FFMC index from the FWI system: 18.7 to 96.20
- DMC - DMC index from the FWI system: 1.1 to 291.3
- DC - DC index from the FWI system: 7.9 to 860.6
- ISI - ISI index from the FWI system: 0.0 to 56.10
- temp - the temperature in Celsius degrees: 2.2 to 33.30
- RH - relative humidity in %: 15.0 to 100
- wind - wind speed in km/h: 0.40 to 9.40
- rain - outside rain in mm/m2: 0.0 to 6.4
- area - the burned area of the forest (in ha): 0.00 to 1090.84

The goal is to download the file and use package Pandas to explore it and solve the following tasks.

- Read the file with pd.read_csv into a new object fires, and show the first 10 rows with fires.head(10).
- Create a list of column names and determine the column data types with the attribute .dtypes.
- Print a summary of the dataframe with .info().
- Create a Series with the temperature values for all 517 fires.
- Create a DataFrame with just the columns month and day.
- Select fires for which the temperature is higher than 25 Celsius, and fires for which it is between 20 and 25 Celsius; note that each condition needs to be surrounded by (...), and conditions can be combined with & (and) or | (or), or negated with ~.
- Select fires that occurred on weekends; use the method .isin().
- Check whether there are missing (null) values in the dataframe with .isna() (or .notna()). Summing the result along columns with .sum() counts the missing (or present) values per column.
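The selection tasks can be sketched as follows. To keep the example self-contained, five invented rows with some of the same column names stand in for forestfires.csv, and "between 20 and 25" is taken to include both ends (an assumption).

```python
import pandas as pd

# Invented rows standing in for the real file; with it: fires = pd.read_csv("forestfires.csv")
fires = pd.DataFrame({
    "month": ["aug", "aug", "mar", "sep", "mar"],
    "day": ["fri", "sat", "sun", "fri", "mon"],
    "temp": [25.0, 27.0, 10.0, 20.0, 22.5],
    "RH": [40, 30, 70, 45, 60],
    "area": [1.0, 0.0, 2.5, 10.0, 0.5],
})

print(fires.head(10))
names = fires.columns.tolist()
print(fires.dtypes)
fires.info()

temperature = fires["temp"]                                  # Series
month_day = fires[["month", "day"]]                          # DataFrame
hot = fires[fires["temp"] > 25]                              # one condition
mild = fires[(fires["temp"] >= 20) & (fires["temp"] <= 25)]  # each condition in (...)
weekend = fires[fires["day"].isin(["sat", "sun"])]
missing_per_column = fires.isna().sum()                      # zeros mean no missing values
print(missing_per_column)
```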

Select rows and columns with loc (label-based indexing) and iloc (positional indexing)

These operators select rows and columns from a dataframe: loc uses the row and column labels (names), while iloc uses the positions in the table. New values can also be assigned to selections defined with loc and iloc.

- Interpret the result of fires.iloc[0:3,2:4].
- Use loc and .isin() to select fires from August and September and just the values of the FWI-based variables (FFMC, DMC, DC, ISI) for those fires.
- Use iloc to select the first 20 fires and just the values of the FWI-based variables.
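A sketch of these selections on four invented rows that follow the column order of forestfires.csv (X, Y, month, day, FFMC, DMC, DC, ISI, ...). Note that an iloc slice excludes its end, while a loc slice of labels includes both ends. The final assignment, capping ISI at 10, is a hypothetical operation used only to show that values can be assigned through loc.

```python
import pandas as pd

fires = pd.DataFrame({
    "X": [7, 7, 8, 6], "Y": [5, 4, 6, 5],
    "month": ["mar", "aug", "sep", "aug"], "day": ["fri", "tue", "sat", "sun"],
    "FFMC": [86.2, 92.3, 91.7, 89.3], "DMC": [26.2, 85.3, 33.3, 51.3],
    "DC": [94.3, 488.0, 77.5, 102.2], "ISI": [5.1, 14.7, 9.0, 9.6],
})  # invented values

first_rows = fires.iloc[0:3, 2:4]      # row positions 0-2, column positions 2-3 (month, day)
fwi = ["FFMC", "DMC", "DC", "ISI"]
summer = fires.loc[fires["month"].isin(["aug", "sep"]), fwi]  # boolean mask + list of names
first_two = fires.iloc[0:2, 4:8]       # first two rows, FWI columns by position
by_label = fires.loc[0:2, "month":"day"]  # label slices include the end: rows 0, 1 and 2

fires.loc[fires["ISI"] > 10, "ISI"] = 10.0  # hypothetical cap, only to show assignment
print(first_rows, summer, first_two, by_label, sep="\n\n")
```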

Combining positional and label-based indexing

There are several possibilities to combine positional and label-based indexing:

- (with iloc) Using df.columns.get_loc(), which converts the name of one column into its position. Then iloc can be used to perform the selection. For multiple columns given by a list of column names, one can use df.columns.get_indexer() instead. Example: use iloc to select the first 20 fires and just the FWI-based variables, using the names rather than the positions of those variables. Solution: `FWI_positions = fires.columns.get_indexer(['FFMC','DMC','DC','ISI'])` and `fires.iloc[0:20, FWI_positions]`.
- (with loc) Using df.index[] to extract the index labels. Then, loc can be used to perform the selection. Solution: `fires.loc[fires.index[0:20], ['FFMC', 'DMC', 'DC', 'ISI']]`.

Exporting to file

Exporting is done with operations named .to_... as listed in (https://pandas.pydata.org/docs/user_guide/io.html)

- Export your dataframe as an Excel spreadsheet with .to_excel("filename.xlsx", sheet_name="fires", index=False) (writing .xlsx files requires an Excel engine such as openpyxl to be installed).
- Read an Excel spreadsheet with pd.read_excel("filename.xlsx", sheet_name="fires").

Use generative AI to help with the following tasks

Create a dataframe months_df from a dictionary: for instance, a dictionary with two keys, Month and mth, whose values are the list of full month names (January, February, March, and so on) and the list of the corresponding abbreviations used in fires (jan, feb, mar, and so on).

month_data = {
    'Month': [
        'January', 'February', 'March', 'April', 'May', 'June', 
        'July', 'August', 'September', 'October', 'November', 'December'
    ],
    'mth': [
        'jan', 'feb', 'mar', 'apr', 'may', 'jun', 
        'jul', 'aug', 'sep', 'oct', 'nov', 'dec'
    ]
}
months_df = pd.DataFrame(month_data)

Merge fires with months_df to get a new variable that contains the full name of the month. See (https://pandas.pydata.org/docs/user_guide/merging.html)

merged_df = pd.merge(fires, months_df, left_on='month', right_on='mth', how='left')
merged_df.drop(columns='mth', inplace=True)

# Class 8 (November 8, 2024): pandas (cont'd), jupyter notebooks

Create a jupyter notebook for this class. If you’re using your CS50 codespace, create a new file in the terminal with $code mynotebook.ipynb and follow the suggestions for jupyter notebooks in your codespace session.

There are many cheatsheets for Pandas that can help visualize Pandas’ functionalities. Since there are so many possibilities, a single-page cheatsheet is either too limited or too cryptic. This 12-page cheatsheet is fairly self-contained and includes several examples.

Use generative AI to help with the following tasks

- Reduce the fires dataframe with the method .groupby to get just one row per month, with the average temperature, average RH, and number of fires per month. The goal is to create a dataframe named firesbymonth with columns avg_temp, avg_RH and fire_count. See (https://pandas.pydata.org/docs/user_guide/groupby.html)
- What is the effect of adding the method .reset_index() to the previous command?
- Sort the dataframe firesbymonth such that the 12 rows are ordered by month correctly: jan, feb, mar, and so on.
- Create a new column called conditions in firesbymonth, of type string, that indicates whether a month is dry&hot, dry&cold, wet&hot or wet&cold. Use the mean values of avg_temp and avg_RH to establish the appropriate thresholds. Use the method .apply and define the function to apply with lambda.
- Re-organize the information in fires into a two-way table that shows the total area of fires per day of the week and per month, where NaN values are replaced by 0. Towards that end, explore the .pivot_table method.
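One possible approach to these tasks, sketched on five invented rows instead of forestfires.csv. Sorting uses an ordered pd.Categorical with the calendar order of the months; "dry" is taken to mean avg_RH below its mean and "hot" avg_temp at or above its mean, which are our own threshold conventions.

```python
import pandas as pd

fires = pd.DataFrame({
    "month": ["aug", "aug", "mar", "sep", "mar"],
    "day": ["fri", "sat", "sun", "fri", "mon"],
    "temp": [25.0, 27.0, 10.0, 20.0, 12.0],
    "RH": [40, 30, 70, 45, 60],
    "area": [1.0, 0.0, 2.5, 10.0, 0.5],
})  # invented values

# One row per month with named aggregation: new_column=(source_column, function)
firesbymonth = fires.groupby("month").agg(
    avg_temp=("temp", "mean"),
    avg_RH=("RH", "mean"),
    fire_count=("temp", "count"),
).reset_index()  # month goes from being the index back to an ordinary column

# Calendar order instead of alphabetical order
order = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
firesbymonth["month"] = pd.Categorical(firesbymonth["month"], categories=order, ordered=True)
firesbymonth = firesbymonth.sort_values("month").reset_index(drop=True)

temp_thr = firesbymonth["avg_temp"].mean()
rh_thr = firesbymonth["avg_RH"].mean()
firesbymonth["conditions"] = firesbymonth.apply(
    lambda row: ("dry" if row["avg_RH"] < rh_thr else "wet")
    + "&"
    + ("hot" if row["avg_temp"] >= temp_thr else "cold"),
    axis=1,
)

# Two-way table: total burned area per day (rows) and month (columns), NaN replaced by 0
area_table = fires.pivot_table(values="area", index="day", columns="month", aggfunc="sum", fill_value=0)
print(firesbymonth)
print(area_table)
```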

# Class 9 (November 15, 2024): OOP, classes, methods

Suppose that one wants to write a Python script that uses classes to monitor plants at a nursery. Initially, plants grow from seeds in trays, and one wants to keep track of the trays and the number of plants per tray. All plants in a given tray are from the same species. Then, at some point, some plants are transferred from trays to individual pots (one plant per pot). At the end, pots are sold. One wants to track the number of plants of each species that are in the nursery.

For this type of problem, one wants to mimic entities of the real world (plants, trays, pots, and the nursery) as objects in Python code. Object-oriented programming is an intuitive way of doing so. A class in Python is an object constructor, or a blueprint for creating objects.

The simplest example of a class, with very little functionality, is a class that stores constant values, which are not supposed to change. The class Constants defined below has two class attributes, MAX_PLANTS_PER_TRAY and SALE_PRICE, which can be accessed directly through the class, without creating an instance.

class Constants:
   MAX_PLANTS_PER_TRAY=50
   SALE_PRICE=10

print(Constants.SALE_PRICE)

However, in general we intend to call the class to create one instance (one object) of the class and set the properties of that object. To set the values of the instance properties, we use the __init__ method:

class Plant:
    def __init__(self, species):
        self.species = species

my_plant=Plant("Rose") # create instance where property `species` has value `Rose`
print(my_plant.species)

Alternatively, a class can be created with the @dataclass decorator (see https://docs.python.org/3/library/dataclasses.html). In this case, the __init__ method is generated automatically from the annotated fields.

from dataclasses import dataclass
@dataclass
class Plant:
    species: str
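Besides __init__, @dataclass also generates __repr__ (used by print) and __eq__ (used by ==) from the annotated fields. The short sketch below shows this. The field height_cm is a hypothetical addition, included only to show a field with a default value.

```python
from dataclasses import dataclass

@dataclass
class Plant:
    species: str
    height_cm: float = 0.0  # hypothetical extra field with a default value

rose = Plant("Rose")               # generated __init__: Plant(species, height_cm=0.0)
print(rose)                        # generated __repr__: Plant(species='Rose', height_cm=0.0)
print(rose == Plant("Rose", 0.0))  # generated __eq__ compares field values: True
```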

A class can have methods, which are functions defined for the objects of the class. In the example below, Tray is a class with properties species and number_of_plants, and methods remove_plants and is_empty. The method remove_plants has one argument, the number of plants to remove from the tray, and returns a list of Plant objects corresponding to the plants that were removed. The method is_empty has no argument (besides self) and returns True or False.

from dataclasses import dataclass

@dataclass
class Plant:
    species: str

@dataclass
class Tray:
    species: str
    number_of_plants: int
    def remove_plants(self, number): # self refers to the object of the class
        number=min(number,self.number_of_plants) #cannot remove more than available
        self.number_of_plants -= number
        return [Plant(self.species) for _ in range(number)] # returns list of instances of Plant
    def is_empty(self): # returns True or False
        return self.number_of_plants == 0

tray=Tray('Lily', 28)
plants=tray.remove_plants(10)
if tray.is_empty():
    print('The tray is empty')
else:
    print('There are still', tray.number_of_plants, tray.species, 'plants in the tray')
first_plant=plants[0]
print('The plant removed is', first_plant.species)

The code for the full problem, which involves plants of several species, trays, pots and sales, can be organized as follows:
- Plant class: simple class to represent a plant with a species.
- Pot class: holds one plant each.
- Tray class: holds plants of a single species and can remove plants.
- Nursery class: manages trays and pots, and keeps track of plant counts by species. It has methods like add_tray, transfer_to_pots, and sell_pot to handle operations for tracking and updating counts.

Use generative AI to help with the following tasks

Create a script for the problem using the standard way of initializing classes with method __init__. Start with a simplified version of the problem where there are only trays and plants of distinct species in the nursery, which can be represented with 3 classes: Plant, Tray and Nursery. Trays can be created with a given number of plants of the same species, and plants can be removed from trays. The goal in this simplified version is to create the inventory that keeps track of the number of plants of each species that are in trays.

One possible solution for this simplified problem, generated by ChatGPT when asked not to use @dataclass, is nursery_v1.py. Note that this code lacks the __str__ and __repr__ methods, and therefore print(nursery.trays) prints a list of objects identified only by their memory addresses.

Add a __repr__ method similar to the one below to class Tray to redefine the output of print(nursery.trays) and make it more informative.

def __repr__(self):
    return f"Tray(species={self.species}, count={self.count})"

Add to the previous script a class that represents pots and adapt your script accordingly. When plants are removed from trays, they are always placed in a pot (one plant per pot). The goal is that the inventory tracks the plants and the species in both trays and pots (instead of just in trays as in nursery_v1.py).
Finally, consider that pots can be sold and therefore removed from the inventory.
Verify whether your script removes empty trays from the inventory, and update it if it does not.
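The effect of a __repr__ method like the one suggested for Tray can be seen in a small sketch. It assumes a Tray with attributes species and count, as in that snippet. PlainTray is a hypothetical class without __repr__, included for comparison. Printing a list calls repr() on each element, which is why __repr__ (and not __str__) is the method that changes the output of print(nursery.trays).

```python
class Tray:
    def __init__(self, species, count):
        self.species = species
        self.count = count

    def __repr__(self):
        # used whenever the object is shown inside a container such as a list
        return f"Tray(species={self.species}, count={self.count})"

class PlainTray:
    # hypothetical class without __repr__, for comparison
    def __init__(self, species, count):
        self.species = species
        self.count = count

trays = [Tray("Lily", 28), Tray("Rose", 12)]
print(trays)                    # [Tray(species=Lily, count=28), Tray(species=Rose, count=12)]
print([PlainTray("Lily", 28)])  # default repr, e.g. [<__main__.PlainTray object at 0x...>]
```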

# Class 10 (November 22, 2024): Basic concepts of OOP


The four main concepts of Object-Oriented Programming (OOP) are Encapsulation, Abstraction, Inheritance, and Polymorphism. Together, these concepts help create modular, scalable, and maintainable code.

This is a central topic in computer science, so you can find all kinds of resources about it. Among them, simple descriptions of these concepts, with examples, are available at the following links:

(https://www.programiz.com/python-programming/object-oriented-programming)
(https://www.freecodecamp.org/news/object-oriented-programming-in-python/)
(https://www.w3schools.com/python/python_inheritance.asp), (https://www.w3schools.com/python/python_polymorphism.asp)

Building on the plant nursery example of last class, the following scripts illustrate the implementation of those concepts:

Encapsulation: OOP_encapsulation.py
Inheritance: OOP_inheritance.py
Polymorphism: OOP_polymorphism.py
Abstraction: OOP_abstraction.py
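The compact sketch below shows the four concepts side by side in a single file, building on the nursery example. It is not the content of the OOP_*.py scripts. The class Container and its method plant_count are hypothetical names introduced only for this illustration.

```python
from abc import ABC, abstractmethod

class Container(ABC):
    """Abstraction: declares what every container must do, not how."""

    @abstractmethod
    def plant_count(self):
        ...

class Tray(Container):  # Inheritance: a Tray is a Container
    def __init__(self, species, number_of_plants):
        self.species = species
        self._number_of_plants = number_of_plants  # Encapsulation: internal attribute

    @property
    def number_of_plants(self):
        # read-only access from outside the class (no setter is defined)
        return self._number_of_plants

    def remove_plants(self, number):
        # the only way to change the count, so it can never become negative
        number = min(number, self._number_of_plants)
        self._number_of_plants -= number
        return number

    def plant_count(self):
        return self._number_of_plants

class Pot(Container):  # Inheritance: a Pot is also a Container
    def __init__(self, species):
        self.species = species

    def plant_count(self):
        return 1  # a pot always holds exactly one plant

# Polymorphism: the same method call works on any Container subclass
containers = [Tray("Lily", 28), Pot("Rose"), Pot("Lily")]
print(sum(c.plant_count() for c in containers))  # 30
```

Calling Container() directly raises a TypeError, because abstract classes with unimplemented abstract methods cannot be instantiated.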

The next assignment will be the Cookie jar problem described at (https://cs50.harvard.edu/python/2022/psets/8/jar/). You will need to create a script for the problem and test it with check50 cs50/problems/2022/python/jar.

# Class 11 (November 29, 2024): Unit tests

This topic corresponds to Section 5 (Unit Tests) of the CS50 course, where you can find the necessary resources. In particular, see the short https://cs50.harvard.edu/python/2022/shorts/pytest/.

The idea is to write Python functions whose names start with test_ and that test existing functions or classes in the script. To execute these test functions, we call pytest (https://docs.pytest.org/) in the terminal instead of python:

$ pytest -v # -v is optional, for a more verbose output

If no arguments are given, pytest collects tests from the current directory and all its subdirectories. It looks in files whose names match test_*.py or *_test.py, and runs the functions in those files whose names start with test. By contrast, $ pytest my_file.py only runs the tests in that file, and $ pytest my_directory only runs the tests in files located in that directory. There are further options to select the tests to be executed, for example -k followed by a substring of the test names.
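Here is a minimal illustration of these naming rules. The file name test_conversions.py and the function hectares_to_m2 are hypothetical. For brevity, a single file contains both the function and its tests. In practice, as in the Farm example that follows, the function under test usually lives in a separate module.

```python
# test_conversions.py (hypothetical name: the test_ prefix lets pytest find the file)

def hectares_to_m2(hectares):
    """Convert an area in hectares to square meters."""
    if hectares < 0:
        raise ValueError("Area cannot be negative.")
    return hectares * 10_000

def test_hectares_to_m2():
    # collected by pytest because the name starts with test
    assert hectares_to_m2(1) == 10_000
    assert hectares_to_m2(2.5) == 25_000

def test_zero_area():
    assert hectares_to_m2(0) == 0

def helper_not_collected():
    # ignored by pytest: the name does not start with test
    return None
```

Running $ pytest -v test_conversions.py should report two passing tests. Running $ pytest -k zero test_conversions.py selects only test_zero_area.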

Simple example of a class and tests for that class

Consider two Python modules: one with the definition of a class and the other with tests for that class.

# farm_carbon_footprint.py

import math

class Farm:
    def __init__(self, name, area_hectares):
        """Initialize the farm with a name and area in hectares."""
        self.name = name
        self.area_hectares = area_hectares
        self.activities = []

    def add_activity(self, activity, emissions_per_unit, units):
        """Add an activity with emissions in kg CO2e per unit and units."""
        self.activities.append((activity, emissions_per_unit, units))

    def total_emissions(self):
        """Calculate total carbon emissions from all activities."""
        return sum(emissions_per_unit * units for _, emissions_per_unit, units in self.activities)

    def emissions_per_hectare(self):
        """Calculate carbon emissions per hectare."""
        if self.area_hectares == 0:
            raise ValueError("Farm area cannot be zero.")
        return self.total_emissions() / self.area_hectares

    def radius_circle_with_farm_area(self):
        """ Calculate the radius (in meters) of a circle that has the same area as the farm"""
        return math.sqrt(self.area_hectares / math.pi) * 100  # 1 ha = 10 000 m2, so the factor sqrt(10 000) = 100 gives meters

and

# test_farm_carbon_footprint.py

import pytest
from farm_carbon_footprint import Farm

def test_add_activity():
    farm = Farm("Green Pastures", 10)
    farm.add_activity("Tractor Usage", 50, 5)  # 50 kg CO2e per hour, 5 hours
    farm.add_activity("Fertilizer Use", 10, 20)  # 10 kg CO2e per kg, 20 kg
    assert len(farm.activities) == 2

def test_total_emissions():
    farm = Farm("Green Pastures", 10)
    farm.add_activity("Tractor Usage", 50, 5)  # 50 kg CO2e per hour, 5 hours
    farm.add_activity("Fertilizer Use", 10, 20)  # 10 kg CO2e per kg, 20 kg
    assert farm.total_emissions() == 450  # 250 + 200

def test_emissions_per_hectare():
    farm = Farm("Green Pastures", 10)
    farm.add_activity("Tractor Usage", 50, 5)  # 50 kg CO2e per hour, 5 hours
    farm.add_activity("Fertilizer Use", 10, 20)  # 10 kg CO2e per kg, 20 kg
    assert farm.emissions_per_hectare() == 45  # 450 total / 10 hectares

def test_emissions_per_hectare_zero_area():
    farm = Farm("Tiny Farm", 0)
    farm.add_activity("Tractor Usage", 50, 2)  # 50 kg CO2e per hour, 2 hours
    with pytest.raises(ValueError, match="Farm area cannot be zero."): # optional: match checks the message of the ValueError raised in emissions_per_hectare()
        farm.emissions_per_hectare()

def test_radius_of_circle_with_farm_area():
    farm = Farm("Circle Farm", 1)
    assert farm.radius_circle_with_farm_area() == pytest.approx(56.42, abs=0.01)  # 100 / sqrt(pi)
    farm = Farm("Circle Farm", 10)
    assert farm.radius_circle_with_farm_area() == pytest.approx(178.41, abs=0.01)  # 100 * sqrt(10 / pi)

Adapt the Farm class definition and test_farm_carbon_footprint.py in order to:

Add a method .number_of_activities() to class Farm that returns the number of activities. Check the correctness of that method with a new test in test_farm_carbon_footprint.py.
Adapt the Farm class so that a ValueError is raised if the property area_hectares is negative when you try to create an instance of Farm. Check with a new test in test_farm_carbon_footprint.py that the class behaves as expected when area_hectares is negative.

# Class 12 (December 6, 2024): Lists and dictionaries: packing, args and kwargs, comprehension

1. The packing/unpacking operators * and **

The packing/unpacking operators allow us to deal with structures of variable length. The example below illustrates packing several numbers into a list.

x=[1,2,3,4,5,6,7,8,9]
a,*b,c=x # b is the list [2,3,4,5,6,7,8]
print(a,b,c)

The same operator can be used to unpack:

list1=[1,2,3]
list2=[6,7,8]
new_list=[*list1,4,5,*list2] # values are unpacked
print(new_list)
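Unpacking also works in function calls and in dictionary literals: * spreads a list into positional arguments, and ** spreads a dictionary into named arguments or into another dictionary. The function describe_loan below is a hypothetical example.

```python
amounts = [100, 250, 50]
print(*amounts)       # same as print(100, 250, 50): 100 250 50
print(max(*amounts))  # same as max(100, 250, 50): 250

defaults = {"num_years": 10, "rate": 0.03}
custom = {"rate": 0.05}
settings = {**defaults, **custom}  # later keys overwrite earlier ones
print(settings)                    # {'num_years': 10, 'rate': 0.05}

def describe_loan(amount, num_years, rate):
    # hypothetical function, used only to show ** in a call
    return f"{amount} over {num_years} years at {rate:.0%}"

# same as describe_loan(1000, num_years=10, rate=0.05)
print(describe_loan(1000, **settings))  # 1000 over 10 years at 5%
```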

The * and ** operators are mostly used as parameters of functions that can accept a variable number of arguments (like print). The operator * packs the remaining positional arguments into a tuple, and the operator ** packs the remaining named arguments into a dictionary. In the example below, the variable kwargs refers to keyword arguments (i.e., named arguments). Note that a function can combine regular arguments, regular named arguments, *args, and **kwargs, as long as keyword arguments follow positional arguments.

def pack(*args, **kwargs):
    return args,kwargs

x,y=pack(1,2,10, num_years=10,rate=0.03)

print('Positional arguments are packed into tuple',x)
print('Named arguments are packed into dictionary',y)
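The sketch below combines all four kinds of parameters in one definition. The function report and its parameters are hypothetical. In the definition, the order is: regular parameters, then *args, then parameters that can only be passed by name, then **kwargs.

```python
def report(site, *measurements, unit="mm", **metadata):
    # site: regular positional argument (required)
    # *measurements: any extra positional arguments, packed into a tuple
    # unit: keyword-only argument, because it comes after *measurements
    # **metadata: any other named arguments, packed into a dictionary
    total = sum(measurements)
    return site, total, unit, metadata

print(report("Plot A", 12, 7, 3, unit="cm", observer="Ana"))
# ('Plot A', 22, 'cm', {'observer': 'Ana'})
print(report("Plot B"))
# ('Plot B', 0, 'mm', {})
```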

This can be used, for instance, to perform computations over a sequence of variable length, as in the following example.

# Compute accumulated interest on a sequence of borrowed amounts
def main(*args, **kwargs):
    '''
    args is a tuple of amounts borrowed
    kwargs is a dictionary with keys num_years and rate
    '''
    S=add(args)
    # Call compute_debt, unpacking the dictionary kwargs with **
    D=compute_debt(S,**kwargs) # compute_debt expects a number and two named arguments, num_years and rate
    # same as:
    D=compute_debt(S,kwargs['num_years'],kwargs['rate'])
    # print results
    print('Borrowed:',S)
    print('Debt:',round(D,3))

def add(values):
    s=0
    for x in values:
        s+=x
    return s

def compute_debt(s,num_years,rate):
    for i in range(num_years):
        s+=s*rate
    return s

if __name__=='__main__':
    main(1,2,10,5,4,num_years=10, rate=0.05)

Exercise i) Summing Arguments with `*args`

Write a function sum_all that takes any number of positional arguments and returns their sum.

def sum_all(*args):
    pass  # Your code here

# Example usage:
print(sum_all(1, 2, 3))       # Output: 6
print(sum_all(10, 20, 30, 5))  # Output: 65

Exercise ii) Concatenate Strings with `*args`

Create a function concat_strings that takes any number of string arguments using *args and concatenates them into a single string.

def concat_strings(*args):
    pass  # Your code here

# Example usage:
print(concat_strings("Hello", " ", "world", "!"))  # Output: "Hello world!"
print(concat_strings("Python", " is", " fun!"))    # Output: "Python is fun!"

Exercise iii) Handling Default Keyword Arguments with `**kwargs`

Write a function greet that accepts a keyword argument name (default value: "Guest") and an optional keyword argument greeting (default value: "Hello"). Return the formatted greeting message.

def greet(**kwargs):
    pass  # Your code here

# Example usage:
print(greet(name="Alice", greeting="Hi"))  # Output: "Hi Alice"
print(greet(name="Bob"))                   # Output: "Hello Bob"
print(greet())                             # Output: "Hello Guest"

Exercise iv) Combine `*args` and `**kwargs`

Write a function describe_person that takes positional arguments (*args) for hobbies and keyword arguments (**kwargs) for personal details (e.g., name, age). Return a formatted string describing the person.

def describe_person(*args, **kwargs):
    pass  # Your code here

# Example usage:
print(describe_person("reading", "traveling", name="Alice", age=30))
# Output: "Alice (30 years old) enjoys reading, traveling."

Exercise v) Filter Keyword Arguments with `**kwargs`

Create a function filter_kwargs that takes any number of keyword arguments and returns a new dictionary containing only those with values greater than 10.

def filter_kwargs(**kwargs):
    pass  # Your code here

# Example usage:
print(filter_kwargs(a=5, b=15, c=20, d=3))  # Output: {'b': 15, 'c': 20}

2. List and dictionary comprehension, map and filter

Suppose one wants to create a list with the cubes of all even numbers smaller than N. The following scripts show how this can be done with different constructs that replace the traditional loop structure: list comprehension, filter, map and lambda.

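The function map applies a given function to each element of a list (or of any other iterable). Likewise, filter uses a boolean function to select elements of a list. In principle, both could be computed in parallel over the elements of the list, since each output is independent of the outputs for the remaining elements. Note that in Python 3 both map and filter return lazy iterators rather than lists, which is why their results are converted with list() below.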

With list comprehension:

N=10 # example value: the numbers 0, 1, ..., N-1 are considered
def cube(x):
    return x*x*x
L=[cube(x) for x in range(N) if x%2==0]

With filter to select even numbers and map to compute cubes:

def even(x):
  return x%2==0
numbers=list(range(N))
even_numbers=list(filter(even, numbers))
cubes=list(map(cube,even_numbers))

Also with filter and map, but defining the cube and even functions inline with lambda instead of def:

numbers=list(range(N))
even_numbers=list(filter(lambda x: x%2==0, numbers))
cubes=list(map(lambda x: x*x*x,even_numbers))

Lambda can also be combined with list comprehension. In the example below, one needs to indicate that the lambda function has to be applied to the variable x of the list comprehension, using (lambda x: x*x*x)(x). Otherwise, the output list would be a list of lambda functions. In practice, writing the expression directly, as in [x*x*x for x in range(N) if x%2==0], is simpler and more readable.

cubes=[(lambda x: x*x*x)(x) for x in range(N) if x%2==0]
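The sketch below checks that the approaches above give the same list. It also makes visible that map and filter are lazy: nothing is computed until the iterator is consumed, and an iterator can be consumed only once. The last lines show the syntax of a dictionary comprehension on a hypothetical dictionary of species codes.

```python
N = 10  # example upper bound (exclusive)

def cube(x):
    return x * x * x

def even(x):
    return x % 2 == 0

# map and filter return lazy iterators, not lists
lazy = map(cube, filter(even, range(N)))
print(lazy)                  # e.g. <map object at 0x...>
cubes_map = list(lazy)       # values are computed only here
cubes_comp = [x ** 3 for x in range(N) if x % 2 == 0]
print(cubes_map == cubes_comp, cubes_comp)  # True [0, 8, 64, 216, 512]

# the iterator is now exhausted
print(list(lazy))            # []

# dictionary comprehension: {key_expression: value_expression for ... in ...}
codes = {"Rose": "RS", "Lily": "LL"}  # hypothetical species codes
name_of = {code: name for name, code in codes.items()}  # swap keys and values
print(name_of)               # {'RS': 'Rose', 'LL': 'Lily'}
```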

Exercise i) Convert a For Loop to List Comprehension

Convert the following for loop into a list comprehension:

result = []
for x in range(10):
    result.append(x**2)
# output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Exercise ii) Filter Numbers with List Comprehension

Rewrite this code using a list comprehension:

result = []
for x in range(20):
    if x % 2 == 0:
        result.append(x)
# output: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

Exercise iii) Dictionary Comprehension

Convert the following code to a dictionary comprehension:

squares = {}
for x in range(5):
    squares[x] = x**2
# output: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Exercise iv) Nested Loops with List Comprehension

Rewrite the nested loop as a list comprehension:

pairs = []
for x in range(3):
    for y in range(2):
        pairs.append((x, y))
# output: [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

Exercise v) Conditional Dictionary Comprehension

Transform the following code into a dictionary comprehension with a condition:

filtered_squares = {}
for x in range(10):
    if x % 2 == 0:
        filtered_squares[x] = x**2
# output: {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

Exercise vi) Conditional Transformation in List Comprehension

Convert the following loop into a list comprehension that includes a conditional transformation:

result = []
for x in range(15):
    if x % 3 == 0:
        result.append(x**2)
    else:
        result.append(x)
# output: [0, 1, 2, 9, 4, 5, 36, 7, 8, 81, 10, 11, 144, 13, 14]

Exercise vii) Dictionary Comprehension with String Keys

Transform the following loop into a dictionary comprehension, using strings as keys:

word_lengths = {}
words = ["apple", "banana", "cherry", "date"]
for word in words:
    word_lengths[word] = len(word)
# output: {'apple': 5, 'banana': 6, 'cherry': 6, 'date': 4}

Exercise viii) Flatten a Nested List with List Comprehension

Rewrite this code using a single list comprehension to flatten the nested list:

nested_list = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
flattened = []
for sublist in nested_list:
    for item in sublist:
        flattened.append(item)
# output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

Exercise ix) Conditional Dictionary Comprehension with Nested Loops

Convert the following nested loop into a dictionary comprehension with a condition:

result = {}
for i in range(1,3):
    for j in range(3, 6):
        if j % i != 0:
            result[(i, j)] = i + j
# output: {(2, 3): 5, (2, 5): 7}

Exercise x) Filter and Transform Nested Dictionaries

Use a dictionary comprehension to filter and transform the following dictionary of dictionaries:

data = {
    "A": {"score": 90, "passed": True},
    "B": {"score": 65, "passed": False},
    "C": {"score": 75, "passed": True},
    "D": {"score": 50, "passed": False},
}

# Goal: Include only students who passed, and create a dictionary of their scores.
result = {}
for key, value in data.items():
    if value["passed"]:
        result[key] = value["score"]
# output: {'A': 90, 'C': 75}

# Class 13 (December 13, 2024): Introduction to IoT with Raspberry Pi

In this class we use Python to control physical devices through GPIO (general-purpose input/output) ports on a Raspberry Pi microcomputer. We will rely on the gpiozero Python package https://gpiozero.readthedocs.io/en/latest/recipes.html.

Topics of the class:

Raspberry Pi (RPi) and PiOS (Linux)
Retrieving the local address of the Raspberry Pi with hostname -I
Accessing RPi remotely with ssh (secure shell)
Connecting RPi to a breadboard using the GPIO pins
Using the nano text editor to create scripts
Running scripts in RPi with sudo python3 test.py
Implementing some basic recipes from the gpiozero documentation that use the following physical devices: LEDs, buttons, and a line sensor

Exercises with Raspberry Pi, breadboard, led, button and connection wires:

# Class 14 (December 20, 2024): Introduction to IoT with Raspberry Pi (cont'd)

Exercises for led board with gpiozero (cont’d)

LED_board. Interpret the code and verify that it is behaving as expected.
Look at the advanced recipes for LEDBoard. Create a “pyramid” of lights 5-3-1-3-5 that turns on and off with a pause of 1 second. You can build a loop so that the pyramid runs only 4 times and then the execution stops.
Adapt the code LED_board.py so that, if you execute sudo python3 LED_morse.py some_word, the LEDs turn on and off to encode the input word: a dah (-) has a duration of 3 seconds and a dit (.) has a duration of 1 second. After each letter, there should be a 3-second pause before the next letter. The example below should correspond to LEDs 1 and 2 being on for 3 seconds, then LEDs 1, 2 and 3 being on for 3 seconds, then LEDs 1 and 3 being on for 1 second while LED 2 is on for 3 seconds, and so on.

−− −−− ·−· ··· ·       −·−· −−− −·· ·
M   O   R   S  E        C    O   D  E
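The Morse part of this exercise can be separated from the hardware by first computing, in plain Python, which LED stays on for how long. The minimal sketch below follows the MORSE CODE example: the i-th symbol of a letter is shown on LED i, with 1 second for a dit and 3 seconds for a dah. The names MORSE, DIT, DAH and led_schedule are chosen here for illustration. Driving the actual LEDs with gpiozero, reading the word from the command line, and the pauses between letters are left out.

```python
# International Morse code for the letters A-Z
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}
DIT, DAH = 1, 3  # seconds an LED stays on for a dit and for a dah

def led_schedule(word):
    """Return, for each letter, a list of (led_number, seconds_on) pairs.

    The i-th symbol of a letter is assigned to LED i (numbered from 1).
    """
    schedule = []
    for letter in word.upper():
        if letter not in MORSE:
            continue  # skip characters without a code, such as spaces
        symbols = MORSE[letter]
        schedule.append([(i, DAH if s == "-" else DIT)
                         for i, s in enumerate(symbols, start=1)])
    return schedule

print(led_schedule("mor"))
# [[(1, 3), (2, 3)], [(1, 3), (2, 3), (3, 3)], [(1, 1), (2, 3), (3, 1)]]
```

Each (led_number, seconds_on) pair could then be mapped to the corresponding LED of the board used in LED_board.py.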

Other sensors

There are many hardware adapters that make it easier to connect sensors to a microcomputer. Here we look at the Raspberry Pi hat included in the Grove_Base_Kit_for_Raspberry_Pi. The Grove Base Hat for Raspberry Pi provides Digital/Analog/I2C/PWM/UART ports to the RPi, allowing it to be connected to a large range of modules.

The following code shows how to access the readings of a temperature and humidity sensor programmatically. The sensor is connected to digital port D5. The code also uses GPIO pin 17 to power an LED, which is turned on when the relative humidity is above 85%.

import time
from seeed_dht import DHT
from gpiozero import LED

led=LED(17)
# Grove - Temperature&Humidity Sensor connected to port D5
sensor = DHT('11', 5)
while True:
    humi, temp = sensor.read()
    print('temperature {}C, humidity {}%'.format(temp, humi))
    if humi > 85:
        led.on()
    else:
        led.off()
    time.sleep(0.5)

Exercises

Adapt the code above such that the LED is on when the temperature is above 24 Celsius or below 20 and is off otherwise.
Interpret the code below. Create a new script that combines the temperature/humidity sensor with the ultrasonic ranger sensor and the LED.

import time
from grove.grove_ultrasonic_ranger import GroveUltrasonicRanger
from gpiozero import LED
led=LED(17)
# Grove - Ultrasonic Ranger connected to port D5
sensor = GroveUltrasonicRanger(5)
while True:
    distance = sensor.get_distance()
    print('{} cm'.format(distance))
    if distance < 20:
        led.on()
        print('LED on')
        time.sleep(0.5)
        led.off()
        print('LED off')
        continue
    time.sleep(1)
