Masters in Data Science applied to agricultural and food sciences, environment, and forestry engineering.
Instructor: Manuel Campagnolo (mlc@isa.ulisboa.pt)
Teaching assistant: Mekaela Stevenson (mekaela@edu.ulisboa.pt)
>>> import this and many other anecdotes about Python.

| CS50P | Contents | PP.fi | Contents |
|---|---|---|---|
| Lecture 0 | Creating Code with Python; Functions; Bugs; Strings and Parameters; Formatting Strings; More on Strings; Integers or int; Readability Wins; Float Basics; More on Floats; Def; Returning Values | Part 1 | Intro; I/O; More about variables; Arithmetic operations; Conditional statements | 
| Lecture 1 | Conditionals, if Statements, Control Flow; Modulo; Creating Our Own Parity Function; Pythonic; match | Part 2 | Programming terminology; More conditionals; Combining conditions; Simple loops |
| Lecture 2 | Loops; While Loops; For Loops; Improving with User Input; More About Lists; Length; Dictionaries, More on code modularity | Part 3 | Loops with conditions; Working with strings; More loops; Defining functions | 
| | | Part 4 | The Visual Studio Code editor, Python interpreter and built-in debugging tool; More functions; Lists; Definite iteration; Print statement formatting; More strings and lists |
| | | Part 5 | More lists; References; Dictionary; Tuple |
| Lecture 3 | Exceptions, Runtime Errors, try, else, Creating a Function to Get an Integer, pass | Part 6 | Reading files; Writing files; Handling errors; Local and global variables | 
| Lecture 4 | Libraries, Random, Statistics, Command-Line Arguments, slice, Packages, APIs, Making Your Own Libraries | Part 7 | Modules; Randomness; Times and dates; Data processing; Creating your own modules; More Python features | 
| Lecture 5 | Unit Tests; assert; pytest; Testing Strings; Organizing Tests into Folders | ||
| Lecture 6 | File I/O; open; with; CSV; Binary Files and PIL | ||
| Lecture 7 | Regular Expressions; Case Sensitivity; Cleaning Up User Input; Extracting User Input | ||
| Lecture 8 | Object-Oriented Programming; Classes; raise; Decorators; Class Methods; Static Methods; Inheritance; Inheritance and Exceptions; Operator Overloading | Part 8 | Objects and methods; Classes and objects; Defining classes; Defining methods; More examples of classes | 
| | | Part 9 | Objects and references; Objects as attributes; Encapsulation; Scope of methods; Class attributes; More examples with classes |
| | | Part 10 | Class hierarchies; Access modifiers; Object oriented programming techniques; Developing a larger application |
| Lecture 9 | set; Global Variables; Constants; Type Hints; Docstrings; argparse; Unpacking; args and kwargs; map; List Comprehensions; filter; Dictionary Comprehensions; enumerate; Generators and Iterators | Part 11 | List comprehensions; More comprehensions; Recursion; More recursion examples | 
| | | Part 12 | Functions as arguments; Generators; Functional programming; Regular expressions |
- code filename.py to create a new file
- ls to list files in folder
- cp filename newfilename to copy a file, e.g. cp ../hello.py farewell.py (.. represents parent folder)
- mv filename newfilename to rename or move file, e.g. mv farewell.py goodbye.py or mv farewell.py .. (move one folder up)
- rm filename to delete (remove) file
- mkdir foldername to create new folder
- cd foldername to change directory, e.g. cd ..
- rmdir foldername to delete folder
- clear to clear terminal window

- Strings (str), variables, print (a function), parameters (e.g. end=), input, comments, formatted strings (f"..."), .strip(), .title() (methods)
- Integers (int), operations for integers, casting (e.g. str to int)
- Floats (float), round, format floats (e.g. f"{z:.2f}")
- True, False, and, or, not
- def, return
- if, elif, else:

    if score >= 70:
        print("Grade: C to A")
    elif score >= 60:
        print("Grade: D")
    else:
        print("Grade: F")

- match:

    match species:
        case 'versicolor':
            label=0
        case 'virginica':
            label=1
        case _:
            label=2
- def main(), define other functions, call main(). The code must be modular.
- break, break and return
- Lists []: methods append, extend
- Dictionaries {}, items(), keys .keys() and values .values()

    knights = {'gallahad': 'the pure', 'robin': 'the brave'}
    for k, v in knights.items():
        print(k, v)
    if 'gallahad' in knights:
        print('Go Gallahad')
- The output of main() is a simple histogram printed in the terminal.

    def main():
        # read and sort values
        x=read_values() # x is a list of numbers, either integers or floats
        n=len(x) # integer; number of values
        xmin,xmax=determine_min_max(x) # integers or floats
        # determine number of classes
        m=number_of_classes_sturges(n) # m is a positive integer such that 2**(m-1) <= n <= 2**m
        # determine class amplitude
        delta=amplitude(xmin,xmax,m) # positive float, the range of values divided by the number of classes
        # Compute frequency for each class and plot histogram row by row
        for i in range(m):
            left=xmin+i*delta
            if i < m-1:
                right=left+delta
            else:
                right=left+delta+1 # either 1 or any positive value, so that xmax falls in the last class
            freq=determine_frequency(x,left,right) # integer; note that each value must belong to one and only one class
            print_frequency(freq) # the output must be '****' where each * represents one observation
    # execute main
    main()
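The helper functions called by main() are not defined above. The following is a minimal sketch of how some of them could look, assuming that the number of classes is the smallest positive integer m with n <= 2**m and that each class is the half-open interval [left, right). The function histogram_rows is a hypothetical name introduced here so that the result can be inspected without reading input; it is not part of the collaborative project.

```python
def number_of_classes_sturges(n):
    """Smallest positive integer m such that n <= 2**m."""
    m = 1
    while 2**m < n:
        m += 1
    return m


def amplitude(xmin, xmax, m):
    """Width of each class: the range of values divided by m."""
    return (xmax - xmin) / m


def determine_frequency(x, left, right):
    """Count the values v such that left <= v < right."""
    return sum(1 for v in x if left <= v < right)


def histogram_rows(x):
    """Return one string of '*' per class, following the steps of main()."""
    xmin, xmax = min(x), max(x)
    m = number_of_classes_sturges(len(x))
    delta = amplitude(xmin, xmax, m)
    rows = []
    for i in range(m):
        left = xmin + i * delta
        # the last class is widened so that xmax is counted
        right = left + delta + (1 if i == m - 1 else 0)
        rows.append("*" * determine_frequency(x, left, right))
    return rows


for row in histogram_rows([1, 2, 2, 3, 5, 8]):
    print(row)
```

Because the intervals are closed on the left and open on the right, each value is counted in exactly one class.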
      One possible solution for the collaborative project: (https://github.com/isa-ulisboa/greends-ipython/edit/main/collaborative_project_session2.py)
Exercises on list comprehension (with some solutions): https://github.com/isa-ulisboa/greends-ipython/blob/main/exercises_list_comprehension.md
Exercises on dictionary comprehension (with some solutions): https://github.com/isa-ulisboa/greends-ipython/blob/main/exercises_dict_comprehension.md
Exercises from CS50 Problem set 0, 1 and 2.
See lecture https://cs50.harvard.edu/python/weeks/3/
Example of basic use of try-except to catch a ValueError:
try:
    x = int(input("What's x?"))
except ValueError:
    print("x is not an integer")
else:
    print(f"x is {x}")
Function for requesting an integer from the user until no exceptions are caught:
def get_int():
    while True:
        try:
            x = int(input("What's x?"))
        except ValueError:
            print("x is not an integer")
        else:
            break
    return x
For a list of Python Built-in Exceptions, besides ValueError, you can check https://www.w3schools.com/python/python_ref_exceptions.asp
To solve this problem, try to organize your code as follows. As suggested in the hints, you should catch ValueError and ZeroDivisionError exceptions in your code. In the code below, the user is asked for values of x and y until they satisfy the requirements: x and y must be entered as a string x/y, x has to be less than or equal to y, and y cannot be zero. The function get_string_of_integers_X_less_than_Y in the code below should take care of that.
def main():
    # asks user for input until the input is as expected
    x,y=get_string_of_integers_X_less_than_Y()
    # compute percentage from two integers
    p=compute_percentage(x,y)
    # print output 
    print_gauge(p)
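One possible way to structure get_string_of_integers_X_less_than_Y is sketched below; it is only an illustration, not the expected solution. The helper parse_fraction, the prompt text and the read parameter (which defaults to input and makes the function easy to try without typing) are assumptions introduced for this example.

```python
def parse_fraction(text):
    """Turn 'x/y' into (x, y); raise ValueError or ZeroDivisionError if invalid."""
    x_text, y_text = text.split("/")  # ValueError unless there is exactly one '/'
    x, y = int(x_text), int(y_text)   # ValueError if x or y is not an integer
    if y == 0:
        raise ZeroDivisionError("y cannot be zero")
    if x > y:
        raise ValueError("x must be less than or equal to y")
    return x, y


def get_string_of_integers_X_less_than_Y(read=input):
    """Ask for 'x/y' until the input satisfies all the requirements."""
    while True:
        try:
            return parse_fraction(read("Fraction: "))
        except (ValueError, ZeroDivisionError):
            pass  # invalid input: ask again


# Simulated user answers, so that the example runs without typing
answers = iter(["cat", "3/0", "5/4", "3/4"])
print(get_string_of_integers_X_less_than_Y(lambda prompt: next(answers)))
```

Keeping the validation in a separate function that raises exceptions means the loop only has to decide whether to ask again.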
Exercise: adapt the code proposed in the short to be more modular, where the main function is something like the one below:
def main():
    spacecraft = input("Enter a spacecraft: ")
    au=get_au(spacecraft)
    m = convert(au)
    print(f"{m} m")
To terminate the program, one can use sys.exit(), which can also be used to print a message.
import sys # import module
try:
    x = int(input("What's x?"))
except ValueError:
    sys.exit("x is not an integer")
One can also catch the exceptions raised when the user types CTRL-C or CTRL-D:
while True:
    try:
        x=int(input())
    except ValueError:
        print('x is not integer')
    except KeyboardInterrupt: #CTRL-C (in Linux, interrupt execution)
        print('\n KeyboardInterrupt')
        break
    except EOFError: #CTRL-D (in Linux, log out terminal/end-of-file)
        print('\n EOFError')
        break
    else:
        print(x)
Exercise (Asking for a haphazard list of numbers): Create a program that asks the user to provide, in no particular order, a series of numbers to be stored in a list. The user is asked for one number at a time. Only inputs that are numbers are stored in the list. When the user wants to stop, they should type CTRL-D. Then, the program should print the list of numbers.
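A minimal sketch of one way to approach this exercise follows. It assumes that "number" means anything accepted by float; the read parameter and the helper make_reader are hypothetical additions that simulate a user who ends the session with CTRL-D, so the example runs without a keyboard.

```python
def collect_numbers(read=input):
    """Store every numeric input in a list until end-of-file (CTRL-D)."""
    numbers = []
    while True:
        try:
            numbers.append(float(read("Number: ")))
        except ValueError:
            print("Not a number: ignored")
        except EOFError:  # raised by input() when the user types CTRL-D
            break
    return numbers


def make_reader(answers):
    """Simulate a user who types the given answers and then CTRL-D."""
    remaining = iter(answers)

    def read(prompt):
        try:
            return next(remaining)
        except StopIteration:
            raise EOFError

    return read


print(collect_numbers(make_reader(["3", "abc", "4.5"])))
```

In a real session, calling collect_numbers() with no argument reads from the keyboard.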
See lecture https://cs50.harvard.edu/python/weeks/4/
Suggestion: watch https://cs50.harvard.edu/python/shorts/creating_modules_packages/
Modules are just python scripts (files like module_name.py) which can be imported into your main code. You can import everything that belongs to the module, or just some given function(s) or other objects.
Exercise: Create file named mymodule.py and file main.py in http://cs50.dev. Organize the files in the following folders:
|--- class_5 # or whatever folder name you wish
     |--- modules
          |--- mymodule.py
     |--- main.py
The contents of mymodule.py are typically functions or constants that you can re-use in different contexts. Let’s suppose that mymodule.py has the following contents.
import sys
Constants={
    'e'   : 2.718281828459045, # Euler's number
    'pi'  : 3.141592653589793, # Archimedes' constant
    'phi' : 1.618033988749895  # Golden ratio
}
def get_integer() -> int:
    #get integer from user
    while True:
        try:
            return(int(input('Type a number:  ')))
        except ValueError:
            print('Not an integer number: try again')
        except KeyboardInterrupt: #CTRL-C
            print('\n If you want to exit type CTRL-D')
        except EOFError: # CTRL-D
            sys.exit('\n Exit as requested')
def simplify(s: str) -> str:
    #Remove whitespaces from string and convert to lowercase
    return s.strip().lower()
    and main.py is the following file:
import modules.mymodule
def main():
    x=modules.mymodule.get_integer()
    print(x)
main()
    If you prefer, you can explicitly import some given functions from the module as in the following example.
from  modules.mymodule import get_integer
def main():
    x=get_integer()
    print(x)
main()
    You can also import everything from the module with from modules.mymodule import * instead of the more specific (and recommended) from modules.mymodule import get_integer.
The examples above follow the directory tree that was suggested. If you change the module’s location, you need to adapt the code accordingly. Alternatively, you can add the path to the directory where your modules are located to sys.path as in the following example.
import sys
sys.path.append(r'path-to-folder') # folder where mymodule is (e.g. `/workspaces/8834091`)
import mymodule
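The effect of sys.path can be tried out without touching your own folders. The sketch below writes a small module into a temporary folder, adds that folder to sys.path and imports the module by name. The module name demo_module and its contents are made up for the illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # create a tiny module file inside the temporary folder
    source = "def simplify(s):\n    return s.strip().lower()\n"
    Path(folder, "demo_module.py").write_text(source)

    sys.path.append(folder)        # tell Python where to look
    importlib.invalidate_caches()  # the file was created while Python was running
    demo_module = importlib.import_module("demo_module")

    result = demo_module.simplify("  Hello ")
    sys.path.remove(folder)        # leave sys.path as it was

print(result)
```

Without the sys.path.append line, the import fails with ModuleNotFoundError, because the temporary folder is not among the locations Python searches.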
As explained in the recommended video, a Python package is just a folder with modules and a special file named __init__.py
Often, you import a module that is not part of the Python standard library but is available from the Python Package Index (https://pypi.org/) and can be installed with pip (https://pypi.org/project/pip/). If such a package, say requests, is not already available, you can typically install it from your terminal with
$pip install requests
and then import it in your main script with import requests. Other modules ship with Python and need no installation. This is the case for the module random, which provides a series of functions for sampling, shuffling, and extracting random numbers from a variety of probability distributions: you can import it directly with import random. If you want to know the folder where a module is located, you can get that information with its __file__ attribute, e.g. random.__file__, as in the following script.
import random
print(random.__file__)
    Suggestion: write a script to  estimate the value of $\pi$ with a Monte Carlo algorithm that makes calls to random.uniform(-1, 1). One possible solution: https://www.geeksforgeeks.org/dsa/estimating-value-pi-using-monte-carlo/
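A minimal sketch of the Monte Carlo idea: points are drawn uniformly in the square [-1, 1] x [-1, 1], and the fraction that falls inside the unit circle approximates pi/4. The function name estimate_pi and the seed parameter (used only to make the run reproducible) are choices made for this example, and a random.Random instance is used in place of the module-level random.uniform so that the seed does not affect other code.

```python
import random


def estimate_pi(n_points, seed=None):
    """Estimate pi from the fraction of random points inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside += 1
    # area of circle / area of square = pi / 4
    return 4 * inside / n_points


print(estimate_pi(100_000, seed=0))
```

The estimate improves slowly: the error decreases roughly with the square root of the number of points.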
Previously, we used the module sys, in particular the function sys.exit() and the list sys.path. Another useful attribute is sys.argv, a list that gives you access to what the user typed at the command line $, as in the following script.
import sys
print(len(sys.argv)) # returns the number of words in the command line after $python
print(sys.argv[1]) # returns the 2nd word, i.e., the first word after $python myscript.py
  For instance, the following script named sum.py prints the sum of two numbers that were specified in the command line.
# sum.py
import sys
try:
    x,y = float(sys.argv[1]), float(sys.argv[2])
    print('the sum is',x+y)
except IndexError:
    print('missing argument')
except ValueError:
    print('The arguments are not numbers')
  To run it, you can for instance execute the command $python sum.py 1.2 4.3 in the terminal.
Suggestion: watch https://cs50.harvard.edu/python/shorts/api_calls/ (13’)
Application programming interfaces (APIs) allow you to communicate with a remote server. For instance, requests is a package that allows your program to behave as a web browser would. Consider the following script myrequest.py that allows you to explore the iTunes database (https://performance-partners.apple.com/search-api):
import requests
import sys
try:
    response = requests.get("https://itunes.apple.com/search?entity=song&limit=1&term=" + sys.argv[1])
    print(response.json())
except IndexError:
    sys.exit('Missing argument')
except requests.RequestException:
   sys.exit('Request failed')
You can then call the API from your terminal with $python myrequest.py 'name of my favorite band'.
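A search term with spaces or accented characters has to be encoded before it is placed in a URL. The sketch below builds the same kind of iTunes search URL with urllib.parse.urlencode from the standard library. The function name build_search_url is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_search_url(term, limit=1):
    """Build the iTunes search URL with a properly encoded query string."""
    query = urlencode({"entity": "song", "limit": limit, "term": term})
    return "https://itunes.apple.com/search?" + query


print(build_search_url("name of my favorite band"))
```

With requests, a similar result can be obtained by passing the dictionary through the params argument of requests.get, which encodes the values for you.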
You can easily adapt that code to access a different database. For instance if you want to explore the GBIF database (https://data-blog.gbif.org/post/gbif-api-beginners-guide/), you can just replace the main line of code in myrequest.py with
response=requests.get('https://api.gbif.org/v1/species/match?name='+ sys.argv[1])
and execute it with, say,  $python myrequest.py Tracheophyta in the terminal.
Another example of a useful API for weather data is https://open-meteo.com/en/docs#api_documentation. You can find a customized requests package for open-meteo  at https://pypi.org/project/openmeteo-requests/.
Solve problems from CS50P Problem_set_4. In particular, for problem Bitcoin price index organize your code so the main function is the following:
def main():
    x=read_command_line_input()
    price=get_bitcoin_price()
    print(f"${x*price:,.4f}")
A virtual environment (https://docs.python.org/3/library/venv.html) is:
- Contained in a directory, conventionally named .venv or venv in the project directory, or under a container directory for many virtual environments.
- Not checked into source control systems such as Git.
- Considered as disposable – it should be simple to delete and recreate it from scratch. You don’t place any project code in the environment.
- Not considered as movable or copyable – you just recreate the same environment in the target location.

In your system you have the base environment by default, and you can create one or more virtual environments. Below, we describe how to create a virtual environment and how to activate it, so that your commands at the terminal are interpreted within that environment. That allows you to encapsulate in each virtual environment a given Python version and a set of Python packages with their given versions.
Your data and script files remain on the usual working folders: they should not be moved to the folders where the virtual environment files are stored.
The following commands work in the  CS50 codespace that runs Linux (check with $cat /etc/os-release in the terminal). Some need to be slightly adapted for Windows (check differences for instance at https://realpython.com/python-virtual-environments-a-primer/).
Firstly, let’s check what are the available packages and their versions in the base environment, and also let’s get extra information about the package requests (e.g. dependencies):
$ pip list 
$ pip show requests
Next, let’s create a virtual environment.
One can first create (with mkdir) a folder called, say, myproject that contains our project.
Then create a subfolder .venv for the virtual environment(s). This folder should be separated from the working folders that contain data and scripts.
The virtual environment for myproject can then be created with:
.venv/ $ python3 -m venv myprojectvenv # creates environment called 'myprojectvenv' with Python 3
In case one needs to delete the virtual environment, one just needs to delete the folder. This can be done with .venv/ $ sudo rm -rf myprojectvenv in the terminal (Linux).
To activate the virtual environment, source the script activate, which lies in the bin folder of the virtual environment:
.venv/ $ source myprojectvenv/bin/activate # note that activate needs to be sourced
As a result, the prompt shows (myprojectvenv) .venv/ $ which indicates that myprojectvenv is now activated. One can check the Python version with $python -V. To de-activate a virtual environment, the command is $ deactivate.
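Besides looking at the prompt, one can check from within Python whether the interpreter is running inside a virtual environment created with venv. The sketch below compares sys.prefix with sys.base_prefix, which differ when such an environment is active. The function name in_virtual_environment is made up for this example.

```python
import sys


def in_virtual_environment():
    """True when Python runs from a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print(sys.prefix)                # folder of the environment in use
print(in_virtual_environment())  # True inside myprojectvenv, False in the base environment
```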
(myprojectvenv) .venv/ $ pip install random11==0.0.1
(myprojectvenv) .venv/ $ pip install geopy==1.23.0
(myprojectvenv) .venv/ $ pip install requests==2.25.0
Some of these packages depend on additional packages that are installed automatically.
To list all installed packages within the environment myprojectvenv one can execute (myprojectvenv) $ pip list as before. Compare the version of requests in myprojectvenv with the version returned initially in the base environment: this one is 2.25.0 while the one in the base environment is more recent. One can also check where requests is installed in myprojectvenv with the command (myprojectvenv) $ pip show requests.
One can check where Python looks for packages with print(sys.path): one can do this from the terminal with the command
(myprojectvenv) .venv/ $ python -c 'import sys; print(sys.path)'
Notice that the folder in myprojectvenv where the virtual environment packages are installed is listed, but the path to where base packages are stored is not.
Create a file (requirements.txt) that allows a collaborator to re-create the environment. The file requirements.txt stores the information about the installed packages in case one intends to share the environment (e.g. in GitHub). Towards that end, one needs to create requirements.txt with the package names and versions, which can be used to create a clone of the environment on another machine. This is done, still within myprojectvenv (i.e. with myprojectvenv activated), with the following command:
          (myprojectvenv) .venv/ $ pip freeze > requirements.txt  
Note that the file requirements.txt is created in the folder that contains myprojectvenv and not within myprojectvenv itself: this makes sense, since one does not want to store scripts or data within myprojectvenv but just packages and the Python installation.
Once requirements.txt is available, one can create a copy of myprojectvenv called, say, myproject2venv. Firstly, one needs to de-activate myprojectvenv with $ deactivate. Then, the commands to be executed in the terminal are:
.venv/ $ python3 -m venv myproject2venv # create new virtual environment
.venv/ $ source myproject2venv/bin/activate # activate myproject2venv
(myproject2venv) .venv/ $ pip install -r requirements.txt # install packages and versions listed in requirements.txt
Exercise: go back to myprojectvenv, add package (say, emoji==0.1.0), re-build requirements.txt, and create new environment myproject3venv and install the  set of packages listed in the new requirements.txt.
As discussed in (https://cs50.harvard.edu/python/2022/notes/6/), open is a function built into Python that allows you to open a file so that your program can read from it or write to it. In the example below, w is the argument value that indicates that the file is opened in writing mode. Opening an existing file in this mode erases its previous contents, so after file.write(...) the file contains only what was written.
name='Bob'
file = open("names.txt", "w")
file.write(name)
file.close()
As an alternative, if the goal is to add new content to the file, appended to the existing content, then w should be replaced by a (append). Each call to file.write(name) will then add the value of name to the end of the file.
Instead of explicitly opening and closing a file, it’s simpler to use the so-called context manager in Python, using the keyword with, which automatically closes the file:
with open("names.txt", "w") as f:
  f.write(name)
If one wishes to read from a file, then the file has to be opened in reading mode as in the following example. The method readlines reads all lines of the file, and stores them in a list, where each element of the list is the contents of the corresponding line.
with open("names.txt", "r") as f:
    L=f.readlines()
However, it is also possible to read one line at a time:
with open('myfile.txt','r') as f:
    N=0
    for line in f:
        N+=1
print('number of lines', N)
A file can be of type text (human-readable) or binary. Binary files, such as images, are opened in binary mode, for instance with open('myfile.png','rb') as f.
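The modes described above can be compared side by side. The following sketch works in a temporary folder, so it does not touch any existing names.txt; the names written to the file are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")

    with open(path, "w") as f:   # "w" creates the file
        f.write("Alice\n")
    with open(path, "w") as f:   # "w" again: previous contents are erased
        f.write("Bob\n")
    with open(path, "a") as f:   # "a": new content goes to the end of the file
        f.write("Carol\n")

    with open(path, "r") as f:   # text mode: a list of strings, one per line
        lines = f.readlines()
    with open(path, "rb") as f:  # binary mode: raw bytes
        raw = f.read()

print(lines)
print(type(raw))
```

Alice is gone because the second open in w mode emptied the file, while the line written in a mode was added after Bob.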
Exercise: Consider the file https://github.com/isa-ulisboa/greends-ipython/blob/main/INE_permanent_crops.csv downloaded from the Portuguese Institute of Statistics, INE, about the area of two permanent crops (olive plantations, vineyard) for the main regions of Portugal. The data is not structured as a rectangular table: specifically it contains rows that we want to ignore. We are just interested in filtering the rows that have the same number of separators, namely the column names and the rows that contain the crop areas for each region. The resulting rectangular table is to be exported to a new file.
Write the code using the template below.
def main():
    # constants
    input_file='INE_permanent_crops.csv'
    output_file='output.csv'
    sep=';'
    number_sep=6
    file_encoding='ISO-8859-1'
    # main steps
    L=read_file(input_file,file_encoding) # L is a list of the rows of the file
    L=filter_lines(L,sep,number_sep) # L is a list of lists, after we apply the separator
    write_to_csv(L,output_file,sep)
You should complete the definitions of the following functions.
def read_file(file_name,file_encoding):
    ''' reads file using the appropriate encoding and returns list of rows'''
    with open(file_name,"r", encoding=file_encoding) as f:
        lines=f.readlines()
    return lines
def filter_lines(L,sep,number_sep):
    '''
    Filter only elements of L that contains number_sep times the separator 'sep'.
    Each filtered element of L is represented as a list of strings, the strings separated by 'sep'
    All list of strings have the same length (number_sep+1)
    The output is the list with just the filtered lists of strings
    '''
    newL=[]
    for line in L:
        row=line.rstrip().split(sep)
        ...
    return newL
def write_to_csv(L, output_file,sep):
    '''writes each element of L as a line in the output file'''
    with open(output_file, "w") as f:
        for row in L:
            ...
main()
Tutorial: https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html
Video suggestions:
Pandas dataframes have an intrinsic tabular structure represented by rows and columns where each row and column has a unique label (name) and position number inside the dataframe. The row labels, called dataframe indices, can be integer numbers or string values. The column labels, called column names, are usually strings. Use the following script to create a dataframe from a dictionary. Notice the terminology for rows (index) and columns (columns).
import pandas as pd
data = {'Product': ['A', 'B', 'C', 'D', 'E', 'F'],
        'Price': [10, 25, 15, 30, 20, 35],
        'Quantity': [100, 50, 75, 30, 90, 20],
        'Sales': [1000, 1250, 1125, 900, 1800, 700]}
df = pd.DataFrame(data)
df = df.set_index('Product')
display(df)
- Get the column names of df with df.columns. Note: .columns returns a pd.Index object. This is to provide extra functionality and performance compared to lists. To extract a list of names, one can use df.columns.to_list(). To get an array, use df.columns.values.
- Get the Series object that corresponds to column Price with df['Price']
- Get a DataFrame with columns Price and Quantity with df[['Price','Quantity']].
- Filter rows with a condition: display(df[df['Sales'] > 1000])
- iloc (positional indexing):

    # Select the first row by integer position
    display(df.iloc[0])
    # Select the first two rows and all columns by integer position
    display(df.iloc[0:2, :])
    # Select rows from index 1 to 3 (inclusive of 1, exclusive of 4)
    # and columns from index 0 to 2 (exclusive of 2) by integer position
    display(df.iloc[1:4, 0:2])
    # Select a specific cell by integer position (row index 2, column index 1)
    display(df.iloc[2, 1])
- loc (label-based indexing); note that the row labels of df are the upper-case product names 'A' to 'F':

    # Select a single row by its label
    display(df.loc['A'])
    # Select multiple rows by their labels
    display(df.loc[['A', 'C', 'E']])
    # Select rows by label and specific columns by label
    display(df.loc[['A', 'C', 'E'], ['Price', 'Sales']])
    # Select a slice of rows by label (inclusive of both start and stop labels)
    display(df.loc['B':'E'])
    # Select rows by label slice and columns by label slice
    display(df.loc['B':'E', 'Quantity':'Sales'])
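The main difference between the two kinds of slices is that iloc excludes the end position while loc includes the end label. The small sketch below checks this on a toy dataframe (a shortened, made-up version of the one above), using print so that it also runs outside a notebook.

```python
import pandas as pd

df = pd.DataFrame(
    {"Price": [10, 25, 15], "Quantity": [100, 50, 75]},
    index=["A", "B", "C"],
)

by_position = df.iloc[0:2]  # positions 0 and 1; position 2 is excluded
by_label = df.loc["A":"B"]  # labels 'A' and 'B'; the end label is included
print(by_position.equals(by_label))

# a column name can be converted into a position with get_loc
price_position = df.columns.get_loc("Price")
print(df.iloc[2, price_position], df.loc["C", "Price"])
```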
Consider the dataset that describes 517 fires from the Montesinho natural park in Portugal. For each incident, the weekday, month, coordinates, and burnt area are recorded, as well as several meteorological variables such as rain, temperature, humidity, and wind (https://www.kaggle.com/datasets/vikasukani/forest-firearea-datasets). For reference, a copy of the file is available as forestfires.csv. The variables are:
- Read the file with pd.read_csv into a new object fires, and show the first 10 rows with fires.head(10).
- Check the data type of each column with .dtypes.
- Get a summary of the dataframe with .info().
- Create a Series with the temperature values for all 517 fires.
- Create a DataFrame just with columns month and day.
- Select fires that satisfy one or more conditions. Conditions are enclosed in (...) and can be connected with & or | or negated with ~.
- Select fires whose values belong to a given list with .isin()
- Check for Null values in the dataframe with .notna(). You can sum along columns with .sum().
- Use iloc to select the first 20 fires and just the FWI based variables values.
- Use loc and .isin() to select fires from August and September and just FWI based variables values for those fires.
- Create a dataframe months_df from a dictionary: for instance create a dictionary where keys are jan, feb, mar, for all 12 months, and the values are January, February, March and so on.

month_data = {
    'Month': [
        'January', 'February', 'March', 'April', 'May', 'June', 
        'July', 'August', 'September', 'October', 'November', 'December'
    ],
    'mth': [
        'jan', 'feb', 'mar', 'apr', 'may', 'jun', 
        'jul', 'aug', 'sep', 'oct', 'nov', 'dec'
    ]
}
months_df = pd.DataFrame(month_data)
merged_df = pd.merge(fires, months_df, left_on='month', right_on='mth', how='left')
merged_df.drop(columns='mth', inplace=True)
- Create a dataframe firesbymonth with columns avg_temp (average temperature for fires in that month), avg_RH (idem, for humidity) and fire_count (number of fires). Towards that end, reduce the fires dataframe with method .groupby to get just one row per month, with average temperature, average RH, and number of fires per month. See (https://pandas.pydata.org/docs/user_guide/groupby.html).
- What changes if you add .reset_index() to the previous command?
- Sort firesbymonth, such that the 12 rows are ordered by month correctly: jan, feb, mar, and so on.
- Create a column conditions in firesbymonth of type string that indicates if a month is dry&hot, dry&cold, wet&hot or wet&cold. Use the mean values of avg_temp and avg_RH to establish the appropriate thresholds. Use method .apply and define the function to apply with lambda.
- Convert fires into a two-way table that shows the total area of fires per day of the week and per month, where NaN are replaced by 0. Towards that end, explore the .pivot_table method.

There are several possibilities to combine positional and label-based indexing:
- (With iloc) Using df.columns.get_loc() which converts the name of one column into its position. Then iloc can be used to perform the selection. For multiple columns determined by a list of column names, one can use instead df.columns.get_indexer(). Example: Use iloc to select the first 20 fires and just the FWI based variables values, using the names rather than the positions of those variables. Solution: FWI_positions=fires.columns.get_indexer(['FFMC','DMC','DC','ISI']) and `fires.iloc[0:20,FWI_positions]`
- (With loc) Using df.index[] to extract the index names. Then, loc can be used to perform the selection. Solution: fires.loc[fires.index[0:20], ['FFMC', 'DMC', 'DC', 'ISI']].

Exporting is done with operations named .to_... as listed in (https://pandas.pydata.org/docs/user_guide/io.html)
- Export to Excel: fires.to_excel("filename.xlsx", sheet_name="fires", index=False)
- Read from Excel: pd.read_excel("filename.xlsx", sheet_name="fires")
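The groupby, sorting, apply and pivot_table steps mentioned in the exercises above can be tried first on a toy table. The sketch below uses five made-up fires with the same column names as the dataset (month, day, temp, RH, area); the values are invented and the code is only one possible approach, not the expected solution.

```python
import pandas as pd

# toy data with the same column names as forestfires.csv (values are invented)
fires = pd.DataFrame({
    "month": ["aug", "aug", "sep", "mar", "sep"],
    "day":   ["mon", "tue", "mon", "fri", "mon"],
    "temp":  [25.0, 27.0, 20.0, 10.0, 22.0],
    "RH":    [30, 40, 50, 60, 70],
    "area":  [1.5, 0.0, 2.0, 0.5, 3.0],
})

# one row per month: named aggregation gives the new column names directly
firesbymonth = (
    fires.groupby("month")
    .agg(avg_temp=("temp", "mean"), avg_RH=("RH", "mean"), fire_count=("temp", "count"))
    .reset_index()  # month becomes a regular column again
)

# order the rows by calendar month instead of alphabetically
month_order = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
firesbymonth["month"] = pd.Categorical(
    firesbymonth["month"], categories=month_order, ordered=True
)
firesbymonth = firesbymonth.sort_values("month").reset_index(drop=True)

# classify each month using the means of avg_temp and avg_RH as thresholds
t0, h0 = firesbymonth["avg_temp"].mean(), firesbymonth["avg_RH"].mean()
firesbymonth["conditions"] = firesbymonth.apply(
    lambda r: ("dry" if r["avg_RH"] < h0 else "wet")
    + "&"
    + ("hot" if r["avg_temp"] > t0 else "cold"),
    axis=1,
)

# two-way table: total burnt area per day of the week and per month
area_table = fires.pivot_table(
    values="area", index="day", columns="month", aggfunc="sum", fill_value=0
)

print(firesbymonth)
print(area_table)
```

Converting month to an ordered categorical is one of several ways to get the calendar order; merging with a table that holds the month number and sorting on that number would work as well.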