greends-ipython

Introduction to Python 2025/2026

Masters in Data Science applied to agricultural and food sciences, environment, and forestry engineering.

Instructor: Manuel Campagnolo (mlc@isa.ulisboa.pt)

Teaching assistant: Mekaela Stevenson (mekaela@edu.ulisboa.pt)

Online resources for the course
Course contents: the course will cover some topics in CS50P and PP.fi
| CS50P | Contents | PP.fi | Contents |
|---|---|---|---|
| Lecture 0 | Creating Code with Python; Functions; Bugs; Strings and Parameters; Formatting Strings; More on Strings; Integers or int; Readability Wins; Float Basics; More on Floats; Def; Returning Values | Part 1 | Intro; I/O; More about variables; Arithmetic operations; Conditional statements |
| Lecture 1 | Conditionals, if Statements, Control Flow; Modulo; Creating Our Own Parity Function; Pythonic; match | Part 2 | Programming terminology; More conditionals; Combining conditions; Simple loops |
| Lecture 2 | Loops; While Loops; For Loops; Improving with User Input; More About Lists; Length; Dictionaries, More on code modularity | Part 3 | Loops with conditions; Working with strings; More loops; Defining functions |
| | | Part 4 | The Visual Studio Code editor, Python interpreter and built-in debugging tool; More functions; Lists; Definite iteration; Print statement formatting; More strings and lists |
| | | Part 5 | More lists; References; Dictionary; Tuple |
| Lecture 3 | Exceptions, Runtime Errors, try, else, Creating a Function to Get an Integer, pass | Part 6 | Reading files; Writing files; Handling errors; Local and global variables |
| Lecture 4 | Libraries, Random, Statistics, Command-Line Arguments, slice, Packages, APIs, Making Your Own Libraries | Part 7 | Modules; Randomness; Times and dates; Data processing; Creating your own modules; More Python features |
| Lecture 5 | Unit Tests; assert; pytest; Testing Strings; Organizing Tests into Folders | | |
| Lecture 6 | File I/O; open; with; CSV; Binary Files and PIL | | |
| Lecture 7 | Regular Expressions; Case Sensitivity; Cleaning Up User Input; Extracting User Input | | |
| Lecture 8 | Object-Oriented Programming; Classes; raise; Decorators; Class Methods; Static Methods; Inheritance; Inheritance and Exceptions; Operator Overloading | Part 8 | Objects and methods; Classes and objects; Defining classes; Defining methods; More examples of classes |
| | | Part 9 | Objects and references; Objects as attributes; Encapsulation; Scope of methods; Class attributes; More examples with classes |
| | | Part 10 | Class hierarchies; Access modifiers; Object oriented programming techniques; Developing a larger application |
| Lecture 9 | set; Global Variables; Constants; Type Hints; Docstrings; argparse; Unpacking; args and kwargs; map; List Comprehensions; filter; Dictionary Comprehensions; enumerate; Generators and Iterators | Part 11 | List comprehensions; More comprehensions; Recursion; More recursion examples |
| | | Part 12 | Functions as arguments; Generators; Functional programming; Regular expressions |

Class 1 (September 12, 2025): data types, variables, functions
  1. The recommendation for this class is to code in the CS50 cloud environment (VS Code). Two steps: 1. log in to your GitHub account; 2. access your codespace at https://cs50.dev/. This environment allows you to automatically test your scripts for the CS50 problem sets.
  2. Some useful commands for the command line interface (CLI) in the terminal:
    • code filename.py to create a new file
    • ls to list files in folder
    • cp filename newfilename to copy a file, e.g. cp ../hello.py farewell.py (.. represents the parent folder)
    • mv filename newfilename to rename or move a file, e.g. mv farewell.py goodbye.py or mv farewell.py .. (move one folder up)
    • rm filename to delete (remove) file
    • mkdir foldername to create new folder
    • cd foldername to change directory, e.g. cd ..
    • rmdir foldername to delete folder
    • clear to clear terminal window
  3. The REPL (interactive Read-Eval-Print-Loop) environment: see https://realpython.com/interacting-with-python/
  4. All values in Python have a type. The primitive data types are: integer, float, string, Boolean, and None (see https://www.geeksforgeeks.org/python/primitive-data-types-vs-non-primitive-data-types-in-python/)
    • strings (str), variables, print (a function), parameters (e.g. end=), input, comments, formatted strings (f"..."), .strip(), .title() (methods)
    • integers (int), operations for integers, casting (e.g. str to int)
    • floating point values (float), round, format floats (e.g. f"{z:.2f}")
    • True, False, and, or, not
  5. Functions, def, return
  6. Suggested problems: CS50 Problem set 0
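
The items above can be combined in a few lines. The following is a minimal sketch; the variable names, the sample values, and the function is_even are chosen only for illustration.

```python
# Strings: strip surrounding whitespace and capitalize each word
name = "  ada lovelace ".strip().title()

# Casting: the text "7" becomes the integer 7
x = int("7")

# Floats: division always yields a float; round() and f-strings control the digits
z = x / 3
rounded = round(z, 2)
label = f"{name} got {z:.2f}"


def is_even(n: int) -> bool:
    """Return True if n is divisible by 2."""
    return n % 2 == 0


# end= replaces the default newline that print adds
print(label, end="!\n")
print(is_even(x) or not True)
```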

Class 2 (September 19, 2025): conditionals, lists, dictionaries
  1. Conditionals:
    • if, elif, else:
        if score >= 70:
            print("Grade: C to A")
        elif score >= 60:
            print("Grade: D")
        else:
            print("Grade: F")
      
    • match:
       match species:
           case 'versicolor':
               label=0
           case 'virginica':
               label=1
           case _:
               label=2
      
  2. Pythonic coding: def main(), define other functions, call main(). The code must be modular.
  3. While loops, for loops, break and return
  4. Data type list []: methods append, extend
  5. Data type dictionary {}: methods .items(), .keys() and .values()
    knights = {'gallahad': 'the pure', 'robin': 'the brave'}
    for k, v in knights.items():
        print(k, v)
    if 'gallahad' in knights:
        print('Go Gallahad')
    
  6. Collaborative project: each student or small group of students should define each necessary function to complete the script below. The side effect of main() is a simple histogram printed in the terminal.
    def main():
        # read and sort values
        x=read_values() # x is a list of numbers, either integers or floats
        n=len(x) # integer; number of values
        xmin,xmax=determine_min_max(x) # integers or floats
        # determine number of classes
        m=number_of_classes_sturges(n) # m is a positive integer such that 2**(m-1) <= n <= 2**m
        # determine class amplitude
        delta=amplitude(xmin,xmax,m) # positive float, the range of values divided by the number of classes
        # Compute frequency for each class and plot histogram row by row
        for i in range(m):
            left=xmin+i*delta
            if i < m-1:
                right=left+delta
            else:
                right=left+delta+1 # either 1 or any positive value
            freq=determine_frequency(x,left,right) # integer; note that each value must belong to one and only one class
            print_frequency(freq) # the output must be '****' where each * represents one observation
    # execute main
    main()
    

    One possible solution for the collaborative project: (https://github.com/isa-ulisboa/greends-ipython/edit/main/collaborative_project_session2.py)

  7. Suggested problems: CS50 Problem set 1. Do not forget about the assignment on Moodle: problems File extensions, Coke machine, Plates
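
Two of the helper functions called in main() of the collaborative project could be sketched as follows. This is only one possible reading of the comments in that script, not necessarily the linked solution. Counting values in the half-open interval [left, right) is an assumption: it makes each value fall in exactly one class, which is consistent with main() widening the last class so that the maximum is counted.

```python
def number_of_classes_sturges(n: int) -> int:
    """Return the smallest positive integer m such that n <= 2**m."""
    m = 1
    while 2**m < n:
        m += 1
    return m


def determine_frequency(x: list, left: float, right: float) -> int:
    """Count the values v in x with left <= v < right (assumed half-open class)."""
    return sum(1 for v in x if left <= v < right)


print(number_of_classes_sturges(9))
print(determine_frequency([1, 2, 3, 4], 2, 4))
```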

Class 3 (September 26, 2025): exercises, list and dictionary comprehensions, best practices
  1. Exercises on list comprehension (with some solutions): https://github.com/isa-ulisboa/greends-ipython/blob/main/exercises_list_comprehension.md

  2. Exercises on dictionary comprehension (with some solutions): https://github.com/isa-ulisboa/greends-ipython/blob/main/exercises_dict_comprehension.md

  3. Exercises from CS50 Problem set 0, 1 and 2.
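
As a quick reminder before the exercises, the two kinds of comprehension look like this. The sketch uses made-up values; the variable names are illustrative only.

```python
temps_c = [12.5, 18.0, 25.5, 31.0]

# List comprehension: transform every value (Celsius to Fahrenheit)
temps_f = [c * 9 / 5 + 32 for c in temps_c]

# List comprehension with a condition: keep only values above 20
warm = [c for c in temps_c if c > 20]

# Dictionary comprehension: map each word to its length
words = ["oak", "pine", "cork"]
lengths = {w: len(w) for w in words}

# Dictionary comprehension over .items(): filter by value
long_words = {w: n for w, n in lengths.items() if n > 3}

print(warm)
print(long_words)
```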


Class 4 (October 3, 2025): handling exceptions in Python: catching and raising exceptions

See lecture https://cs50.harvard.edu/python/weeks/3/

  1. A few examples of code that can be helpful to solve problems in CS50 Problem set 3.

Example of basic use of try-except to catch a ValueError:

try:
    x = int(input("What's x?"))
except ValueError:
    print("x is not an integer")
else:
    print(f"x is {x}")

Function for requesting an integer from the user until no exceptions are caught:

def get_int():
    while True:
        try:
            x = int(input("What's x?"))
        except ValueError:
            print("x is not an integer")
        else:
            break
    return x

For a list of Python Built-in Exceptions, besides ValueError, you can check https://www.w3schools.com/python/python_ref_exceptions.asp
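
The class title also mentions raising exceptions. A minimal sketch, assuming a hypothetical function percentage that is not taken from the problem set: the function signals invalid arguments with raise, and the caller decides how to handle them.

```python
def percentage(x: int, y: int) -> int:
    """Return x/y as a rounded percentage; raise an exception on invalid arguments."""
    if y == 0:
        raise ZeroDivisionError("y cannot be zero")
    if x < 0 or x > y:
        raise ValueError("x must be between 0 and y")
    return round(100 * x / y)


for x, y in [(1, 4), (3, 0), (5, 4)]:
    try:
        print(percentage(x, y))
    except (ValueError, ZeroDivisionError) as error:
        # A single except clause can catch several exception types
        print(f"Invalid input: {error}")
```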

  2. The fuel gauge problem (https://cs50.harvard.edu/python/2022/psets/3/fuel/)

To solve this problem, try to organize your code as follows. As suggested in the hints, you should catch ValueError and ZeroDivisionError exceptions in your code. In the code below, the user is asked for values of x and y until they satisfy the requirements: the input must be a string of the form x/y, x has to be less than or equal to y, and y cannot be zero. The function get_string_of_integers_X_less_than_Y should take care of that.

def main():
    # asks user for input until the input is as expected
    x,y=get_string_of_integers_X_less_than_Y()
    # compute percentage from two integers
    p=compute_percentage(x,y)
    # print output 
    print_gauge(p)
  3. Example from (https://cs50.harvard.edu/python/2022/shorts/handling_exceptions/).

Exercise: adapt the code proposed in the short to be more modular, where the main function is something like the one below:

def main():
    spacecraft = input("Enter a spacecraft: ")
    au=get_au(spacecraft)
    m = convert(au)
    print(f"{m} m")
  4. Other useful applications of try-except
while True:
    try:
        x=int(input())
    except ValueError:
        print('x is not integer')
    except KeyboardInterrupt: #CTRL-C (in Linux, interrupt execution)
        print('\n KeyboardInterrupt')
        break
    except EOFError: #CTRL-D (in Linux, log out terminal/end-of-file)
        print('\n EOFError')
        break
    else:
        print(x)

Exercise (Asking for a haphazard list of numbers): Create a program that asks the user to provide, in no particular order, a series of numbers to be stored in a list. The user is asked for one number at a time. Only inputs that are numbers are stored in the list. When the user wants to stop, they should press CTRL-D. Then, the program should print the list of numbers.


Class 5 (October 10, 2025): modules, packages, APIs

See lecture https://cs50.harvard.edu/python/weeks/4/

Modules

Suggestion: watch https://cs50.harvard.edu/python/shorts/creating_modules_packages/

Modules are just python scripts (files like module_name.py) which can be imported into your main code. You can import everything that belongs to the module, or just some given function(s) or other objects.

Create and import your own module

Exercise: Create a file named mymodule.py and a file named main.py in http://cs50.dev. Organize the files in the following folders:

|--- class_5 # or whatever folder name you wish
     |--- modules
          |--- mymodule.py
     |--- main.py

The contents of mymodule.py are typically functions or constants that you can re-use in different contexts. Let’s suppose that mymodule.py has the following contents.

mymodule.py
import sys

Constants={
    'e'   : 2.718281828459045, # Euler's number
    'pi'  : 3.141592653589793, # Archimedes' constant
    'phi' : 1.618033988749895  # Golden ratio
}

def get_integer() -> int:
    #get integer from user
    while True:
        try:
            return(int(input('Type a number:  ')))
        except ValueError:
            print('Not an integer number: try again')
        except KeyboardInterrupt: #CTRL-C
            print('\n If you want to exit type CTRL-D')
        except EOFError: # CTRL-D
            sys.exit('\n Exit as requested')

def simplify(s: str) -> str:
    #Remove leading and trailing whitespace from string and convert to lowercase
    return s.strip().lower()

and main.py is the following file:

main.py (1st version)
import modules.mymodule

def main():
    x=modules.mymodule.get_integer()
    print(x)

main()

If you prefer, you can explicitly import some given functions from the module as in the following example.

main.py (2nd version)
from  modules.mymodule import get_integer

def main():
    x=get_integer()
    print(x)

main()

You can also import everything from the module with from modules.mymodule import * instead of the more specific (and recommended) from modules.mymodule import get_integer.

The examples above follow the directory tree that was suggested. If you change the module’s location, you need to adapt the code accordingly. Alternatively, you can add the path to the directory where your module lies to sys.path as in the following example.

import sys
sys.path.append(r'path-to-folder') # folder where mymodule is (e.g. `/workspaces/8834091`)
import mymodule

As explained in the recommended video, a Python package is just a folder with modules and a special file named __init__.py

Pip install

Often, you import a module that is not part of the Python standard library and has to be installed with pip (https://pypi.org/project/pip/) from the Python Package Index. For instance, the package requests is not included with Python. If it is not already available, you can typically install it in your terminal with

$pip install requests

and then import it in your main script with import requests. Modules from the standard library do not need to be installed. One example is random, which provides a series of functions for sampling, shuffling, and extracting random numbers from a variety of probability distributions. If you want to know the folder where a module is located, you can get that information with its __file__ attribute, as in the following script.

import random
print(random.__file__)

Suggestion: write a script to estimate the value of $\pi$ with a Monte Carlo algorithm that makes calls to random.uniform(-1, 1). One possible solution: https://www.geeksforgeeks.org/dsa/estimating-value-pi-using-monte-carlo/
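
One way to approach the suggestion above is sketched below. Points are drawn uniformly in the square [-1, 1] x [-1, 1], and the fraction that falls inside the unit circle approximates the ratio of the two areas, pi/4. The function name estimate_pi and the seed parameter are assumptions made for this sketch; the seed only makes the runs reproducible.

```python
import random


def estimate_pi(n: int, seed: int = 0) -> float:
    """Estimate pi from n random points in the square [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)  # seeded generator, so that runs are reproducible
    inside = 0
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # the point falls inside the unit circle
            inside += 1
    # area of circle / area of square = pi / 4
    return 4 * inside / n


print(estimate_pi(100_000))
```

The estimate improves slowly: the error shrinks roughly with the square root of the number of points.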

sys.argv

Previously, we used the module sys, in particular the function sys.exit() and the list sys.path. Another useful attribute is sys.argv, a list that gives you access to what the user typed at the command line (after the $ prompt), as in the following script.

import sys
print(len(sys.argv)) # returns the number of words in the command line after $python
print(sys.argv[1]) # returns the 2nd word, i.e., the first word after $python myscript.py

For instance, the following script named sum.py prints the sum of two numbers that were specified in the command line.

# sum.py
import sys
try:
    x,y = float(sys.argv[1]), float(sys.argv[2])
    print('the sum is',x+y)
except IndexError:
    print('missing argument')
except ValueError:
    print('The arguments are not numbers')

To run it, you can for instance execute the command $python sum.py 1.2 4.3 in the terminal.

APIs

Suggestion: watch https://cs50.harvard.edu/python/shorts/api_calls/ (13’)

Application programming interfaces (APIs) allow your program to communicate with a remote server. For instance, requests is a package that allows your program to send HTTP requests as a web browser would. Consider the following script myrequest.py that allows you to explore the iTunes database (https://performance-partners.apple.com/search-api):

Example: iTunes

import requests
import sys
try:
    response = requests.get("https://itunes.apple.com/search?entity=song&limit=1&term=" + sys.argv[1])
    print(response.json())
except IndexError:
    sys.exit('Missing argument')
except requests.RequestException:
    sys.exit('Request failed')

You can then call the API from your terminal with $python myrequest.py 'name of my favorite band'.
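
Band names often contain spaces or other characters that have to be encoded in a URL. The sketch below shows what an encoded query string looks like, using only the standard library function urllib.parse.urlencode so that it runs without contacting the server. The helper name build_search_url is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(term: str, limit: int = 1) -> str:
    """Build an iTunes search URL with an encoded query string."""
    query = {"entity": "song", "limit": limit, "term": term}
    return "https://itunes.apple.com/search?" + urlencode(query)


print(build_search_url("pink floyd"))
```

With requests, the same dictionary can be passed through the params argument, as in requests.get("https://itunes.apple.com/search", params=query), and the encoding is done for you.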

Example: GBIF

You can easily adapt that code to access a different database. For instance if you want to explore the GBIF database (https://data-blog.gbif.org/post/gbif-api-beginners-guide/), you can just replace the main line of code in myrequest.py with

response=requests.get('https://api.gbif.org/v1/species/match?name='+ sys.argv[1])

and execute it with, say, $python myrequest.py Tracheophyta in the terminal.

Example: open-meteo

Another example of a useful API for weather data is https://open-meteo.com/en/docs#api_documentation. You can find a customized requests package for open-meteo at https://pypi.org/project/openmeteo-requests/.

Problems

Solve problems from CS50P Problem_set_4. In particular, for problem Bitcoin price index organize your code so the main function is the following:

def main():
    x=read_command_line_input()
    price=get_bitcoin_price()
    print(f"${x*price:,.4f}")

Class 6 (October 17, 2025): virtual environments; file I/O

Class 7 (October 24, 2025): tabular data; pandas

Create a Pandas DataFrame from scratch

Pandas dataframes have an intrinsic tabular structure represented by rows and columns where each row and column has a unique label (name) and position number inside the dataframe. The row labels, called dataframe indices, can be integer numbers or string values. The column labels, called column names, are usually strings. Use the following script to create a dataframe from a dictionary. Notice the terminology for rows (index) and columns (columns).

import pandas as pd
data = {'Product': ['A', 'B', 'C', 'D', 'E', 'F'],
        'Price': [10, 25, 15, 30, 20, 35],
        'Quantity': [100, 50, 75, 30, 90, 20],
        'Sales': [1000, 1250, 1125, 900, 1800, 700]}
df = pd.DataFrame(data)
df = df.set_index('Product')
display(df)

DataFrame and Series Basics - Selecting Rows and Columns

  1. Print the column names of df with df.columns. Note: .columns returns a pd.Index object. This is to provide extra functionality and performance compared to lists. To extract a list of names, one can use df.columns.to_list(). To get an array, use df.columns.values.
  2. Select columns:
    • Create a Series object that corresponds to column Price with df['Price']
    • Create a new dataframe object that corresponds to columns Price and Quantity with df[['Price','Quantity']].
  3. Select rows with boolean indexing:
    • Create a new dataframe with only products with sales above 1000 with display(df[df['Sales'] > 1000])
  4. Select rows and columns with iloc (positional indexing):
    # Select the first row by integer position
    display(df.iloc[0])
    # Select the first two rows and all columns by integer position
    display(df.iloc[0:2, :])
    # Select rows from index 1 to 3 (inclusive of 1, exclusive of 4)
    # and columns from index 0 to 2 (exclusive of 2) by integer position
    display(df.iloc[1:4, 0:2])
    # Select a specific cell by integer position (row index 2, column index 1)
    display(df.iloc[2, 1])
    
  5. Select rows and columns with loc (label-based indexing):
    # Select a single row by its label
    display(df.loc['A'])
    # Select multiple rows by their labels
    display(df.loc[['A', 'C', 'E']])
    # Select rows by label and specific columns by label
    display(df.loc[['A', 'C', 'E'], ['Price', 'Sales']])
    # Select a slice of rows by label (inclusive of both start and stop labels)
    display(df.loc['B':'E'])
    # Select rows by label slice and columns by label slice
    display(df.loc['B':'E', 'Quantity':'Sales'])
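
The main difference between the two indexers is how slices end: iloc excludes the stop position, while loc includes the stop label. A minimal sketch on a reduced copy of the dataframe above, using print instead of display so that it also runs outside a notebook:

```python
import pandas as pd

df = pd.DataFrame(
    {"Price": [10, 25, 15, 30, 20, 35],
     "Sales": [1000, 1250, 1125, 900, 1800, 700]},
    index=["A", "B", "C", "D", "E", "F"],
)

by_position = df.iloc[1:4]   # rows at positions 1, 2, 3 (stop position excluded)
by_label = df.loc["B":"D"]   # rows labelled B, C, D (stop label included)

# Both selections contain the same three rows
print(by_position.equals(by_label))
```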
    

Read csv file

Consider the dataset that describes 517 fires from the Montesinho natural park in Portugal. For each incident, the weekday, month, coordinates, and burnt area are recorded, as well as several meteorological variables such as rain, temperature, humidity, and wind (https://www.kaggle.com/datasets/vikasukani/forest-firearea-datasets). For reference, a copy of the file is available as forestfires.csv. The variables are:

Explore the dataset with Pandas

  1. Read the file with pd.read_csv into a new object fires, and show the first 10 rows with fires.head(10).
  2. Create list of column names and determine column data types with attribute .dtypes.
  3. Print a summary of the dataframe with .info().
  4. Create a Series with the temperature values for all 517 fires.
  5. Create a DataFrame just with columns month and day.
  6. Select fires for which the temperature is higher than 25 Celsius, and between 20 and 25 Celsius; note that each condition needs to be surrounded by (...) and can be connected with & or | or negated with ~.
  7. Select fires that occurred on weekends; use the method .isin()
  8. Check whether there are null values in the dataframe with .isna(). You can count them per column with .sum().
  9. Use iloc to select the first 20 fires and just the FWI based variables values.
  10. Use loc and .isin() to select fires from August and September and just FWI based variables values for those fires.
  11. Create a dataframe months_df from a dictionary: for instance create a dictionary where keys are jan, feb, mar, for all 12 months, and the values are January, February, March and so on.
    month_data = {
        'Month': [
            'January', 'February', 'March', 'April', 'May', 'June', 
            'July', 'August', 'September', 'October', 'November', 'December'
        ],
        'mth': [
            'jan', 'feb', 'mar', 'apr', 'may', 'jun', 
            'jul', 'aug', 'sep', 'oct', 'nov', 'dec'
        ]
    }
    months_df = pd.DataFrame(month_data)
    
  12. Merge with new dataframe to get a new variable that contains the full name of the month. See (https://pandas.pydata.org/docs/user_guide/merging.html)
    merged_df = pd.merge(fires, months_df, left_on='month', right_on='mth', how='left')
    merged_df.drop(columns='mth', inplace=True)
    
  13. Create a dataframe named firesbymonth with columns avg_temp (average temperature for fires in that month), avg_RH (idem, for humidity) and fire_count (number of fires). Towards that end, reduce the fires dataframe with method .groupby to get just one row per month, and average temperature, average RH, and number of fires per month. See (https://pandas.pydata.org/docs/user_guide/groupby.html)
  14. What is the effect of adding the method .reset_index() to the previous command?
  15. Sort the dataframe firesbymonth, such that the 12 rows are ordered by month correctly: jan, feb, mar, and so on.
  16. Create a new column called conditions in firesbymonth of type string that indicates if a month is dry&hot, dry&cold, wet&hot or wet&cold. Use the mean values of avg_temp and avg_RH to establish the appropriate thresholds. Use method .apply and define the function to apply with lambda.
  17. Re-organize the information in fires into a two-way table that shows the total area of fires per day of the week and per month, where NaN are replaced by 0. Towards that end, explore the .pivot_table method.
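
Some of the techniques named in the list above can be tried first on a small table. The sketch below uses made-up values, not the real file; only the column names month, day, temp, RH and area are meant to mirror forestfires.csv, and that correspondence is an assumption.

```python
import pandas as pd

# Toy data (values are invented for illustration)
toy = pd.DataFrame({
    "month": ["aug", "aug", "sep", "mar", "sep"],
    "day":   ["sat", "mon", "sun", "fri", "sat"],
    "temp":  [26.0, 22.0, 24.0, 12.0, 28.0],
    "RH":    [30, 45, 40, 70, 25],
    "area":  [1.5, 0.0, 3.0, 0.5, 10.0],
})

# Boolean indexing: each condition in parentheses, combined with &
mild = toy[(toy["temp"] >= 20) & (toy["temp"] <= 25)]

# .isin() tests membership in a list of values
weekend = toy[toy["day"].isin(["sat", "sun"])]

# One row per month, using named aggregation
by_month = toy.groupby("month").agg(
    avg_temp=("temp", "mean"),
    avg_RH=("RH", "mean"),
    fire_count=("temp", "count"),
)

# Two-way table of total burnt area; empty cells become 0
table = toy.pivot_table(values="area", index="day", columns="month",
                        aggfunc="sum", fill_value=0)

print(by_month)
print(table)
```

Note that groupby sorts the groups alphabetically by default, which is why the month order has to be fixed separately.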

Combining positional and label-based indexing

There are several possibilities to combine positional and label-based indexing:

  1. (with iloc) Using df.columns.get_loc() which converts the name of one column into its position. Then iloc can be used to perform the selection. For multiple columns determined by a list of column names, one can use instead df.columns.get_indexer(). Example: Use iloc to select the first 20 fires and just the FWI based variables values, using the names rather than the positions of those variables. Solution: FWI_positions=fires.columns.get_indexer(['FFMC','DMC','DC','ISI']) and fires.iloc[0:20,FWI_positions]
  2. (with loc) Using df.index[] to extract the index names. Then, loc can be used to perform the selection. Solution: fires.loc[fires.index[0:20], ['FFMC', 'DMC', 'DC', 'ISI']].

Exporting to file

Exporting is done with operations named .to_... as listed in (https://pandas.pydata.org/docs/user_guide/io.html)

  1. Export your file as an Excel spreadsheet with .to_excel("filename.xlsx", sheet_name="fires", index=False)
  2. Read an Excel spreadsheet with: pd.read_excel("filename.xlsx", sheet_name="fires")