greends-pml

Links and exercises for the course Practical Machine Learning, Green Data Science, 2nd semester 2024/2025

Instructor: Manuel Campagnolo, ISA/ULisboa

The course will follow a mixed flipped classroom model, where students are expected to work on suggested topics autonomously before classes. Work outside class will be based on a range of Machine Learning resources, including the book by Sebastian Raschka, Yuxi (Hayden) Liu, and Vahid Mirjalili, Machine Learning with PyTorch and Scikit-Learn (Packt Publishing, 2022). During classes, Python notebooks will be run on Google Colab.

Links for class resources:

Fenix webpage. Course official page, where final results will be posted.
Moodle ULisboa. Evaluation: assignments. The course is called Practical Machine Learning. Students need to self-register on the Moodle page for the course.
Kaggle. Access to data; candidate problems for the final project.

Overview notebook: This notebook provides an overview of the full course and contains pointers to other sources of relevant information and Python scripts.

Sessions: Each description below includes the summary of the topics covered in the session, as well as the description of assignments and links to videos or other materials that students should work through.

Introduction (Feb 21, 2025)

The goal of the first class is to give an introduction to ML and also to show some of the problems that can be addressed with the techniques and tools that will be discussed during the semester. The examples will be run on Colab.

Types of machine learning problems: supervised learning, unsupervised learning, reinforcement learning. Suggestion: watch the video Types of machine learning
Supervised learning: classification vs regression
Examples of input data for machine learning problems: tabular data, images, text. See the Iris data set example in the notebook iris_regression_classification.ipynb
Example of inference for regression over the Iris data set
Statistics vs Machine Learning: watch the video When to use stats vs. ML?
An example of a prediction task for time series: see the notebook modeling ground water levels for the Kaggle competition Acea Smart Water Analytics. Try to download the data and run the notebook to reproduce the results.

Basic concepts (Feb 28, 2025): model, loss, fit, learning rate, perceptron, ...

The goal of the following classes is to understand how ML models can be trained and used to solve regression and classification problems. We start by applying the machine learning approach to well-known statistical problems like linear regression to illustrate the stepwise approach followed in ML. We use synthetic data generated from a linear or quadratic regression, where one can control the underlying model and the amount of noise. Then, we consider the Iris tabular data set, with 4 explanatory variables and a categorical label that can be one of three species.

Video on the Perceptron and the early days of AI: The First Neural Networks
Basic concepts in machine learning: model, fit, iterations (aka epochs), loss, learning rate, perceptron, parameters (weights), for a simple regression problem.
Consider the following pseudo code to train a simple Linear Regression model. What is the loss function that we aim to minimize? What is the strategy to reduce the loss in each iteration? Is there a risk of over-fitting?

Pseudo code for SGD (stochastic gradient descent) to fit a linear regression:

Dataset: $D = \lbrace (x_1^{(i)}, \dots, x_n^{(i)}, y^{(i)}) \rbrace_{i=1}^N$ # N observations, n features
Learning rate: $\eta$ # Small positive value
Max iterations: max_iter # Number of epochs
Initial weights: $w$ := $(w_0, w_1, \dots, w_n)$ # Typically, all zero
For iter := 1 to max_iter
- For each $(x_1, \dots, x_n, y) \in D$ # Update weights after each example
  - $\hat{y}$ := $w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$ # Predict response with current weights
  - error := $y-\hat{y}$
  - $w_0$ := $w_0 + \eta \cdot$ error # Update weight (bias)
  - For $j$ := 1 to $n$
    - $w_j$ := $w_j + \eta \cdot$ error $\cdot x_j$ # Update weight (for each feature)
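
One way to read the update rule above is as a gradient step on the squared error of a single example. This is an assumption for illustration, since the pseudo code does not name its loss; the factor $\frac{1}{2}$ is a common convention that cancels when differentiating.

$$L(w) = \frac{1}{2}\left(y - \hat{y}\right)^2, \qquad \hat{y} = w_0 + \sum_{j=1}^{n} w_j x_j$$

$$\frac{\partial L}{\partial w_0} = -(y - \hat{y}), \qquad \frac{\partial L}{\partial w_j} = -(y - \hat{y}) \, x_j \quad (j = 1, \dots, n)$$

$$w_0 := w_0 - \eta \frac{\partial L}{\partial w_0} = w_0 + \eta \cdot \text{error}, \qquad w_j := w_j - \eta \frac{\partial L}{\partial w_j} = w_j + \eta \cdot \text{error} \cdot x_j$$

Under that assumption, each inner-loop step moves the weights a small distance $\eta$ along the negative gradient of the loss for the current example, which is what makes the procedure stochastic gradient descent.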

Create a LinearRegression class with a fit method to implement the pseudo code above. Add to your class a predict method to make new predictions using the fitted model. Test your class with the following example.

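import numpy as np

# Create synthetic data
np.random.seed(0)
X = np.random.rand(100, 1) # array with 100 rows and 1 column (1 feature)
y = 2 + 3 * X + np.random.randn(100, 1) * 0.1 # true model: intercept 2, slope 3, small Gaussian noise
# Create and train the model (LinearRegression is the class you wrote, not the scikit-learn one)
model = LinearRegression(learning_rate=0.1, max_iter=1000)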
model.fit(X, y)
# Make predictions
X_test = np.array([[0.5]])
y_pred = model.predict(X_test)
print(f"Prediction for X=0.5: {y_pred[0]}")
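
The class requested above could look roughly like the following. This is a minimal sketch, not a reference solution: it is written in plain Python so that each line maps onto a line of the pseudo code, and it assumes rows of `X` are sequences of numbers and entries of `y` are either numbers or one-element rows (as in the NumPy example above). The constructor arguments `learning_rate` and `max_iter` come from that example; the attribute and helper names (`bias`, `weights`, `_scalar`, `_predict_one`) are illustrative choices.

```python
import random


class LinearRegression:
    """Linear regression fitted by SGD, following the pseudo code step by step."""

    def __init__(self, learning_rate=0.1, max_iter=1000):
        self.learning_rate = learning_rate  # eta in the pseudo code
        self.max_iter = max_iter            # number of epochs
        self.bias = 0.0                     # w_0
        self.weights = []                   # w_1, ..., w_n

    @staticmethod
    def _scalar(value):
        # Accept both plain numbers and one-element rows such as [2.3]
        try:
            return float(value[0])
        except (TypeError, IndexError):
            return float(value)

    def _predict_one(self, row):
        # y_hat = w_0 + w_1 x_1 + ... + w_n x_n
        return self.bias + sum(w * float(x) for w, x in zip(self.weights, row))

    def fit(self, X, y):
        n_features = len(X[0])
        self.bias = 0.0
        self.weights = [0.0] * n_features  # initial weights: all zero
        for _ in range(self.max_iter):
            for row, target in zip(X, y):  # update after each example
                error = self._scalar(target) - self._predict_one(row)
                self.bias += self.learning_rate * error
                for j in range(n_features):
                    self.weights[j] += self.learning_rate * error * float(row[j])
        return self

    def predict(self, X):
        return [self._predict_one(row) for row in X]


# Small check on synthetic data generated from y = 2 + 3x + noise
random.seed(0)
X_demo = [[random.random()] for _ in range(100)]
y_demo = [2 + 3 * row[0] + random.gauss(0, 0.1) for row in X_demo]

demo_model = LinearRegression(learning_rate=0.1, max_iter=200).fit(X_demo, y_demo)
print("bias:", demo_model.bias, "weights:", demo_model.weights)
print("prediction for x=0.5:", demo_model.predict([[0.5]])[0])
```

The fitted bias and weight should land near 2 and 3, though not exactly: with a constant learning rate, the weights keep fluctuating slightly around the least-squares solution. A vectorized NumPy version would be faster, at the cost of hiding the inner loop over features.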

Create an animation that shows the position of the fitted line for successive epochs for the example above.
How can you adapt the code to address a classification problem where the response $y$ can only be 0 or 1?
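
For the classification question, one possible adaptation (among others, such as logistic regression with a sigmoid output) is the perceptron rule: keep the same update, but pass the weighted sum through a threshold so that the prediction is 0 or 1. The sketch below assumes that choice; the function names `step`, `fit_perceptron` and `predict_perceptron` and the toy AND data set are hypothetical, chosen only for illustration.

```python
def step(z):
    # Threshold activation: maps the weighted sum to a class label
    return 1 if z >= 0 else 0


def fit_perceptron(X, y, learning_rate=1.0, max_iter=20):
    """Same loop as the regression pseudo code, with a thresholded prediction."""
    n_features = len(X[0])
    w0 = 0.0
    w = [0.0] * n_features
    for _ in range(max_iter):
        for row, target in zip(X, y):
            z = w0 + sum(wj * xj for wj, xj in zip(w, row))
            error = target - step(z)  # -1, 0 or +1; zero means no update
            w0 += learning_rate * error
            for j in range(n_features):
                w[j] += learning_rate * error * row[j]
    return w0, w


def predict_perceptron(w0, w, X):
    return [step(w0 + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


# Toy, linearly separable problem: the logical AND of two binary inputs
X_and = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_and = [0, 0, 0, 1]

w0, w = fit_perceptron(X_and, y_and)
predictions = predict_perceptron(w0, w, X_and)
print(predictions)  # [0, 0, 0, 1]
```

Weights only change when an example is misclassified, so on linearly separable data the updates eventually stop. On data that is not linearly separable this rule never settles, which is one motivation for smoother alternatives.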

Main on-line resources

Basic resources:
- Sebastian Raschka, Yuxi (Hayden) Liu, and Vahid Mirjalili. Machine Learning with PyTorch and Scikit-Learn. Packt Publishing, 2022. See the presentation webpage and GitHub repository
- https://pytorch.org/tutorials/
Tutorials:
- Machine Learning for Beginners (Microsoft); YouTube channel
- AI for Beginners (Microsoft)
- NYU course: Data Science for Everyone
- MIT 6.S191: Introduction to Deep Learning (2024)
- Stanford Lecture Collection Convolutional Neural Networks for Visual Recognition (2017) and Notes for the Stanford course on Convolutional Neural Networks for Visual Recognition
- Stanford Machine Learning Full Course (2020). Led by Andrew Ng, this course provides a broad introduction to machine learning and statistical pattern recognition. Topics include: supervised learning (generative/discriminative learning, parametric/non-parametric learning, neural networks, support vector machines); unsupervised learning (clustering, dimensionality reduction, kernel methods); learning theory (bias/variance tradeoffs, practical advice); reinforcement learning and adaptive control.
- Broderick: Machine Learning, MIT 6.036 Fall 2020; Full lecture information and slides
