1. About Python#
1.1. Overview#
This lecture series will teach you to use Python for engineering and scientific computing.
In this lecture we will
introduce Python,
showcase some of its abilities,
discuss the connection between Python and AI,
explain why Python is our favorite language for scientific computing, and
point you to the next steps.
You do not need to understand everything you see in this lecture – we will work through the details later in the lecture series.
1.1.1. Isn’t MATLAB Better?#
No, no, and one hundred times no.
For almost all modern problems, Python’s scientific libraries are now far in advance of MATLAB’s capabilities.
We will explain the benefits of Python’s libraries throughout this lecture series.
We will also explain how Python’s elegant design helps you write clean, efficient code.
On top of these features, Python is more widely used, has a huge and helpful community, and is free!
1.2. What’s Python?#
Python is a general-purpose programming language conceived in 1989 by Guido van Rossum.
Python is a popular programming language available on all major computer platforms including macOS, Linux, and Windows. It is a scripting language which means that the moment the user presses the Return key or Run, the Python software interprets and runs the code. This is in contrast to a compiled language like C where the code must first be translated into binary (i.e., machine language) before it can be run. On-the-fly interpretation makes Python quick to use and often provides the user with rapid results. This is ideal for scientific data analysis where the user is routinely making changes to the processing and visualization of the data.
Python is free and open source, with development coordinated through the Python Software Foundation.
Another reason to use Python over other options, free or otherwise, is the power and the community support available to Python users. Python is a common and popular programming language that has been applied to a wide variety of applications including data analysis, visualization, machine learning, robotics, web scraping, 3D graphics, and more. As a result, there is a large community built around Python that provides valuable support for those who need assistance. If you are stuck on a problem or have a question, a quick internet search will likely provide the answer.
Being free and open source is important because it
saves us money,
means that Python is controlled by the community of users rather than a for-profit corporation, and
encourages reproducibility and open science.
1.2.1. Common Uses#
Python is a general-purpose language used in almost all application domains, including
Artificial Intelligence (AI)
scientific and engineering computing
communication
web development
CGI and graphical user interfaces
game development
resource planning
multimedia
etc.
It is used and supported extensively by tech firms including
1.2.2. Relative Popularity#
Python is, without doubt, one of the most popular programming languages.
Python libraries like pandas and Polars are replacing familiar tools like Excel and VBA as an essential skill in the fields of data science.
Moreover, Python is extremely popular within the scientific community, especially in AI.
The following chart, produced using Stack Overflow Trends, provides some evidence.
It shows the popularity of a Python AI library called PyTorch relative to MATLAB.
The chart shows that MATLAB’s popularity has faded, while PyTorch is growing rapidly.
Moreover, PyTorch is just one of the thousands of Python libraries available for scientific computing.
1.2.3. Features#
Python is a high-level language, which means it is relatively easy to read, write and debug.
It has a relatively small core language that is easy to learn.
This core is supported by many libraries, which you can learn to use as required.
Python is very beginner-friendly:
suitable for students learning programming
used in many undergraduate and graduate programs
Other features of Python:
multiple programming styles are supported (procedural, object-oriented, functional, etc.)
interpreted rather than compiled ahead of time.
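As a rough illustration of the first point, the sketch below computes the same quantity, the sum of squares of a short list, in procedural, object-oriented and functional style. The names `sum_squares_procedural`, `SquareSummer` and `sum_squares_functional`, and the sample data, are invented for this example.

```python
from functools import reduce

data = [1, 2, 3, 4]

# Procedural style: an explicit loop that updates a running total
def sum_squares_procedural(values):
    total = 0
    for v in values:
        total += v * v
    return total

# Object-oriented style: data and behavior bundled together in a class
class SquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v * v for v in self.values)

# Functional style: combine map and reduce, with no mutable state
def sum_squares_functional(values):
    return reduce(lambda acc, sq: acc + sq, map(lambda v: v * v, values), 0)

print(sum_squares_procedural(data))
print(SquareSummer(data).total())
print(sum_squares_functional(data))
```

All three calls print the same number; which style reads best depends on the problem and on taste.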
1.2.4. Syntax and Design#
One reason for Python’s popularity is its simple and elegant design — we’ll see many examples later on.
To get a feeling for this, let’s look at an example.
The code below is written in Java rather than Python.
You do not need to read and understand this code!
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CSVReader {
    public static void main(String[] args) {
        String filePath = "data.csv";
        String line;
        String splitBy = ",";
        int columnIndex = 1;
        double sum = 0;
        int count = 0;

        try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
            while ((line = br.readLine()) != null) {
                String[] values = line.split(splitBy);
                if (values.length > columnIndex) {
                    try {
                        double value = Double.parseDouble(
                            values[columnIndex]
                        );
                        sum += value;
                        count++;
                    } catch (NumberFormatException e) {
                        System.out.println(
                            "Skipping non-numeric value: " +
                            values[columnIndex]
                        );
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        if (count > 0) {
            double average = sum / count;
            System.out.println(
                "Average of the second column: " + average
            );
        } else {
            System.out.println(
                "No valid numeric data found in the second column."
            );
        }
    }
}
This Java code opens an imaginary file called data.csv and computes the mean of the values in the second column.
Even without knowing Java, you can see that the program is long and complex.
Here’s Python code that does the same thing.
Even if you don’t yet know Python, you can see that the code is simpler and easier to read.
import csv

total, count = 0, 0
with open('data.csv', mode='r') as file:
    reader = csv.reader(file)
    for row in reader:
        try:
            total += float(row[1])
            count += 1
        except (ValueError, IndexError):
            pass  # skip rows with a missing or non-numeric second column
print(f"Average: {total / count if count else 'No valid data'}")
The simplicity of Python and its neat design are a big factor in its popularity.
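Since data.csv is imaginary, the Python snippet above cannot be run as it stands. One way to try it out is sketched below: the same logic is wrapped in a function and applied to a temporary file. The function name `column_mean` and the sample rows are assumptions made purely for this illustration.

```python
import csv
import os
import tempfile

def column_mean(path, column=1):
    """Return the mean of the numeric entries in one column, or None if there are none."""
    total, count = 0.0, 0
    with open(path, mode='r', newline='') as file:
        for row in csv.reader(file):
            try:
                total += float(row[column])
                count += 1
            except (ValueError, IndexError):
                pass  # skip headers, short rows and non-numeric entries
    return total / count if count else None

# Made-up sample data: a header row, one bad entry and three valid numbers
sample = "name,value\na,1.0\nb,oops\nc,2.0\nd,3.0\n"

# Write the sample to a temporary file that survives the with-block
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(sample)

result = column_mean(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(result)
```

Returning None instead of printing a message keeps the function reusable; the caller decides how to report missing data.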
1.3. Some introductory examples#
1.3.1. grade_statistics.py - Basic Syntaxes and Constructs#
This example repeatedly prompts the user for a grade (between 0 and 100, with input validation). It then computes the sum, average and minimum, and prints a horizontal histogram.
This example illustrates basic Python syntax and constructs, such as comments, statements, block indentation, conditional if-else, for-loops, while-loops, input/output, strings, lists and functions.
"""
grade_statistics - Grade statistics
-----------------------------------
Prompt user for grades (0-100 with input validation) and compute the sum, average,
minimum, and print the horizontal histogram.
An example to illustrate basic Python syntaxes and constructs, such as block indentation,
conditional, for-loop, while-loop, input/output, list and function.

Usage: ./grade_statistics.py (Unix/macOS)
       python grade_statistics.py (All Platforms)
"""
# Define all the functions before using them
def my_sum(lst):
    """Return the sum of the given list."""
    sum = 0
    for item in lst: sum += item
    return sum

def my_average(lst):
    """Return the average of the given list."""
    return my_sum(lst)/len(lst)  # float

def my_min(lst):
    """Return the minimum of the given lst."""
    min = lst[0]
    for item in lst:
        if item < min:  # Parentheses () not needed for test
            min = item
    return min

def print_histogram(lst):
    """Print the horizontal histogram."""
    # Create a list of 10 bins to hold grades of 0-9, 10-19, ..., 90-100.
    # bins[0] to bins[8] has 10 items, but bins[9] has 11 items.
    bins = [0]*10  # Use repetition operator (*) to create a list of 10 zeros

    # Populate the histogram bins from the grades in the given lst.
    for grade in lst:
        if grade == 100:  # Special case
            bins[9] += 1
        else:
            bins[grade//10] += 1  # Use // for integer divide to get a truncated int

    # Print histogram
    # 2D pattern: rows are bins, columns are value of that particular bin in stars
    for row in range(len(bins)):  # [0, 1, 2, ..., len(bins)-1]
        # Print row header
        if row == 9:  # Special case
            print('{:3d}-{:<3d}: '.format(90, 100), end='')  # Formatted output (new style), no newline
        else:
            print('{:3d}-{:<3d}: '.format(row*10, row*10+9), end='')  # Formatted output, no newline

        # Print one star per count
        for col in range(bins[row]): print('*', end='')  # no newline
        print()  # newline
        # Alternatively, use str's repetition operator (*) to create the output string
        #print('*'*bins[row])

def main():
    """The main function."""
    # Create an initial empty list for grades to receive from input
    grade_list = []
    # Read grades with input validation
    grade = int(input('Enter a grade between 0 and 100 (or -1 to end): '))
    while grade != -1:
        if 0 <= grade <= 100:  # Python support this comparison syntax
            grade_list.append(grade)
        else:
            print('invalid grade, try again...')
        grade = int(input('Enter a grade between 0 and 100 (or -1 to end): '))

    # Call functions and print results
    print('---------------')
    print('The list is:', grade_list)
    print('The minimum is:', my_min(grade_list))
    print('The minimum using built-in function is:', min(grade_list))  # Using built-in function min()
    print('The sum is:', my_sum(grade_list))
    print('The sum using built-in function is:', sum(grade_list))  # Using built-in function sum()
    print('The average is: %.2f' % my_average(grade_list))  # Formatted output (old style)
    print('The average is: {:.2f}'.format(my_average(grade_list)))  # Formatted output (new style)
    print('---------------')
    print_histogram(grade_list)

# Ask the Python interpreter to run the main() function
if __name__ == '__main__':
    main()
How It Works
Doc-String: The script begins with the so-called documentation string, or doc-string, to provide the documentation for this module. Doc-string is a multi-line string (delimited by triple single/double quotes), which can be extracted from the source file to create documentation. In this way, the documentation is maintained in the source code together with the code.
def my_sum(lst): We define a function called my_sum() which takes a list and returns the sum of its items. It uses a for-each-item-in loop to iterate through all the items of the given list. As Python is interpreted, a function must be defined before it is called. We choose the name my_sum() to distinguish it from the built-in function sum(). Inside my_sum(), our local variable sum shadows the built-in sum for the duration of the function.
bins = [0]*10 : Python supports the repetition operator (*). This statement creates a list of ten zeros. The repetition operator (*) can also be applied to strings.
for row in range(len(bins)): Python supports only the for-in loop. It does NOT support the traditional C-like for-loop with an index. Hence, we use the built-in range(n) function to produce the sequence of indexes 0, 1, …, n-1, and then apply the for-in loop to that sequence.
0 <= grade <= 100 : Python supports this syntax for comparison.
There are a few ways of printing:
print() built-in function. By default, print() prints a newline at the end. Pass the argument end='' (empty string) to suppress the newline.
print(str.format()) : Python’s new style for formatting strings, using the string method str.format(). The string on which this method is called can contain text and replacement fields (placeholders) delimited by braces {}. Each replacement field contains either the numeric index of a positional argument or the name of a keyword argument, followed by a C-like format specifier beginning with : (instead of % in C), such as :4d for an integer with a width of 4, :6.2f for a floating-point number with a width of 6 and 2 decimal places, and :5s for a string with a width of 5. Flags such as < for left-align, > for right-align and ^ for center-align are also available.
print('formatting-string' % args) : Python’s old style for formatted strings, using the % operator. The formatting-string can contain C-like format specifiers, such as %4d for an integer, %6.2f for a floating-point number and %5s for a string. This line is included in case you need to read old programs.
grade = int(input('Enter … ')) : You can read from the standard input device (the keyboard by default) via the built-in input() function. As input() returns a string, we need to convert it to an int.
if __name__ == '__main__': When you execute a Python module via the Python interpreter, the global variable __name__ is set to '__main__'. On the other hand, when a module is imported into another module, its __name__ is set to the module name. Hence, main() is called when this file is run directly by the Python interpreter, but not when it is imported by another module. This is good practice for running a module.
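The constructs discussed above can also be tried in isolation. The following is a small sketch, with made-up values, that exercises the repetition operator, range(), chained comparison and the printing styles. It also includes an f-string, a more recent formatting style that the script itself does not use.

```python
# Repetition operator on a list and on a string
bins = [0] * 10
stars = '*' * 5

# range(n) yields the indexes 0, 1, ..., n-1 lazily; list() materializes them
indexes = list(range(4))

# Chained comparison: equivalent to (0 <= grade) and (grade <= 100)
grade = 85
is_valid = 0 <= grade <= 100

# Three ways to format the same number to two decimal places
average = 72.456
old_style = 'The average is: %.2f' % average
new_style = 'The average is: {:.2f}'.format(average)
f_string = f'The average is: {average:.2f}'

# Width and alignment: :3d right-aligns in 3 columns, :<3d left-aligns
row_header = '{:3d}-{:<3d}:'.format(90, 100)

print(bins, stars, indexes, is_valid)
print(old_style)
print(new_style)
print(f_string)
print(row_header, end='')  # end='' suppresses the trailing newline
print()                    # now emit the newline explicitly
```

The three formatted strings are identical, so the choice between them is mostly a matter of readability.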
1.4. Scientific Programming with Python#
Python plays an important role in many areas of scientific computing. It is either the dominant player or a major player in
astronomy
chemistry
computational biology
engineering
meteorology
natural language processing
etc.
Use of Python is also rising in economics, finance, and operations research – which were previously dominated by MATLAB / Excel / STATA / C / Fortran.
The Python programming language allows for add-ons known as libraries or packages to provide extra features. Each library is a collection of modules, and each module is a collection of functions… or occasionally data. For example, the SciPy library contains a module called integrate which contains a collection of functions for integrating equations or sampled data. For scientific applications, there is a series of core libraries collectively known as the SciPy stack along with many other popular libraries. The table below lists some of the common libraries for scientific applications with an asterisk by those often considered part of the SciPy stack.
Table 1 Common Python Scientific Libraries
| Library | Description |
|---|---|
| NumPy* | Provides arrays and a large collection of mathematical functions |
| Matplotlib* | Popular and powerful plotting library |
| SymPy* | Symbolic mathematics (somewhat analogous to Mathematica) |
| Pandas* | Advanced data analysis tools |
| SciPy* | Scientific data analysis tools for common scientific data analysis tasks including signal analysis, Fourier transform, integration, linear algebra, optimization, feature identification, and others |
| Seaborn | Advanced plotting library built on matplotlib |
| Scikit-Image | Scientific image processing and analysis |
| Scikit-Learn | Machine learning tools |
| TensorFlow | Machine learning tools for neural networks |
| NMRglue | Nuclear magnetic resonance data processing |
| Biopython | Computational biology and bioinformatics |
| Scikit-Bio | Computational biology and bioinformatics |
| RDKit | General purpose cheminformatics |
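To make the library / module / function hierarchy described above concrete, here is a minimal sketch using the integrate module of SciPy. It assumes a reasonably recent SciPy release, in which the sampled-data routine is named `trapezoid`; the sample points are made up for illustration.

```python
import math
from scipy import integrate  # 'integrate' is a module inside the SciPy library

# quad is a function in that module; it integrates a callable over [a, b]
area, abs_error = integrate.quad(math.sin, 0, math.pi)

# trapezoid is another function in the same module; it integrates sampled data
xs = [0.0, 0.5, 1.0]
ys = [0.0, 0.5, 1.0]  # samples of the line y = x
sampled_area = integrate.trapezoid(ys, x=xs)

print(area)          # numerical estimate of the integral of sin on [0, pi]
print(sampled_area)  # trapezoidal estimate of the integral of y = x on [0, 1]
```

The exact answers are 2 and 1/2, so both estimates can be checked by hand.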
This section briefly showcases some examples of Python for general scientific programming.
1.4.1. NumPy#
One of the most important parts of scientific computing is working with data.
Data is often stored in matrices, vectors and arrays.
We can create a simple array of numbers with pure Python as follows:
a = [-3.14, 0, 3.14] # A Python list
a
[-3.14, 0, 3.14]
This array is very small so it’s fine to work with pure Python.
But when we want to work with larger arrays in real programs we need more efficiency and more tools.
For this we need to use libraries for working with arrays.
For Python, the most important matrix and array processing library is NumPy.
For example, let’s build a NumPy array with 100 elements:
import numpy as np # Load the library
a = np.linspace(-np.pi, np.pi, 100) # Create even grid from -π to π
a
array([-3.14159265, -3.07812614, -3.01465962, -2.9511931 , -2.88772658,
-2.82426006, -2.76079354, -2.69732703, -2.63386051, -2.57039399,
-2.50692747, -2.44346095, -2.37999443, -2.31652792, -2.2530614 ,
-2.18959488, -2.12612836, -2.06266184, -1.99919533, -1.93572881,
-1.87226229, -1.80879577, -1.74532925, -1.68186273, -1.61839622,
-1.5549297 , -1.49146318, -1.42799666, -1.36453014, -1.30106362,
-1.23759711, -1.17413059, -1.11066407, -1.04719755, -0.98373103,
-0.92026451, -0.856798 , -0.79333148, -0.72986496, -0.66639844,
-0.60293192, -0.53946541, -0.47599889, -0.41253237, -0.34906585,
-0.28559933, -0.22213281, -0.1586663 , -0.09519978, -0.03173326,
0.03173326, 0.09519978, 0.1586663 , 0.22213281, 0.28559933,
0.34906585, 0.41253237, 0.47599889, 0.53946541, 0.60293192,
0.66639844, 0.72986496, 0.79333148, 0.856798 , 0.92026451,
0.98373103, 1.04719755, 1.11066407, 1.17413059, 1.23759711,
1.30106362, 1.36453014, 1.42799666, 1.49146318, 1.5549297 ,
1.61839622, 1.68186273, 1.74532925, 1.80879577, 1.87226229,
1.93572881, 1.99919533, 2.06266184, 2.12612836, 2.18959488,
2.2530614 , 2.31652792, 2.37999443, 2.44346095, 2.50692747,
2.57039399, 2.63386051, 2.69732703, 2.76079354, 2.82426006,
2.88772658, 2.9511931 , 3.01465962, 3.07812614, 3.14159265])
Now let’s transform this array by applying functions to it.
b = np.cos(a) # Apply cosine to each element of a
c = np.sin(a) # Apply sin to each element of a
Now we can easily take the inner product of b and c.
b @ c
np.float64(4.04891256782214e-16)
We can also do many other tasks, like
compute the mean and variance of arrays
build matrices and solve linear systems
generate random arrays for simulation, etc.
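A brief sketch of the three tasks just listed is given below. The seed, the sample size and the 2×2 system are arbitrary choices for illustration, and the code uses NumPy's `default_rng` generator rather than the legacy `np.random.seed` interface.

```python
import numpy as np

rng = np.random.default_rng(1234)  # seed chosen arbitrarily for reproducibility

# Generate a random array and compute its mean and variance
draws = rng.standard_normal(10_000)
mean, variance = draws.mean(), draws.var()

# Build a matrix and solve the linear system A x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

print(mean, variance)         # should be near 0 and 1 for standard normal draws
print(x)                      # solution of the 2x2 system
print(np.allclose(A @ x, b))  # check that A x reproduces b
```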
We will discuss the details later in the lecture series, where we cover NumPy in depth.
1.4.2. NumPy Alternatives#
While NumPy is still the king of array processing in Python, there are now important competitors.
Libraries such as JAX, PyTorch, and CuPy also have built-in array types and array operations that can be very fast and efficient.
In fact these libraries are better at exploiting parallelization and fast hardware, as we’ll explain later in this series.
However, you should still learn NumPy first because
NumPy is simpler and provides a strong foundation, and
libraries like JAX directly extend NumPy functionality and hence are easier to learn when you already know NumPy.
1.4.3. Graphics#
A major strength of Python is data visualization.
The most popular and comprehensive Python library for creating figures and graphs is Matplotlib, with functionality including
plots, histograms, contour images, 3D graphs, bar charts etc.
output in many formats (PDF, PNG, EPS, etc.)
LaTeX integration
Example 2D plot with embedded LaTeX annotations
Example contour plot
Example 3D plot
More examples can be found in the Matplotlib thumbnail gallery.
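As a hedged illustration of the features listed above, the following sketch draws a simple 2D plot with LaTeX-style labels and writes it out in two formats. It uses the non-interactive Agg backend and in-memory buffers so that it runs without a display. The curve and labels are arbitrary choices, and the labels are rendered by Matplotlib's built-in mathtext rather than a full LaTeX installation.

```python
import io
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt

# Sample the sine function on an even grid over [0, 2*pi]
xs = [i * 2 * math.pi / 200 for i in range(201)]
ys = [math.sin(x) for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys, label=r"$y = \sin(x)$")  # the $...$ text is typeset as math
ax.set_xlabel(r"$x$")
ax.set_title(r"A sine curve on $[0, 2\pi]$")
ax.legend()

# Write the same figure in two of the supported formats
png_buffer, pdf_buffer = io.BytesIO(), io.BytesIO()
fig.savefig(png_buffer, format="png")
fig.savefig(pdf_buffer, format="pdf")
plt.close(fig)

print(len(png_buffer.getvalue()), len(pdf_buffer.getvalue()))
```

Passing a file name such as "figure.pdf" to savefig instead of a buffer would write the figure to disk.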
Other graphics libraries include
You can visit the Python Graph Gallery for more example plots drawn using a variety of libraries.
1.4.4. SciPy#
The SciPy library is built on top of NumPy and provides additional functionality.
For example, let’s calculate \( \int_{-2}^2 \phi(z) dz \) where \( \phi \) is the standard normal density.
from scipy.stats import norm
from scipy.integrate import quad
ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2) # Integrate using Gaussian quadrature
value
0.9544997361036417
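As a quick sanity check on this number, the same probability can be computed from the standard normal CDF, since the integral equals \( \Phi(2) - \Phi(-2) \). The sketch below compares the two values; the tolerance of 1e-10 is an arbitrary choice.

```python
from scipy.stats import norm
from scipy.integrate import quad

value, error = quad(norm.pdf, -2, 2)  # numerical integration, as in the text
exact = norm.cdf(2) - norm.cdf(-2)    # the same probability from the CDF

print(value, exact)
print(abs(value - exact) <= 1e-10)    # the two routes should agree closely
```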
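SciPy includes many of the standard routines used in scientific computing, such as integration, linear algebra, optimization and signal analysis.
See the SciPy documentation for the full list.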
Later we’ll discuss SciPy in more detail.
1.4.5. Symbolic Algebra#
It’s useful to be able to manipulate symbolic expressions, as in Mathematica or Maple.
The SymPy library provides this functionality from within the Python shell.
from sympy import Symbol
x, y = Symbol('x'), Symbol('y') # Treat 'x' and 'y' as algebraic symbols
x + x + x + y
We can manipulate expressions
expression = (x + y)**2
expression.expand()
solve polynomials
from sympy import solve
solve(x**2 + x + 2)
[-1/2 - sqrt(7)*I/2, -1/2 + sqrt(7)*I/2]
and calculate limits, derivatives and integrals
from sympy import limit, sin, diff
limit(1 / x, x, 0)
limit(sin(x) / x, x, 0)
diff(sin(x), x)
The beauty of importing this functionality into Python is that we are working within a fully fledged programming language.
We can easily create tables of derivatives, generate LaTeX output, add that output to figures and so on.
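For instance, a small table of derivatives with LaTeX output might be generated along the following lines. This is only a sketch: the list of functions is an arbitrary choice, and the printed rows are meant to be pasted into a LaTeX tabular environment.

```python
from sympy import Symbol, sin, cos, exp, log, diff, latex

x = Symbol('x')
functions = [sin(x), cos(x), exp(x), log(x), x**3]

# Build a small table of (function, derivative) pairs
table = [(f, diff(f, x)) for f in functions]

for f, df in table:
    # latex() returns a LaTeX string for each expression
    print(f"{latex(f)} & {latex(df)} \\\\")
```

Because the table is an ordinary Python list, it can be extended, filtered or written to a file with the usual language tools.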
1.4.6. Statistics#
Python’s data manipulation and statistics libraries have improved rapidly over the last few years.
1.4.6.1. Pandas#
One of the most popular libraries for working with data is pandas.
Pandas is fast, efficient, flexible and well designed.
Here’s a simple example, using some dummy data generated with NumPy’s excellent random functionality.
import pandas as pd
np.random.seed(1234)
data = np.random.randn(5, 2) # 5x2 matrix of N(0, 1) random draws
dates = pd.date_range('2010-12-28', periods=5)  # ISO date avoids day/month ambiguity
df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
print(df)
df.mean()
price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685
price 0.411768
weight -0.699135
dtype: float64
1.4.7. Networks and Graphs#
The study of networks and graphs is becoming an important part of scientific work in engineering, economics, finance and other fields.
For example, we are interested in studying
production networks
networks of banks and financial institutions
friendship and social networks
etc.
Python has many libraries for studying networks and graphs. One well-known example is NetworkX.
Its features include, among many other things:
standard graph algorithms for analyzing networks
plotting routines
Here’s some example code that generates and plots a random graph, with node color determined by the shortest path length from a central node.
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1234)

# Generate a random graph
p = dict((i, (np.random.uniform(0, 1), np.random.uniform(0, 1)))
         for i in range(200))
g = nx.random_geometric_graph(200, 0.12, pos=p)
pos = nx.get_node_attributes(g, 'pos')

# Find node nearest the center point (0.5, 0.5)
dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in list(pos.values())]
ncenter = np.argmin(dists)

# Plot graph, coloring by path length from central node
p = nx.single_source_shortest_path_length(g, ncenter)
plt.figure()
nx.draw_networkx_edges(g, pos, alpha=0.4)
nx.draw_networkx_nodes(g,
                       pos,
                       nodelist=list(p.keys()),
                       node_size=120, alpha=0.5,
                       node_color=list(p.values()),
                       cmap=plt.cm.jet_r)
plt.show()
1.4.8. Other Scientific Libraries#
As mentioned above, there are literally thousands of scientific libraries for Python.
Some are small and do very specific tasks.
Others are huge in terms of lines of code and investment from coders and tech firms.
Here’s a short list of some important scientific libraries for Python, a few of which have already appeared above.
SymPy for symbolic algebra, including limits, derivatives and integrals
statsmodels for statistical routines
scikit-learn for machine learning
Keras for machine learning
GeoPandas for spatial data analysis
Dask for parallelization
Numba for compiling numerical Python code to fast machine code
CVXPY for convex optimization
scikit-image and OpenCV for processing and analysing image data
BeautifulSoup for extracting data from HTML and XML files
In this lecture series we will learn how to use some of these libraries for scientific computing.
1.4.9. Cloud Computing#
Running your Python code on massive servers in the cloud is becoming easier and easier.
A nice example is Anaconda Enterprise.
See also
The Google App Engine (Python, Java, PHP or Go)
1.4.10. Parallel Processing#
Apart from the cloud computing options listed above, you might like to consider
The StarCluster interface to Amazon’s EC2.
GPU programming through PyCUDA, PyOpenCL, Theano or similar.
1.4.11. Other Developments#
There are many other interesting developments with scientific programming in Python.
Some representative examples include
1.5. Learn More#
Read more about Python’s history and rise in popularity.
Visit the Python Package Index.
Read the book Python Programming And Numerical Methods: A Guide For Engineers And Scientists.
Keep up to date on what’s happening in the Python community with the Python subreddit.