Example Notebook

Mahmood Amintoosi

Computer Science Dept, Ferdowsi University of Mashhad

You can also create content with Jupyter Notebooks, which lets you include code blocks and their outputs in your book. This notebook shows a few examples of loading and plotting data. For details on writing executable content, see the Jupyter Book documentation.

# Import packages
import pandas as pd
import plotly.express as px
import seaborn as sns

Load data

You can put your data in the same directory as the notebook file and then use pandas to load it.

# Load the Iris flower dataset using pandas
df = pd.read_csv("iris_data.csv")
df
     sepal_length  sepal_width  petal_length  petal_width    species  species_id
0             5.1          3.5           1.4          0.2     setosa           1
1             4.9          3.0           1.4          0.2     setosa           1
2             4.7          3.2           1.3          0.2     setosa           1
3             4.6          3.1           1.5          0.2     setosa           1
4             5.0          3.6           1.4          0.2     setosa           1
..            ...          ...           ...          ...        ...         ...
145           6.7          3.0           5.2          2.3  virginica           3
146           6.3          2.5           5.0          1.9  virginica           3
147           6.5          3.0           5.2          2.0  virginica           3
148           6.2          3.4           5.4          2.3  virginica           3
149           5.9          3.0           5.1          1.8  virginica           3

150 rows × 6 columns
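The loading step above can be wrapped in a small helper that fails with a clear message when the file is not where the notebook expects it. This is only a sketch: `load_local_csv`, `check_iris_frame`, and `EXPECTED_COLUMNS` are hypothetical names introduced here, and the column list is simply copied from the output shown above.

```python
from pathlib import Path

import pandas as pd

# Assumption: these are the columns shown in the notebook output above.
EXPECTED_COLUMNS = [
    "sepal_length", "sepal_width", "petal_length",
    "petal_width", "species", "species_id",
]


def load_local_csv(filename, base_dir=None):
    """Read a CSV stored next to the notebook (hypothetical helper)."""
    # Assumption: the working directory is the notebook's folder unless
    # another directory is passed in explicitly.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = base / filename
    if not path.is_file():
        raise FileNotFoundError(f"Expected data file at {path}")
    return pd.read_csv(path)


def check_iris_frame(df):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if df.isna().any().any():
        problems.append("frame contains missing values")
    return problems
```

In the notebook this could be used as `df = load_local_csv("iris_data.csv")` followed by `check_iris_frame(df)`, which returns an empty list when all expected columns are present and no values are missing.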

Plot data

This repository is configured so that you can use Plotly for interactive visualizations. For more information, see the Jupyter Book documentation on interactive visualizations.

# Plot the Iris dataset using Plotly
g1 = px.scatter_3d(df,
                   x="sepal_width",
                   y="sepal_length",
                   z="petal_width",
                   color="species",
                   size="petal_length",
                   opacity=0.6,
                   size_max=30,
                   height=700)
g1

You can also create static visualizations, for example with the seaborn library.

# Plot the Iris dataset using seaborn
g2 = sns.pairplot(df.drop(columns="species_id"),
                  hue="species")
g2
[Figure: seaborn pair plot of the four Iris measurements, colored by species]

Math Formulas with \(\LaTeX\)

  • Univariate Normal Density:

    \[ p(x) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] \]

For this density, the expected value of \(x\) (an average, here taken over the feature space) is:

\[ \mu = \mathbb{E}[x] = \int_{-\infty}^{\infty} x \, p(x) \, dx \]

and where the expected squared deviation or variance is:

\[ \sigma^2 = \mathbb{E}[(x - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 \, p(x) \, dx \]
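The three integrals above can be checked numerically. The sketch below uses only the standard library and a plain trapezoidal rule; the parameter values, the truncation of the infinite range to \(\mu \pm 10\sigma\), and the helper names `normal_pdf` and `integrate` are illustrative assumptions, not part of the original notebook.

```python
import math


def normal_pdf(x, mu, sigma):
    """Univariate normal density p(x) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Illustrative parameter values (assumed for this example).
mu, sigma = 1.5, 0.8

# Assumption: the tails beyond 10 standard deviations contribute a negligible
# amount, so the infinite integrals are truncated to [mu - 10*sigma, mu + 10*sigma].
lower, upper = mu - 10.0 * sigma, mu + 10.0 * sigma

total_mass = integrate(lambda x: normal_pdf(x, mu, sigma), lower, upper)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lower, upper)
variance = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), lower, upper)

print(f"integral of p(x): {total_mass:.6f}")
print(f"E[x]:             {mean:.6f}")
print(f"E[(x - mu)^2]:    {variance:.6f}")
```

With these settings the three printed values should agree closely with 1, \(\mu\), and \(\sigma^2\) respectively, which is what the definitions require.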

The univariate normal density is completely specified by two parameters: its mean \(\mu\) and variance \(\sigma^2\). For simplicity, we often abbreviate \(p(x)\) by writing \(p(x) \sim N(\mu, \sigma^2)\) to say that \(x\) is distributed normally with mean \(\mu\) and variance \(\sigma^2\). Samples from normal distributions tend to cluster about the mean, with a spread related to the standard deviation \(\sigma\) (see Chapter 2 of Pattern Classification [DHS00]).
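The clustering of samples around the mean can be illustrated by drawing from \(N(\mu, \sigma^2)\) and measuring how many samples fall within two standard deviations of \(\mu\); for a normal distribution this fraction is roughly 95%. The sketch below uses the standard library only, and the seed, sample size, and parameter values are arbitrary choices made for this example.

```python
import math
import random

# Assumed values for illustration; the fixed seed only makes the run repeatable.
rng = random.Random(0)
mu, sigma = 1.5, 0.8
n_samples = 100_000

samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]

# Sample estimates of the two parameters that specify the density.
sample_mean = sum(samples) / n_samples
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n_samples
sample_std = math.sqrt(sample_var)

# Fraction of samples with |x - mu| <= 2 * sigma.
within_2sigma = sum(abs(x - mu) <= 2.0 * sigma for x in samples) / n_samples

print(f"sample mean:          {sample_mean:.3f}")
print(f"sample std:           {sample_std:.3f}")
print(f"fraction within 2 sd: {within_2sigma:.3f}")
```

The sample mean and standard deviation should land near the chosen \(\mu\) and \(\sigma\), and the last figure should be close to 0.95.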