Main Textbooks:
- Data Mining and Analysis: Fundamental Concepts and Algorithms by Mohammed J. Zaki and Wagner Meira Jr., 2021 (Data Mining & Analysis) PDF
- Pattern Recognition, by Sergios Theodoridis and Konstantinos Koutroumbas, 2009
- The Elements of Statistical Learning (ESL)
- Foundations of Data Science (FDS), by Avrim Blum, John Hopcroft, and Ravindran Kannan, 2018
1402/07/08
Introduction to Data Science
- Slide: Introduction to Data Science by Zico Kolter
- Slide: Introduction to Data Science by Kevin Markham
- Slide: Clustering by Matt Dickenson
HW1: Generate random points with a uniform distribution in the unit sphere, due date: 1402/07/21 (Extended)
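One standard way to approach HW1 is sketched here, under the assumption that "in the unit sphere" means the solid unit ball rather than only its surface. A normalized Gaussian vector gives a uniformly random direction, and the radius is drawn as \(U^{1/d}\) because the volume inside radius \(r\) grows like \(r^d\). The function name `uniform_in_ball` is hypothetical.

```python
import numpy as np

def uniform_in_ball(n, d, seed=None):
    """Draw n points uniformly from the solid unit ball in R^d."""
    rng = np.random.default_rng(seed)
    # A standard Gaussian vector has a rotation-invariant direction, so
    # normalizing it gives a uniform point on the surface of the sphere.
    g = rng.standard_normal((n, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # The volume inside radius r grows like r^d, so the radius must be
    # U^(1/d) rather than U; otherwise points crowd toward the center.
    radii = rng.random(n) ** (1.0 / d)
    return directions * radii[:, None]

pts = uniform_in_ball(10_000, 3, seed=0)
norms = np.linalg.norm(pts, axis=1)
# For d = 3 the mean radius should be near d/(d+1) = 0.75,
# and about 0.5^3 = 12.5% of the points should lie within radius 0.5.
print(pts.shape, norms.max(), norms.mean(), np.mean(norms < 0.5))
```

Rejection sampling from the cube \([-1,1]^d\) also works in two or three dimensions, but its acceptance rate (ball volume over cube volume) collapses toward zero as \(d\) grows.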
Python Programming
HW2: Satisfiability Table, due date: 1402/07/21 (Extended)
1402/07/10
- Discrete Optimization
- Draft version of My Book: Meta Heuristic Algorithms
Some Examples:
- N-Queen Problem
- Knight's Tour, my old Delphi program from the previous century!
- Traveling Salesman Problem
- Packing & Cutting Problems, my old Delphi program from the previous century! My MSc project.
Further Reading
1402/07/15
- Random Search
- Chapters 1 & 2 of My Book + My NP-Complete Paper
HW3: Python code of Program 1.2, page 11 of My Book (RS), due: 1402/07/25
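Pure random search samples candidate solutions independently and keeps the best one seen so far. The sketch below is a generic one-dimensional version, not a transcription of Program 1.2 from the book; the test function \((x-2)^2\), the interval \([-10, 10]\) and the sample budget are illustrative assumptions.

```python
import random

def random_search(f, lo, hi, n_samples=10_000, seed=None):
    """Pure random search: sample uniformly in [lo, hi], keep the best point."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_samples):
        x = rng.uniform(lo, hi)      # independent uniform candidate
        fx = f(x)
        if fx < best_f:              # remember the best candidate so far
            best_x, best_f = x, fx
    return best_x, best_f

x_best, f_best = random_search(lambda x: (x - 2) ** 2, -10, 10, seed=1)
print(x_best, f_best)
```

Nothing learned from earlier samples guides later ones, which is exactly the weakness that simulated annealing and PSO address.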
1402/07/17
HW1-Sol, Solution: Colab: Unit Sphere
- SAT
- SA (continued)
- Chapter 3 of My Book
HW4: Python code of Program 1.3, page 23 of My Book (SA), due: 1402/07/26
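A minimal simulated annealing sketch for a function of one variable follows. It uses the Metropolis acceptance rule (a worse point is accepted with probability \(e^{-\Delta f / T}\)). The Gaussian neighbourhood, the geometric cooling \(T \leftarrow \alpha T\) and all parameter values are assumptions and may differ from Program 1.3 in the book.

```python
import math
import random

def simulated_annealing(f, x0, step=1.0, t0=10.0, alpha=0.99, n_iters=5000, seed=None):
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(n_iters):
        y = x + rng.gauss(0.0, step)          # random neighbour of the current point
        fy = f(y)
        # Metropolis rule: always accept improvements, sometimes accept worse points.
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
        t = max(t * alpha, 1e-12)             # geometric cooling, floored to avoid t = 0
    return best_x, best_f

bx, bf = simulated_annealing(lambda x: (x - 3) ** 2, -10.0, seed=0)
print(bx, bf)
```

High temperatures early on let the search climb out of poor regions; as \(T\) shrinks, the walk becomes almost purely downhill.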
Further Reading
1402/07/22
- PSO
- Chapter 6 of My Book
HW5: Use one of the Python packages to find the minimum of \(f(x)=3\sin(x)+(0.1x-3)^2\) (PSO for function 1.2), due: 1402/07/28
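Instead of a package, here is a small hand-written global-best PSO so that the velocity update is visible. The swarm size, coefficients (\(w = 0.7\), \(c_1 = c_2 = 1.5\)) and the search interval \([0, 60]\) are assumptions, since the HW does not state them. On that interval the global minimum lies near \(x \approx 29.85\) with \(f \approx -3.0\), so a good run should report roughly those values.

```python
import numpy as np

def f(x):
    return 3 * np.sin(x) + (0.1 * x - 3) ** 2

def pso_minimize(func, lo, hi, n_particles=50, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=None):
    """Global-best PSO for a scalar function of one variable on [lo, hi]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, n_particles)          # particle positions
    v = np.zeros(n_particles)                     # particle velocities
    pbest_x, pbest_f = x.copy(), func(x)          # personal bests
    g = np.argmin(pbest_f)
    gbest_x, gbest_f = pbest_x[g], pbest_f[g]     # swarm (global) best
    for _ in range(n_iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # Inertia + pull toward own best + pull toward the swarm's best.
        v = w * v + c1 * r1 * (pbest_x - x) + c2 * r2 * (gbest_x - x)
        x = np.clip(x + v, lo, hi)                # keep particles inside the interval
        fx = func(x)
        improved = fx < pbest_f
        pbest_x[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        g = np.argmin(pbest_f)
        if pbest_f[g] < gbest_f:
            gbest_x, gbest_f = pbest_x[g], pbest_f[g]
    return float(gbest_x), float(gbest_f)

best_x, best_f = pso_minimize(f, 0.0, 60.0, seed=0)
print(best_x, best_f)
```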
Some Python packages for PSO:
Further Reading
Paper: A Fish School Clustering Algorithm: Applied to Student Sectioning Problem
1402/07/24
- Chapters 11 & 12 of Pattern Recognition, Theodoridis
- Chapter 11:
- Pages 602-604, Section 11.2 PROXIMITY MEASURES
- Page 606, Section B. Similarity Measures: The inner product & Pearson’s correlation coefficient
- Page 607, Discrete-Valued Vectors & contingency table
- Page 616, 11.2.3 Proximity Functions between a Point and a Set
- Chapter 12:
- 12.1 INTRODUCTION
- 12.3 SEQUENTIAL CLUSTERING ALGORITHMS
1402/07/29
- HW2-Sol, Solution:
- Colab: Fatehinia
- Colab: Bagherpour
- Stirling Numbers, Recursive Functions & SAT Table
- Colab: SAT Table
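A brute-force clustering algorithm has to examine every partition of the data. The number of ways to split \(n\) labelled points into \(k\) non-empty clusters is the Stirling number of the second kind \(S(n,k)\), which satisfies \(S(n,k) = k\,S(n-1,k) + S(n-1,k-1)\). A memoised recursive sketch follows; whether it matches the exact layout of the class's SAT table is an assumption.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """S(n, k): ways to split n labelled points into k non-empty clusters."""
    if n == k:
        return 1              # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Point n either sits alone in a new cluster, S(n-1, k-1) ways, or joins
    # one of the k clusters of a partition of the other n-1 points.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# Lower-triangular table, rows n = 1..6, columns k = 1..n.
for n in range(1, 7):
    print(n, [stirling2(n, k) for k in range(1, n + 1)])
# last row printed: 6 [1, 31, 90, 65, 15, 1]
```

The counts explode quickly: splitting only 20 points into 4 clusters already gives \(S(20,4) = 45{,}232{,}115{,}901\) candidate partitions.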
1402/08/01
- SAT Table & Brute Force Algorithm for Clustering
HW6: Generate data & Clustering, due: 1402/08/04
HW7: BSAS Algorithm, due: 1402/08/07
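BSAS (Basic Sequential Algorithmic Scheme, Section 12.3 of Theodoridis) makes a single pass over the data. Each point joins its nearest cluster unless that distance exceeds a threshold \(\Theta\) and fewer than \(q\) clusters exist, in which case it opens a new cluster. The minimal sketch below assumes Euclidean distance to the running cluster mean as the point-to-cluster proximity, which is only one of the options the book discusses.

```python
import numpy as np

def bsas(X, theta, q):
    """One-pass BSAS. theta: threshold for opening a cluster; q: max clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.empty(len(X), dtype=int)
    means, counts = [X[0].copy()], [1]        # the first point opens cluster 0
    labels[0] = 0
    for i in range(1, len(X)):
        dists = [np.linalg.norm(X[i] - m) for m in means]
        j = int(np.argmin(dists))
        if dists[j] > theta and len(means) < q:
            means.append(X[i].copy())         # too far from every cluster: open a new one
            counts.append(1)
            labels[i] = len(means) - 1
        else:
            counts[j] += 1
            means[j] += (X[i] - means[j]) / counts[j]   # incremental mean update
            labels[i] = j
    return labels, np.array(means)

X = [[0, 0], [0.1, 0], [10, 10], [10.1, 10], [0, 0.2]]
labels, means = bsas(X, theta=1.0, q=10)
print(labels, means)
```

Because it is a single pass, the result depends on the order in which the points arrive and on the choice of \(\Theta\).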
Image Processing and Computer Vision, Intro
1402/08/06
Representative-Based Clustering
- Section 14.3.5 of ESL
- Page 527/764 ESL, Eq. 14.28: W(C)
- The problem with one unknown variable becomes a problem with two unknowns!
- Section 8.3 of K-means Clustering
- Colab: Image Segmentation 02 - k-means clustering
- Chapter 13 of Data Mining & Analysis
- Slides (Representative-based Clustering)
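ESL measures a clustering \(C\) by its within-cluster point scatter \(W(C) = \tfrac{1}{2}\sum_{k}\sum_{C(i)=k}\sum_{C(i')=k} d(x_i, x_{i'})\). With squared Euclidean \(d\) this equals \(\sum_k N_k \sum_{C(i)=k} \lVert x_i - \bar{x}_k \rVert^2\), and K-means minimizes it by treating the cluster means as a second set of unknowns and alternating between the two. The sketch below implements Lloyd's version and checks that both forms of \(W(C)\) agree; the toy data and the random initialization are illustrative assumptions.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iters=100, seed=None):
    """Lloyd's k-means: alternate the two easy sub-problems until nothing moves."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: centers fixed, each point goes to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: assignments fixed, each cluster mean minimizes its scatter.
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):          # keep the old center if a cluster empties
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def scatter_pairwise(X, labels):
    """W(C) with d = squared Euclidean: half the sum over within-cluster pairs."""
    total = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        total += 0.5 * ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum()
    return total

def scatter_centroid(X, labels):
    """Equivalent centroid form: sum_k N_k * sum_{C(i)=k} ||x_i - mean_k||^2."""
    total = 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        total += len(Xc) * ((Xc - Xc.mean(axis=0)) ** 2).sum()
    return total

square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X = np.vstack([square, square + 10])
labels, centers = kmeans_lloyd(X, 2, seed=0)
print(labels, scatter_pairwise(X, labels), scatter_centroid(X, labels))
```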
Further Reading
- Lloyd’s, MacQueen’s and Hartigan-Wong’s k-Means
- Convergence in Hartigan-Wong k-means method and other algorithms
1402/08/13
- sklearn.datasets.make_blobs
- Section 14.3.6 of ESL
- Page 528/764 ESL, K-means
1402/08/15
- Section 14.3.9 of ESL
- Page 533/764 Vector Quantization
Colab: Image Segmentation 03 - k-means clustering
Colab: LVQ
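LVQ1 keeps a few labelled prototypes and, for each training sample, moves the nearest prototype toward the sample if their classes match and away from it otherwise. The sketch below assumes a linearly decaying learning rate, 30 epochs and one prototype per class initialized from a training sample; these choices are illustrative, not taken from the Colab notebook.

```python
import numpy as np

def lvq1(X, y, prototypes, proto_labels, lr=0.1, n_epochs=30, seed=None):
    """LVQ1: pull the winning prototype toward same-class samples, push it otherwise."""
    rng = np.random.default_rng(seed)
    P = np.array(prototypes, dtype=float)
    for epoch in range(n_epochs):
        alpha = lr * (1 - epoch / n_epochs)               # linearly decaying step size
        for i in rng.permutation(len(X)):
            j = np.argmin(((P - X[i]) ** 2).sum(axis=1))  # nearest prototype wins
            if proto_labels[j] == y[i]:
                P[j] += alpha * (X[i] - P[j])
            else:
                P[j] -= alpha * (X[i] - P[j])
    return P

def lvq_predict(X, P, proto_labels):
    d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(proto_labels)[d2.argmin(axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
P = lvq1(X, y, prototypes=[X[0], X[50]], proto_labels=[0, 1], seed=1)
print(P, np.mean(lvq_predict(X, P, [0, 1]) == y))
```

Unlike k-means (plain vector quantization), the class labels steer where the codebook vectors end up.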
HW8: K-means on color images, due: 1402/08/19
Further Reading
1402/08/20,22
High Dimensional Data
- Section 2.5 of ESL
- Page 41/764 Local Methods in High Dimensions
- Slides Chap. 1 of Zaki
- Slides Chap. 6 of Zaki
Colab: High-Dimensional Data - The Curse of Dimensionality
HW9: Page 15 of FDS, orthogonality of d-dimensional Gaussian vectors, due: 1402/08/26
Colab: High-Dimensional Data - KNN
Colab: Clustering of images as high-dimensional data
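The phenomenon behind HW9 can be checked numerically: two independent \(N(0, I_d)\) vectors become nearly orthogonal as \(d\) grows (the mean of \(|\cos\theta|\) shrinks roughly like \(1/\sqrt{d}\)), while their lengths concentrate around \(\sqrt{d}\). A small simulation sketch; the number of pairs and the list of dimensions are arbitrary choices.

```python
import numpy as np

def gaussian_pair_stats(d, n_pairs=200, seed=None):
    """Mean |cos(angle)| between independent N(0, I_d) vectors, and mean |x|/sqrt(d)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_pairs, d))
    Y = rng.standard_normal((n_pairs, d))
    nx, ny = np.linalg.norm(X, axis=1), np.linalg.norm(Y, axis=1)
    cos = np.sum(X * Y, axis=1) / (nx * ny)
    return np.abs(cos).mean(), (nx / np.sqrt(d)).mean()

for d in (2, 10, 100, 1_000, 10_000):
    mean_abs_cos, norm_ratio = gaussian_pair_stats(d, seed=0)
    print(f"d={d:>6}  mean|cos|={mean_abs_cos:.4f}  mean |x|/sqrt(d)={norm_ratio:.4f}")
```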
Further Reading
- Random Projection: Theory and Implementation in Python with Scikit-Learn
- Johnson–Lindenstrauss lemma
- Gaussian random projection
- Scikit-learn: The Johnson-Lindenstrauss bound for embedding with random projections
Paper: Supervised dimensionality reduction for big data
Paper: An Introduction to Johnson–Lindenstrauss Transforms
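A Gaussian random projection multiplies the data by a \(d \times k\) matrix with i.i.d. \(N(0, 1/k)\) entries; the Johnson–Lindenstrauss lemma says that, for \(k\) large enough relative to \(\log n\), all pairwise distances survive up to a factor close to 1. The sketch below also evaluates the bound \(k \ge 4\ln n / (\varepsilon^2/2 - \varepsilon^3/3)\), the one used by scikit-learn's `johnson_lindenstrauss_min_dim`. The data sizes and seeds are arbitrary assumptions.

```python
import numpy as np

def jl_min_dim(n_samples, eps):
    """Target dimension from the bound k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return int(4 * np.log(n_samples) / (eps ** 2 / 2 - eps ** 3 / 3))

def gaussian_random_projection(X, n_components, seed=None):
    rng = np.random.default_rng(seed)
    # Entries ~ N(0, 1/k), so squared lengths are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(X.shape[1], n_components))
    return X @ R

def pairwise_distances(X):
    sq = np.sum(X ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2000))              # 50 points in 2000 dimensions
Z = gaussian_random_projection(X, 1000, seed=1)  # projected to 1000 dimensions
iu = np.triu_indices(len(X), k=1)                # each pair of points once
ratios = pairwise_distances(Z)[iu] / pairwise_distances(X)[iu]
print(jl_min_dim(50, 0.2), ratios.min(), ratios.max())
```

For 50 points and \(\varepsilon = 0.2\) the bound asks for about 900 dimensions, independent of the original 2000.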
1402/09/04
Bias-Variance Tradeoff
Further Reading
- MLU-Explain double-descent, part 2
- The Bias-Variance Tradeoff: A Newbie’s Guide, by a Newbie
- bias-variance-trade-off
Paper: VC Theoretical Explanation of Double Descent
Paper: Reconciling modern machine-learning practice and the classical bias–variance trade-off
Paper: Understanding the double descent curve in Machine Learning
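The classical trade-off can be estimated by simulation: refit the same model on many noisy training sets, then measure how far the average prediction is from the truth (squared bias) and how much individual predictions scatter around that average (variance). The sine target, noise level \(\sigma = 0.3\) and polynomial degrees below are illustrative assumptions; the expected squared test error is bias² + variance + \(\sigma^2\).

```python
import numpy as np

def bias_variance(degree, n_train=20, n_repeats=300, noise=0.3, seed=0):
    """Monte Carlo estimate of squared bias and variance of a polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, n_train)
    x_test = np.linspace(0.05, 0.95, 50)
    true_f = lambda x: np.sin(2 * np.pi * x)
    preds = np.empty((n_repeats, len(x_test)))
    for r in range(n_repeats):
        # A fresh noisy training set each time: same inputs, new noise.
        y = true_f(x_train) + rng.normal(0, noise, n_train)
        preds[r] = np.polyval(np.polyfit(x_train, y, degree), x_test)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - true_f(x_test)) ** 2)   # average squared bias
    variance = np.mean(preds.var(axis=0))                 # average variance
    return bias2, variance

for deg in (1, 3, 7):
    b2, var = bias_variance(deg)
    print(f"degree {deg}: bias^2={b2:.4f}  variance={var:.4f}  sum={b2 + var:.4f}")
```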
1402/09/06,11,13
- Chapter 5 of VanderPlas: In Depth: k-Means Clustering
- Chapter 17 of Data Mining & Analysis
- Clustering Validation
- Matching in Bipartite Graphs
- Silhouette, Clustering Evaluation
- Clustering Evaluations
Colab: Bipartite Graph Maximum Matching
Colab: Silhouette
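Two of the validation ideas above, sketched from scratch. Matching-based accuracy pairs clusters with classes through a maximum-weight matching on the contingency table; here it is done by brute force over all \(k!\) assignments, which only suits a handful of clusters (the Hungarian algorithm, e.g. SciPy's `linear_sum_assignment`, is the usual choice for larger \(k\)). The silhouette of point \(i\) is \(s(i) = (b - a)/\max(a, b)\), with \(a\) the mean distance to its own cluster and \(b\) the mean distance to the nearest other cluster; scoring singletons as 0 is a convention assumed here.

```python
from itertools import permutations
import numpy as np

def matching_accuracy(truth, pred):
    """Accuracy under the best one-to-one cluster-to-class matching (brute force)."""
    truth, pred = np.asarray(truth), np.asarray(pred)
    classes, clusters = np.unique(truth), np.unique(pred)
    k = max(len(classes), len(clusters))
    # Contingency table (rows: clusters, cols: classes), zero-padded to k x k.
    N = np.zeros((k, k), dtype=int)
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            N[i, j] = np.sum((pred == c) & (truth == t))
    # Maximum-weight perfect matching by trying all k! assignments.
    best = max(sum(N[i, p[i]] for i in range(k)) for p in permutations(range(k)))
    return best / len(truth)

def silhouette(X, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b); needs at least two clusters."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue                                  # singleton cluster: s(i) = 0
        a = D[i, own].sum() / (own.sum() - 1)         # mean distance within own cluster
        b = min(D[i, labels == c].mean()              # mean distance to nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

print(matching_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 2, 2, 2]))  # 5/6
print(silhouette([[0], [1], [10], [11]], [0, 0, 1, 1]))
```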
Further Reading
- MSc. Project: Graph Cut
- Wiki: Graph Matching
- Bipartite Graphs and Stable Matchings
- MIT, NOTES ON MATCHING
Paper: Graph Matching and local search
Paper: Graph Feature Selection for Anti-Cancer Plant Recommendation
1402/09/18, 25, 27
- Principal Component Analysis explained visually
- In Depth: Principal Component Analysis, Python Data Science Handbook
- PRML-PCA Slides
- Matrix Differentiation by Randal J. Barnes
- Chapter 7 of Zaki (Slides)
Colab: PCA-01
Kaggle: Country Profiling Using PCA and Clustering
- An Introduction to Principal Component Analysis (PCA) with 2018 World Soccer Players Data, PDF
- Using PCA to See Which Countries have Better Players for World Cup Games, PDF
HW10: PCA Algorithm, due: 1402/10/02
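The core PCA steps (center, form the covariance matrix, take its eigenvectors sorted by eigenvalue, project) can be sketched as follows. This is a generic version, not the expected HW10 solution; scikit-learn's `PCA` works through an SVD of the centered data instead, which gives the same directions up to sign. The toy data (points near the line \(y = 2x\)) is an assumption.

```python
import numpy as np

def pca(X, n_components):
    Xc = X - X.mean(axis=0)                    # 1. center the data
    C = Xc.T @ Xc / (len(X) - 1)               # 2. sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # 3. symmetric eigensolver, ascending order
    order = np.argsort(eigvals)[::-1]          # 4. sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]              # principal directions as columns
    explained = eigvals[:n_components] / eigvals.sum()
    return Xc @ W, W, explained                # 5. project onto the top components

rng = np.random.default_rng(0)
t = np.linspace(-1, 1, 50)
X = np.column_stack([t, 2 * t]) + rng.normal(0, 0.01, (50, 2))
Z, W, explained = pca(X, 1)
print(W[:, 0], explained)   # direction close to ±(1, 2)/sqrt(5), ratio close to 1
```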
Further Reading
- A geometric interpretation of the covariance matrix
- A geometric interpretation of … (In Persian)
- PCA in SKLearn
- PCA on IRIS
- Faces recognition example using eigenfaces and SVMs
Paper: Eigenbackground Revisited
Colab: SVD-01
Colab: SVD for Image Compression
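SVD-based image compression keeps only the top \(k\) singular triplets. By the Eckart–Young theorem this rank-\(k\) truncation is the best rank-\(k\) approximation in the Frobenius norm, and its error equals the norm of the discarded singular values. A sketch with a random matrix standing in for a grayscale image (an assumption):

```python
import numpy as np

def low_rank_approx(A, k):
    """Rank-k truncated SVD of A; also returns all singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :], s

rng = np.random.default_rng(0)
A = rng.random((64, 48))               # stand-in for a 64 x 48 grayscale image
A5, s = low_rank_approx(A, 5)
err = np.linalg.norm(A - A5)           # Frobenius norm of the residual
print(err, np.sqrt(np.sum(s[5:] ** 2)))
```

Storing the rank-5 factors takes \(5 \times (64 + 48 + 1) = 565\) numbers instead of \(64 \times 48 = 3072\).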
1402/10/02
Hierarchical Clustering
- A good image for hierarchical clustering
- Chapter 14 of Data Mining & Analysis
- Slides (Hierarchical Clustering): PDF
- sklearn.cluster.AgglomerativeClustering
Colab: Clustering of images
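In practice `sklearn.cluster.AgglomerativeClustering` does the work; to see the mechanics, here is a naive \(O(n^3)\) agglomerative sketch that starts from singletons and repeatedly merges the two closest clusters, with single linkage (closest pair of points) or complete linkage (farthest pair). The function and its `linkage` argument are hypothetical illustrations, not the sklearn API.

```python
import numpy as np

def agglomerative(X, n_clusters, linkage="single"):
    """Naive agglomerative clustering (O(n^3)); linkage is 'single' or 'complete'."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    combine = np.min if linkage == "single" else np.max
    clusters = [[i] for i in range(len(X))]        # start: every point is its own cluster
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster distance = min (single) or max (complete) point distance.
                dist = combine(D[np.ix_(clusters[a], clusters[b])])
                if dist < best:
                    best, pair = dist, (a, b)
        a, b = pair
        clusters[a].extend(clusters.pop(b))        # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

X = [[0], [1], [5], [6], [20]]
print(agglomerative(X, 2), agglomerative(X, 3, linkage="complete"))
```

Recording the merge distances instead of stopping at `n_clusters` would give the dendrogram.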
Further Reading
- Slide: Hierarchical Clustering by Jing Gao
1402/10/04
Linear Discriminant Analysis
- Chapter 20 of Data Mining & Analysis
- Slides (Linear Discriminant Analysis): PDF
- Comparison of LDA and PCA
- HW: Compare the first axis of LDA and PCA (classification by SVM)
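The contrast behind this HW shows up on data whose largest variance is unrelated to the class labels. For two classes, Fisher's LDA direction is \(w \propto S_W^{-1}(m_1 - m_0)\), while PCA's first axis ignores labels entirely. A sketch on synthetic data (an assumption) where the classes differ along \(x\) but spread mostly along \(y\); the SVM part of the HW is not shown.

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter S_W = S_0 + S_1.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)           # w proportional to S_W^{-1} (m1 - m0)
    return w / np.linalg.norm(w)

def pca_first_axis(X):
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
    return eigvecs[:, -1]                      # eigh sorts ascending: last = largest variance

rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], [0.3, 2.0], size=(200, 2))   # class 0
X1 = rng.normal([2, 0], [0.3, 2.0], size=(200, 2))   # class 1
w_lda = fisher_lda_direction(X0, X1)
w_pca = pca_first_axis(np.vstack([X0, X1]))
print("LDA:", w_lda, "PCA:", w_pca)
```

Projecting onto `w_pca` mixes the two classes, whereas projecting onto `w_lda` separates them.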
1402/10/09
Mid Term
1402/10/11
Colab: Gaussian Mixture Models
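EM for a Gaussian mixture alternates an E-step (the responsibility of each component for each point) and an M-step (responsibility-weighted re-estimates of weights, means and variances). A one-dimensional sketch; the fixed iteration count, the initialization at the data's minimum and maximum, and the synthetic data are assumptions. `sklearn.mixture.GaussianMixture` is the library route for real data.

```python
import numpy as np

def gmm_em_1d(x, init_means, n_iters=200):
    """EM for a 1-D Gaussian mixture; K is the number of initial means."""
    x = np.asarray(x, dtype=float)
    mu = np.array(init_means, dtype=float)
    K = len(mu)
    var = np.full(K, x.var())      # start every component with the overall variance
    pi = np.full(K, 1.0 / K)       # equal mixing weights
    for _ in range(n_iters):
        # E-step: responsibility r[i, k] = P(component k | x_i) under current parameters.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted estimates of weights, means and variances.
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])
pi, mu, var = gmm_em_1d(x, init_means=[x.min(), x.max()])
print(pi, mu, var)
```

With hard 0/1 responsibilities and a shared fixed variance, the same loop reduces to k-means.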
1402/10/16,18
Bayes
Further Reading
- Duda
- Naïve Bayes Algorithm - Implementation from Scratch in Python, Medium
- Segmentation using Bayesian Decision Theory
Paper: BayeSeg: Bayesian modeling for medical image segmentation with interpretable generalizability
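A Gaussian Naïve Bayes classifier picks the class maximizing \(\log P(c) + \sum_j \log \mathcal{N}(x_j \mid \mu_{cj}, \sigma^2_{cj})\), where the "naïve" part is treating features as independent given the class. A from-scratch sketch; the tiny additive variance smoothing is a simplification (scikit-learn scales its smoothing by the largest feature variance instead), and the toy data is an assumption.

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y, var_smoothing=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Naive assumption: one independent variance per (class, feature).
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_smoothing
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Sum of per-feature Gaussian log-densities, for every (sample, class) pair.
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_)[None, :, :]
                          + (X[:, None, :] - self.means_[None]) ** 2 / self.vars_[None]).sum(axis=2)
        return self.classes_[np.argmax(np.log(self.priors_) + log_lik, axis=1)]

X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[0.5, 0.5], [5.5, 5.5]]))
```

Working in log space avoids underflow when many small per-feature densities are multiplied.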
1402 Winter
Adding Features
Colab: Add pixels’ coordinates for image segmentation
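Adding pixel coordinates turns each pixel's colour vector into a colour-plus-position vector, so a clustering algorithm prefers spatially coherent segments. A sketch of the feature construction; `pixel_features` and its `coord_weight` parameter (how strongly position counts against colour) are hypothetical names, not taken from the notebook.

```python
import numpy as np

def pixel_features(img, coord_weight=1.0):
    """Stack each pixel's colour with its (row, col) position.

    img: H x W x C array. Returns an (H*W) x (C+2) feature matrix in row-major
    pixel order, so row r*W + c of the result describes pixel (r, c).
    """
    H, W = img.shape[:2]
    rows, cols = np.mgrid[0:H, 0:W]
    coords = np.stack([rows, cols], axis=-1).reshape(-1, 2) * coord_weight
    colors = img.reshape(H * W, -1).astype(float)
    return np.hstack([colors, coords])

img = np.arange(2 * 3 * 3).reshape(2, 3, 3)   # tiny 2 x 3 "RGB" image
F = pixel_features(img, coord_weight=0.5)
print(F.shape, F[4])   # pixel (1, 1): colour [12, 13, 14], scaled position [0.5, 0.5]
```

The feature matrix can be passed directly to any of the k-means sketches or library routines, and the labels reshaped back to `(H, W)`.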
1402/11/01
EXAM
Image Processing and Computer Vision
- Website: Image Processing in Python with Scikit-image by M. Jaderian
- Website: Image Processing in Python with OpenCV by M. Kiani
- Github: Tutorial for Image Processing in Python by Shaoning Zeng
- Book: Image processing tutorials
- Further Reading
- Some published papers
Image Matting
- GitHub repo: A Python library for alpha matting https://pymatting.github.io/ by Y. Gavet & J. Debayle
K-means
- Sec. 5.11 of Jake VanderPlas
- Bilateral K-Means for Superpixel Computation
- Balanced clustering - Wikipedia
- balanced-kmeans · PyPI
- K-means using PyTorch (github.com)
- Balanced K-Means for Clustering
- Balanced k-Means Revisited
- K-Means Clustering in Python: A Practical Guide – Real Python
- Data clustering: 50 years beyond K-means - ScienceDirect
- K-Means Factorization
- Clustering IRIS dataset with particle swarm optimization (PSO)
Representative-Based Clustering
- Chapter 13 of Data Mining & Analysis
- HW 13.5: Q2, Q4, Q6, Q7
- Slides (Representative-based Clustering): PDF
- Slide: Introduction to Machine Learning (Clustering and EM) by Barnabás Póczos & Aarti Singh
- Tutorial: The Expectation Maximization Algorithm by Sean Borman
- Tutorial: What is Bayesian Statistics? by John W Stevens
Further Reading
- Slide: Tutorial on Estimation and Multivariate Gaussians by Shubhendu Trivedi
- Slide: Mixture Model by Jing Gao
- Paper: Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D
- Paper: k-Means Requires Exponentially Many Iterations Even in the Plane by Andrea Vattani
- Book: Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David
Mahalanobis distance
- Chapter 2 of Zaki, page 54, eq. 2.43
- What is Mahalanobis distance?
- Mahalanobis Distance – Understanding the math with examples (python)
- Unlocking the Power of Mahalanobis Distance: Exploring Multivariate Data Analysis with Python
- Outlier detection - Faradars
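The Mahalanobis distance \(d(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}\) rescales each direction by the spread of the data along it, which is why it is popular for outlier detection. A minimal sketch with a hand-picked covariance matrix (an assumption); it solves a linear system instead of forming \(\Sigma^{-1}\) explicitly.

```python
import numpy as np

def mahalanobis(x, mu, cov):
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov @ z = diff instead of inverting cov.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

cov = np.array([[4.0, 0.0], [0.0, 1.0]])
print(mahalanobis([2, 0], [0, 0], cov))   # 1.0: two units along the high-variance axis
print(mahalanobis([0, 2], [0, 0], cov))   # 2.0: same Euclidean distance, low-variance axis
```

With the identity covariance it reduces to the ordinary Euclidean distance.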
1402 Winter
Clustering Validation
- Chapter 17 of Data Mining & Analysis
- Slides of Section 17.1 (Clustering Validation): PDF
- Slide: Clustering Analysis by Enza Messina
- Slide: Information Theory by Jossy Sayir
- Slide: Normalized Mutual Information: Estimating Clustering Quality by Bilal Ahmed
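Normalized mutual information compares a clustering \(C\) with the ground truth \(T\) through \(I(C;T)\), scaled into \([0, 1]\). The sketch uses the geometric-mean normalization \(I(C;T)/\sqrt{H(C)\,H(T)}\); other conventions exist (scikit-learn's `normalized_mutual_info_score` defaults to the arithmetic mean), and the handling of single-cluster labelings is an assumed convention.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(a, b):
    a, b = np.asarray(a), np.asarray(b)
    n, mi = len(a), 0.0
    for u in np.unique(a):
        for v in np.unique(b):
            n_uv = np.sum((a == u) & (b == v))      # contingency-table cell
            if n_uv:
                mi += n_uv / n * np.log(n * n_uv / (np.sum(a == u) * np.sum(b == v)))
    return mi

def nmi(clusters, truth):
    """NMI with geometric-mean normalization: I(C;T) / sqrt(H(C) H(T))."""
    hc, ht = entropy(clusters), entropy(truth)
    if hc == 0 or ht == 0:
        return 1.0 if hc == ht else 0.0   # assumed convention for single-cluster labelings
    return mutual_information(clusters, truth) / np.sqrt(hc * ht)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))   # same partition, different names
print(nmi([0, 1, 0, 1], [0, 0, 1, 1]))   # independent partitions
```

Unlike plain accuracy, NMI needs no matching step, because mutual information does not depend on cluster names.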
Further Reading
- Slide: Clustering Evaluation (II) by Andrew Rosenberg
- Slide: Evaluation (I) by Andrew Rosenberg
1402 Winter
Density-Based Clustering
- Chapter 15 of Data Mining & Analysis
- Slides of Section 15.1 (Density-based Clustering): PDF
- Slide: Spatial Database Systems by Ralf Hartmut Güting
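DBSCAN calls a point a core point if its \(\varepsilon\)-neighbourhood contains at least `min_pts` points, grows clusters from core points through density-reachability, and labels points reachable from no core point as noise. A from-scratch sketch; including the point itself in its own neighbourhood is an assumed convention (definitions differ), and the toy data is illustrative.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """DBSCAN; label -1 marks noise. Neighbourhoods include the point itself."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    neighbours = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbours])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue                      # only unassigned core points seed new clusters
        labels[i] = cluster
        stack = [i]
        while stack:                      # collect everything density-reachable from i
            j = stack.pop()
            if not core[j]:
                continue                  # border point: joins, but does not expand
            for nb in neighbours[j]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    stack.append(nb)
        cluster += 1
    return labels

X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
print(dbscan(X, eps=1.5, min_pts=3))   # the isolated point (5, 5) is labelled -1
```

The number of clusters is not an input; it emerges from `eps` and `min_pts`.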
1402 Winter
Kernel Methods
- Chapter 5 of Data Mining & Analysis
- Kernel K-means, Chapter 13 of Data Mining & Analysis
- HW TBA
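Kernel k-means never needs the feature map \(\phi\) explicitly, because the squared distance to a cluster mean expands into kernel entries: \(\lVert\phi(x_i) - \mu_C\rVert^2 = K_{ii} - \frac{2}{|C|}\sum_{j \in C} K_{ij} + \frac{1}{|C|^2}\sum_{j,l \in C} K_{jl}\). A sketch that takes a precomputed kernel matrix and explicit initial labels (an assumption made to keep it deterministic); with a linear kernel it reproduces ordinary k-means.

```python
import numpy as np

def kernel_kmeans(K, init_labels, n_iters=100):
    """Kernel k-means on a precomputed n x n kernel (Gram) matrix K."""
    labels = np.asarray(init_labels).copy()
    k, n = labels.max() + 1, len(K)
    for _ in range(n_iters):
        dist = np.full((n, k), np.inf)       # empty clusters keep infinite distance
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if len(idx) == 0:
                continue
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/|C|) sum_j K_ij + (1/|C|^2) sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].sum(axis=1) / len(idx)
                          + K[np.ix_(idx, idx)].sum() / len(idx) ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
print(kernel_kmeans(X @ X.T, [0, 1, 0, 1]))             # linear kernel = ordinary k-means
print(kernel_kmeans(rbf_kernel(X, 0.1), [0, 0, 1, 0]))
```

As with ordinary k-means, the outcome depends on the initial assignment.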
EXAM
1402 Winter
Spectral and Graph Clustering
- Chapter 16 of Data Mining & Analysis
- Exercises 16.5: Q2, Q3, Q6
- Slides (Spectral and Graph Clustering): PDF
- Slide: Spectral Clustering by Andrew Rosenberg
- Slide: Introduction to Spectral Clustering by Vasileios Zografos and Klas Nordberg
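The simplest spectral method splits a graph in two by the sign of the Fiedler vector, the eigenvector of the second-smallest eigenvalue of the unnormalized Laplacian \(L = D - W\). The k-way algorithms in the readings embed the nodes with several eigenvectors and run k-means on them; this two-way split is a simplification. The toy graph (two triangles joined by one weak edge) is an assumption.

```python
import numpy as np

def spectral_bipartition(W):
    """Two-way spectral split from the unnormalized Laplacian L = D - W."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues; eigvals[0] is ~0
    fiedler = eigvecs[:, 1]                # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int), eigvals

# Two triangles (weight-1 edges) joined by one weak edge of weight 0.1.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1
labels, eigvals = spectral_bipartition(W)
print(labels, eigvals[:2])
```

The small second eigenvalue (about 0.064 here) reflects how cheaply the graph can be cut; the sign of each label is arbitrary.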
Further Reading
- Slide: Spectral Methods by Jing Gao
- Tutorial: A Tutorial on Spectral Clustering by Ulrike von Luxburg
- Tutorial: Matrix Differentiation by Randal J. Barnes
- Lecture: Spectral Methods by Sanjoy Dasgupta
- Paper: Positive Semidefinite Matrices and Variational Characterizations of Eigenvalues by Wing-Kin Ma
Itemset Mining
- Chapter 8 of Data Mining & Analysis
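Apriori finds frequent itemsets level by level, using the fact that every subset of a frequent itemset must itself be frequent: candidates of size \(k\) are joined from frequent \((k-1)\)-itemsets and pruned before their support is counted. A compact sketch with an absolute support threshold; the toy transactions are an assumption.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frequent itemset: support count} using level-wise Apriori."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: all (k-1)-subsets of a frequent k-itemset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_sup]
        frequent.update((c, support(c)) for c in level)
        k += 1
    return frequent

T = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
for itemset, sup in sorted(apriori(T, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```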
1402 Winter
Link Analysis
- Ranking Graph Vertices, PageRank
- Linear Algebra and Technology
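PageRank is the stationary vector of a random surfer who follows a random out-link with probability \(d\) and jumps to a uniformly random page otherwise, i.e. the fixed point of \(r = \frac{1-d}{n}\mathbf{1} + d\,M r\) with \(M\) column-stochastic. A power-iteration sketch; \(d = 0.85\) and treating pages without out-links as linking to every page are common conventions assumed here.

```python
import numpy as np

def pagerank(edges, n, d=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for r = (1 - d)/n + d * M r, with M column-stochastic."""
    M = np.zeros((n, n))
    out_deg = np.zeros(n)
    for src, dst in edges:
        M[dst, src] += 1.0
        out_deg[src] += 1
    dangling = out_deg == 0
    M[:, ~dangling] /= out_deg[~dangling]   # split each page's vote over its out-links
    M[:, dangling] = 1.0 / n                # pages with no out-links vote for everyone
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1 - d) / n + d * (M @ r)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

print(pagerank([(0, 1), (1, 2), (2, 0)], 3))            # a cycle: all pages equal
print(pagerank([(0, 2), (1, 2), (2, 0), (2, 1)], 3))    # approx. 0.257, 0.257, 0.486
```

Each iteration is one sparse matrix-vector product in practice, which is why the method scales to web-sized graphs.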
Further Reading
- Chapter 5 of Mining of Massive Datasets
- Slide of Sections 5.1, 5.2 (PageRank, Efficient Computation of PageRank): Analysis of Large Graphs 1
- Slide of Sections 5.3-5.5 (Topic-Sensitive PageRank, Link Spam, Hubs and Authorities): Analysis of Large Graphs 2
- Slide: The Linear Algebra Aspects of PageRank by Ilse Ipsen
- Paper: A Survey on Proximity Measures for Social Networks by Sara Cohen, Benny Kimelfeld, Georgia Koutrika
Additional Slides:
- Practical Data Science by Zico Kolter
- Course: Data Mining by U Kang
- Statistical Data Mining Tutorials by Andrew W. Moore
- Lecture: Finding Meaningful Clusters in Data by Sanjoy Dasgupta
- Paper: An Impossibility Theorem for Clustering by Jon Kleinberg