Welcome to the Data Mining Course#
Mahmood Amintoosi, Spring 2025
Computer Science Dept, Ferdowsi University of Mashhad
Note
These lectures were built using the new Sphinx-based Jupyter Book 2.0 tool set, as part of the ExecutableBookProject. They are intended mainly as a demonstration of these tools. Instructions for how to build them from source can be found in the Jupyter Book documentation.
Table of Contents#
Introduction to NumPy
Data Manipulation with Pandas
Visualization with Matplotlib
Machine Learning
Course Resources for Data Mining#
Python Data Science Handbook by Jake VanderPlas: This website contains the full text of the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub in the form of Jupyter notebooks, which are the main source of this course’s materials.
Pattern Classification by Richard O. Duda, Peter E. Hart, David G. Stork
Pattern Recognition and Machine Learning by Christopher M. Bishop
Further Machine Learning Resources#
The Scikit-Learn website: The Scikit-Learn website has an impressive breadth of documentation and examples covering some of the models discussed here, and much, much more. If you want a brief survey of the most important and often-used machine learning algorithms, this is a good place to start.
Introduction to Machine Learning with Python, by Andreas C. Müller and Sarah Guido (O’Reilly). This book covers many of the machine learning fundamentals discussed in these chapters, but is particularly relevant for its coverage of more advanced features of Scikit-Learn, including additional estimators, model validation approaches, and pipelining.
Machine Learning with PyTorch and Scikit-Learn, by Sebastian Raschka (Packt). Sebastian Raschka’s most recent book starts with some of the fundamental topics covered in these chapters, but goes deeper and shows how those concepts apply to more sophisticated and computationally intensive deep learning and reinforcement learning models using the well-known PyTorch library.
Course Staff#
Instructors#
Mahmood Amintoosi, Email: m.amintoosi AT um.ac.ir
Teaching Assistants#
Adel Ahmadi (TA Head), Email:
I should mention that the original material was from Jake VanderPlas’s Python Data Science Handbook. I used his notebooks and modified them to suit my own needs and preferences. I would like to thank him for his great work and generosity.