Course description
In this course, students build a foundation for data science, machine learning, and artificial intelligence (AI). The course combines theory with hands-on experience using Python programming tools. The focus is on the computational statistical analysis and visualization methods that underpin modern data science, machine learning, and AI. The hands-on component uses the Python packages NumPy, pandas, seaborn, statsmodels, and PyMC3, along with selected other open-source packages. The course introduces the basic concepts of data science; presents effective methods of data visualization and summary statistics for exploring complex data; and reviews probability theory, with an emphasis on conditional probability as a foundation of modern computational statistical methods and AI. It covers basic computational statistical inference using three approaches: frequentist maximum likelihood, frequentist bootstrap, and Bayesian. The course gives an overview of the properties and behavior of the rich family of linear models, which are foundational to many machine learning and AI algorithms, and emphasizes applying Bayesian models and inference to real-world problems. The course also explores models for time series data and, time permitting, spatial data. An independent project is required of all students registering for graduate credit.