Statistics @ LMU

High-Performance Computing Resources

The Statistics and Data Science Group at LMU maintains several independent computing resources in order to meet the needs of faculty research and student education.  Resources can be made available to students enrolled in appropriate courses or working on independent research projects, as well as to interested LMU faculty.

Grid computing cluster This remotely accessed cluster is a collection of 13 networked server units with a total of 256 cores running at 2.4 to 2.6 GHz. The operating system is the open-source Linux distribution Rocks 7.0, which is based on CentOS 6.5 for a 64-bit architecture and optimized for efficient administration of cluster configurations. Parallel processing is supported by the OpenMP API for multi-platform shared-memory multiprocessing, with the Sun Grid Engine managing the distribution and execution of parallel jobs. Available software includes MATLAB, R, and Python, plus the ability to run most custom-designed programs. The cluster is well suited to multi-user environments and projects, and can accommodate simulation or data analysis tasks that require long run times (days or weeks).

GPU Workstations We also have two GPU servers dedicated to large-scale machine learning and statistical computations. They are particularly useful for training large neural networks. The first workstation has eight NVIDIA RTX 2080 Ti GPUs, and the second has five NVIDIA RTX 3090 GPUs.

High-performance Computing Lab Our high-performance computing lab, located in the Math department, contains eight Ubuntu and Windows research workstations for faculty and student use. Two of these machines are each equipped with two GPUs for machine learning. Software includes MATLAB, R, Python, and Stata/MP. The lab is well suited to collaboration: the computers can be used for group work on the workstations themselves as well as to access the cluster and GPU servers.

B.S. Major in Statistics and Data Science (SDS)

The emerging field of data science applies computational, mathematical, and statistical thinking to data-rich problems in a wide variety of fields. The new B.S. degree in Statistics and Data Science at LMU, which couples this approach with a focus on rigorous statistical practice, launched in 2022 and will graduate its first senior class in 2026. Prospective or interested students are encouraged to contact me for more information about this new opportunity!

Applied Statistics 4 semester hours. An introduction to basic methods of extracting information from data with a focus on statistical methods and interpretation of results. Exploratory and descriptive data analysis including graphical examination of data and measures of central tendency and spread. Classical and non-parametric tools of hypothesis testing (t tests, one-way and two-way ANOVA, Mann-Whitney and Kruskal-Wallis for mean-comparison problems). Simple linear regression. Practical considerations of experimental design. Analysis of data using modern computational software (e.g. R).

Multivariable Statistics 4 semester hours. Statistical analysis of large multivariate datasets. Multivariate densities and distributions. The general linear model and multivariate regression, analysis of variance. Multilevel linear models. Clustering and factor analysis. Time series models. Modern computational software.

Probability and Mathematical Statistics 4 semester hours. Probability and statistics with an emphasis on mathematical techniques of analysis. Probability topics: sample space, basic probability rules, conditional probability, independence, Bayes' theorem, densities and cumulative distribution functions, expectations, law of large numbers, Central Limit Theorem, functions of random variables, and stochastic modeling. Statistics topics: sampling distributions, point and interval estimation, and mathematical methods of hypothesis testing. Additional topics may include stochastic simulation, bootstrapping, and Bayesian inference.

Advanced Topics in Probability 4 semester hours. Advanced topics in probability (e.g. stochastic processes, Markov chains, Monte Carlo methods) chosen by the instructor. Written and oral presentations are required.

Machine Learning 4 semester hours. Linear regression, logistic/softmax regression, support vector machines, k-nearest neighbors, tree-based methods, linear separability, overfitting/underfitting, regularizers, gradient descent. Possible additional topics: kernel methods, k-means clustering, principal component analysis, dimensionality reduction, semi-supervised learning, boosting, random forests, sampling methods.
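
As a rough illustration of two of the topics listed above, the following minimal Python sketch fits a simple linear regression by batch gradient descent on mean squared error. It is not taken from the course materials: the function name fit_line_gd, the learning rate, the step count, and the toy data are all assumptions chosen for this example.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=5000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(err^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        grad_b = 2.0 / n * sum(errs)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_line_gd(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The learning rate here is small enough for this particular toy dataset to converge; on other data it would likely need tuning, which is one of the practical issues the gradient descent topic addresses.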

Big Data Visualization 4 semester hours. Introduction to the tools and techniques of modern data visualization including concepts of scraping, wrangling, cleaning, and processing data from the web and other large databases. The course focuses on visualizing multidimensional data and designing clear and appropriate data graphics through apps and interactive displays (e.g., maps).

Deep Learning 4 semester hours. Neural networks and related algorithms: stochastic gradient descent and backpropagation. Modern deep learning frameworks (e.g. TensorFlow, PyTorch) and GPU computing. Convolutional neural networks and applications to image recognition. Recurrent neural networks and Transformer networks, with applications to natural language processing (e.g. sentiment analysis, translation, language modeling).

Modern Computational Statistics 4 semester hours. Generalized linear models: logistic, multinomial, and Poisson regression; bootstrapping: resampling simulations, estimation, confidence sets, and hypothesis testing; Bayesian methods: computational techniques such as Markov chain Monte Carlo (e.g. Metropolis-Hastings), estimation, credible sets, and hypothesis testing.
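
To give a flavor of the bootstrapping topic, here is a minimal Python sketch of a percentile bootstrap confidence interval for a sample mean, using only the standard library. It is an illustration under stated assumptions rather than course material: the function name bootstrap_mean_ci, the number of resamples, the fixed seed, and the sample values are all invented for this example.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample the data with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical sample, invented for illustration only.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
lower, upper = bootstrap_mean_ci(sample)
print(f"95% bootstrap CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the simplest to state.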

Program Requirements

See the complete program requirements for the B.S. major in the university bulletin.