Big Data Statistics (STAT7017)

This course involves on-campus teaching. For students unable to come to campus, there will be a remote option. See the Class Summary for more details.

This research-led course introduces recent developments in Random Matrix Theory and Online Learning that address the challenges and opportunities posed by the availability of large amounts of data.

We first review some classic results from multivariate statistical theory, matrix analysis, and probability theory. We then present the salient statistical features of big data (e.g., heterogeneity, noise accumulation, spurious correlation, and incidental endogeneity) and show how these features affect traditional statistical methods and theory.

We follow with an introduction to modern Random Matrix Theory and its applications in statistics. Applications presented may include high-dimensional statistical inference, large covariance matrices, large-scale statistical learning through subsampling, sparsification of large matrices, principal component analysis, and dimension reduction.

We conclude with an introduction to the theory of online learning (also known as sequential prediction), which addresses the setting of streaming data.

Students will learn about and use the latest computational tools for working with big and streaming data sets. Example data sets may be drawn from areas such as finance, web analytics, digital marketing, and satellite imagery.

Learning Outcomes

Upon successful completion, students will have the knowledge and skills to:

1. Explain in detail how the statistical features of big data impact traditional statistical methods and theory;
2. Discuss in depth Random Matrix Theory and its application to large-scale statistics;
3. Critically discuss the theory of sequential prediction and management of streaming data; and
4. Demonstrate in detail the use of computational tools for working with big and streaming data sets.

Indicative Assessment

1. Typical assessment may include, but is not restricted to: exams, assignments, quizzes, presentations and other assessment as appropriate (100) [LO 1,2,3,4]

Requisite and Incompatibility

To enrol in this course you must have completed STAT6039 or STAT6013, and have completed STAT6038 or STAT6014 or STAT7001. Incompatible with STAT3017.

