• Class Number 5798
• Term Code 3160
• Class Info
• Unit Value 6 units
• Mode of Delivery In Person
• COURSE CONVENER
• AsPr Dale Roberts
• LECTURER
• AsPr Dale Roberts
• Class Dates
• Class Start Date 26/07/2021
• Class End Date 29/10/2021
• Census Date 31/08/2021
• Last Date to Enrol 02/08/2021

Big Data Statistics (STAT3017)

This research-led course provides an introduction to recent developments in Random Matrix Theory and Online Learning that address the challenges and opportunities posed by the availability of large amounts of data.

First, we will review some classic results from multivariate statistical theory, matrix analysis, and probability theory. Then we will present the salient statistical features of big data (e.g., heterogeneity, noise accumulation, spurious correlation, and incidental endogeneity) and show how these features affect traditional statistical methods and theory.
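As a rough illustration of spurious correlation (not taken from the course materials), the sketch below draws a response and a set of noise features that are all mutually independent, then reports the largest absolute sample correlation between the response and any feature. The function names, sample sizes, and seed are assumptions chosen for the example. With a fixed number of observations, the maximum grows as more features are added, even though every true correlation is zero.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def max_spurious_correlation(n_obs, n_features, seed=0):
    """Largest |correlation| between a response and independent noise features."""
    rng = random.Random(seed)
    response = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_features):
        # Each feature is generated independently of the response.
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = max(best, abs(pearson(feature, response)))
    return best


# Hypothetical sizes: 50 observations, few versus many candidate features.
few = max_spurious_correlation(n_obs=50, n_features=5)
many = max_spurious_correlation(n_obs=50, n_features=2000)
print(f"max |corr| with 5 features:    {few:.3f}")
print(f"max |corr| with 2000 features: {many:.3f}")
```

Because both calls use the same seed, the 2000-feature run contains the 5-feature run as a prefix, so the second maximum can never be smaller than the first.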

We follow with an introduction to modern Random Matrix theory and its application in statistics. Applications presented may include topics such as high-dimensional statistical inference, large covariance matrices, large-scale statistical learning through subsampling, sparsification of large matrices, principal component analysis, and dimension reduction.
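To give a flavour of the large-covariance-matrix setting, the following sketch (an illustration under stated assumptions, not course code) simulates data whose population covariance is the identity and computes the eigenvalues of the sample covariance matrix with NumPy. The dimensions, seed, and function name are assumptions. When the dimension is comparable to the sample size, the sample eigenvalues spread well away from 1, roughly over the interval $[(1-\sqrt{y})^2, (1+\sqrt{y})^2]$ with $y = p/n$, which is the support of the Marcenko-Pastur distribution listed in the class schedule.

```python
import numpy as np


def sample_covariance_eigenvalues(n_obs, n_dim, seed=0):
    """Eigenvalues of the sample covariance of i.i.d. standard normal data."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_obs, n_dim))
    # The mean is known to be zero here, so no centring is applied.
    sample_cov = data.T @ data / n_obs
    # eigvalsh is intended for symmetric matrices; it returns ascending eigenvalues.
    return np.linalg.eigvalsh(sample_cov)


# Hypothetical sizes: dimension is a quarter of the sample size.
n_obs, n_dim = 2000, 500
ratio = n_dim / n_obs
eigs = sample_covariance_eigenvalues(n_obs, n_dim)

lower_edge = (1 - np.sqrt(ratio)) ** 2
upper_edge = (1 + np.sqrt(ratio)) ** 2
print(f"smallest / largest sample eigenvalue: {eigs.min():.3f} / {eigs.max():.3f}")
print(f"limiting support edges:               {lower_edge:.3f} / {upper_edge:.3f}")
```

Every population eigenvalue equals 1 in this setup, so the spread of the sample eigenvalues is purely a high-dimensional effect.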

We conclude with an introduction to the theory of online learning (also known as sequential prediction) to handle the situation of streaming data.

Students will use and learn about the latest computational tools to work with big and streaming data sets. Example data sets may be drawn from areas such as finance, web analytics, digital marketing, and satellite imagery.

## Learning Outcomes

Upon successful completion, students will have the knowledge and skills to:

1. Explain how statistical features of big data impact traditional statistical methods and theory;
2. Discuss Random Matrix theory and its application in statistics at large scale;
3. Summarise the theory of sequential prediction and management of streaming data; and,
4. Demonstrate the use of computational tools to work with big and streaming data sets.

## Research-Led Teaching

This course is based on recent research papers and surveys applications of random matrix theory in statistics. The topic is rapidly advancing and recent results may be introduced into the course as they appear in the literature.

## Examination Material or equipment

There is no final examination for this course. The final assessment is a project.

## Required Resources

None. Resources (research papers, etc.) and lecture notes will be provided throughout the semester.

## Staff Feedback

Students will be given feedback in the following forms in this course:
• Feedback to the whole class, to groups, to individuals, and to focus groups

## Student Feedback

ANU is committed to the demonstration of educational excellence and regularly seeks feedback from students. Students are encouraged to offer feedback directly to their Course Convener or through their College and Course representatives (if applicable). The feedback given in these surveys is anonymous and provides the Colleges, University Education Committee and Academic Board with opportunities to recognise excellent teaching, and opportunities for improvement. The Surveys and Evaluation website provides more information on student surveys at ANU and reports on the feedback provided on ANU courses.

## Class Schedule

| Week/Session | Summary of Activities | Assessment |
|---|---|---|
| 1 | Introduction to the challenges of Big Data and overview of the course. Review of some prerequisite concepts. | |
| 2 | Further matrix analysis, eigenvalues and eigenvectors, the multivariate normal distribution. | |
| 3 | Fundamental tools for studying limiting spectral distributions, Marcenko-Pastur distributions, Fisher spectral distribution. | Assignment 1 Due |
| 4 | CLT for linear spectral statistics: Introduction and integration tools. | |
| 5 | Moments and statistics of the Marcenko-Pastur distribution. | |
| 6 | CLT for linear spectral statistics: Sample covariance matrix, Bai and Silverstein’s CLT, CLT for random Fisher matrices. | Assignment 2 Due |
| 7 | Generalised variance in higher dimensions. | Assignment 3 Due |
| 8 | Multiple correlation coefficient. | |
| 9 | Multivariate linear regression in the high-dimensional setting. | Assignment 4 Due |
| 10 | PCA and high-dimensional spiked population models. | |
| 11 | Applications and recent theoretical results. | |
| 12 | Applications and recent theoretical results. | Assignment 5 Due |

## Tutorial Registration

There are no regular tutorials for this course.

## Assessment Summary

| Assessment task | Value | Due Date | Return of assessment | Learning Outcomes |
|---|---|---|---|---|
| Assignment 1 | 20 % | 09/08/2021 | 23/08/2021 | 1-4 |
| Assignment 2 | 20 % | 30/08/2021 | 13/09/2021 | 1-4 |
| Assignment 3 | 20 % | 20/09/2021 | 04/10/2021 | 1-4 |
| Assignment 4 | 20 % | 05/10/2021 | 18/10/2021 | 1-4 |
| Final Project | 20 % | 04/11/2021 | 02/12/2021 | 1-4 |

* If the Due Date and Return of Assessment date are blank, see the Assessment Tab for specific Assessment Task details

## Policies

ANU has educational policies, procedures and guidelines, which are designed to ensure that staff and students are aware of the University’s academic standards, and implement them. Students are expected to have read the Academic Misconduct Rule before the commencement of their course. Other key policies and guidelines include:

## Assessment Requirements

The ANU is using Turnitin to enhance student citation and referencing techniques, and to assess assignment submissions as a component of the University's approach to managing Academic Integrity. For additional information regarding Turnitin, please visit the ANU Online website. Students may choose not to submit assessment items through Turnitin. In this instance you will be required to submit, alongside the assessment item itself, hard copies of all references included in the assessment item.

## Moderation of Assessment

Marks that are allocated during Semester are to be considered provisional until formalised by the College examiners meeting at the end of each Semester. If appropriate, some moderation of marks might be applied prior to final results being released.

## Participation

Course content delivery will take the form of pre-recorded weekly lectures (available on Wattle) and a weekly workshop delivered in hybrid format (on campus and live through a scheduled Zoom session with a recording available afterwards).

## Examination(s)

There is no final examination for this course. The final assessment is a project.

Assignment 1

Value: 20 %
Due Date: 09/08/2021
Return of Assessment: 23/08/2021
Learning Outcomes: 1-4

This assessment is to be done individually. The assignment will be a take-home assessment that will typically involve a mix of ‘pen-and-paper’ and/or ‘computational’ questions. The question(s) will cover material that has been seen in previous lectures and are aimed at ensuring students are routinely studying the material. The questions will be released two weeks before the due date. The assessment must be submitted in Wattle using Turnitin by 9:00am on the due date, and marks/feedback will be given on the ‘Return of Assessment’ date.

Assignment 2

Value: 20 %
Due Date: 30/08/2021
Return of Assessment: 13/09/2021
Learning Outcomes: 1-4

This assessment is to be done individually. The assignment will be a take-home assessment that will typically involve a mix of ‘pen-and-paper’ and/or ‘computational’ questions. The question(s) will cover material that has been seen in previous lectures and are aimed at ensuring students are routinely studying the material. The questions will be released two weeks before the due date. The assessment must be submitted in Wattle using Turnitin by 9:00am on the due date, and marks/feedback will be given on the ‘Return of Assessment’ date.

Assignment 3

Value: 20 %
Due Date: 20/09/2021
Return of Assessment: 04/10/2021
Learning Outcomes: 1-4

This assessment is to be done individually. The assignment will be a take-home assessment that will typically involve a mix of ‘pen-and-paper’ and/or ‘computational’ questions. The question(s) will cover material that has been seen in previous lectures and are aimed at ensuring students are routinely studying the material. The questions will be released two weeks before the due date. The assessment must be submitted in Wattle using Turnitin by 9:00am on the due date, and marks/feedback will be given on the ‘Return of Assessment’ date.

Assignment 4

Value: 20 %
Due Date: 05/10/2021
Return of Assessment: 18/10/2021
Learning Outcomes: 1-4

This assessment is to be done individually. The assignment will be a take-home assessment that will typically involve a mix of ‘pen-and-paper’ and/or ‘computational’ questions. The question(s) will cover material that has been seen in previous lectures and are aimed at ensuring students are routinely studying the material. The questions will be released two weeks before the due date. The assessment must be submitted in Wattle using Turnitin by 9:00am on the due date, and marks/feedback will be given on the ‘Return of Assessment’ date.

Final Project

Value: 20 %
Due Date: 04/11/2021
Return of Assessment: 02/12/2021
Learning Outcomes: 1-4

This assessment is to be done individually. The final project will be a take-home assessment that will typically involve a mix of ‘pen-and-paper’ and/or ‘computational’ questions. The question(s) will cover material that has been seen in previous lectures and are aimed at ensuring students are routinely studying the material. The questions will be released two weeks before the due date. The assessment must be submitted in Wattle using Turnitin by 9:00am on the due date, and marks/feedback will be given on the ‘Return of Assessment’ date.

## Online Submission

The ANU uses Turnitin to enhance student citation and referencing techniques, and to assess assignment submissions as a component of the University's approach to managing Academic Integrity. The use of Turnitin is mandatory unless an exemption has been granted. Any student identified, either during the current semester or in retrospect, as having used ghost writing services will be investigated under the University’s Academic Misconduct Rule. The assignments and final project must be done individually; any similarities between results will be investigated under the University’s Academic Misconduct Rule.

## Hardcopy Submission

All assessment in this course is submitted online.

## Late Submission

Assessment tasks may not be submitted after the due date without an extension. If an assessment task is not submitted by the due date, a mark of 0 will be awarded.

## Referencing Requirements

Accepted academic practice for referencing sources that you use in presentations can be found via the links on the Wattle site, under the file named “ANU and College Policies, Program Information, Student Support Services and Assessment”. Alternatively, you can seek help through the Students Learning Development website.

## Returning Assignments

Assignments will be returned online.

## Extensions and Penalties

Extensions and late submission of assessment pieces are covered by the Student Assessment (Coursework) Policy and Procedure. The Course Convener may grant extensions for assessment pieces that are not examinations or take-home examinations. If you need an extension, you must request it in writing on or before the due date. If you have documented and appropriate medical evidence that demonstrates you were not able to request an extension on or before the due date, you may be able to request it after the due date.

## Resubmission of Assignments

Assignments may not be resubmitted.

## Privacy Notice

The Academic Quality Assurance Committee monitors the performance of students, including attrition, further study and employment rates and grade distribution, and College reports on quality assurance processes for assessment activities, including alignment with national and international disciplinary and interdisciplinary standards, as well as qualification type learning outcomes. Since first semester 1994, ANU has used a single grading scale for all courses. This grading scale is used by all academic areas of the University.

## Support for students

The University offers students support through several different services. You may contact the services listed below directly or seek advice from your Course Convener, Student Administrators, or your College and Course representatives (if applicable).

## Convener

 AsPr Dale Roberts 61257336 dale.roberts@anu.edu.au

### Research Interests

Probability theory, stochastic processes, and applications

### AsPr Dale Roberts

Friday 10:00-12:00

## Instructor

 AsPr Dale Roberts x57336 dale.roberts@anu.edu.au

### AsPr Dale Roberts

Friday 10:00-12:00