• Offered by Mathematical Sciences Institute
• ANU College ANU Joint Colleges of Science
Specialist
• Course subject Mathematics
• Areas of interest Mathematics
• Mode of delivery In Person
Data Mining (MATH6210)
The main focus of the course will be supervised learning, primarily for classification.  The emphasis will be on practical applications of the methodologies that are described, with the R system used for the computations.

Attention will be given to:

1. Generalizability and predictive accuracy, in the practical contexts in which methods are applied.
2. Low-dimensional visual representation of results, as an aid to diagnosis and insight.
3. Interpretability of model parameters, including potential for misinterpretation.
There will be very limited attention to regression methods with a continuous outcome variable.  Relevant statistical theory will mostly be assumed and described rather than derived mathematically.  There will be somewhat more attention to the mathematical derivation and description of algorithms.

Topics to be covered include:

• Basic statistical ideas - populations, distributions, samples and random samples.
• Classification models and methods - including: linear discriminant analysis; trees; random forests; neural nets; boosting and bagging approaches; support vector machines.
• Linear regression approaches to classification, compared with linear discriminant analysis.
• The training/test approach to assessing accuracy, and cross-validation.
• Strategies in the (common) situation where source and target population differ, typically in time but in other respects also.
• Unsupervised models - k-means, association rules, hierarchical clustering, model-based clustering.
• Low-dimensional views of classification results - distance methods and ordination.
• Strategies for working with large data sets.
• Practical approaches to classification with real life data sets, using different methods to gain different insights into presentation.
• Privacy and security.
• Use of the R system for handling the calculations.

Note: Graduate students attend joint classes with undergraduates but will be assessed separately.

## Learning Outcomes

Upon successful completion, students will have the knowledge and skills to:

1. Explain the fundamental issues involved in the use of the training/test methodology, cross-validation and the bootstrap to provide accuracy assessments.
2. Understand and explain ideas of source and target sample, and their relevance to the practical application of classification and other data mining techniques.
3. Demonstrate accurate and efficient use of classification and related data mining techniques, using the R system for the computations.
4. Demonstrate capacity for mathematical reasoning through analyzing, proving and explaining concepts from the theory that underpins classification and related data mining methods.
5. Apply problem-solving using classification and related data mining techniques to diverse situations in business, biology, engineering and other sciences.

## Indicative Assessment

Assessment will be based on:

• 3 Assignments (60%; LO 1-5)
• Presentation (40%; LO 1-5)

The ANU uses Turnitin to enhance student citation and referencing techniques, and to assess assignment submissions as a component of the University's approach to managing Academic Integrity. While the use of Turnitin is not mandatory, the ANU highly recommends Turnitin is used by both teaching staff and students. For additional information regarding Turnitin please visit the ANU Online website.

Offered subject to staff availability and student demand. Regular meetings.

## Requisite and Incompatibility

You will need to contact the Mathematical Sciences Institute to request a permission code to enrol in this course.

## Fees

Tuition fees are for the academic year indicated at the top of the page.

If you are a domestic graduate coursework or international student you will be required to pay tuition fees. Tuition fees are indexed annually. Further information for domestic and international students about tuition and other fees can be found at Fees.

Student Contribution Band: 2
Unit value: 6 units

If you are an undergraduate student and have been offered a Commonwealth supported place, your fees are set by the Australian Government for each course. At ANU 1 EFTSL is 48 units (normally 8 x 6-unit courses). You can find your student contribution amount for each course at Fees.  Where there is a unit range displayed for this course, not all unit options below may be available.

| Units | EFTSL |
| --- | --- |
| 6.00 | 0.12500 |

## Course fees

Domestic fee-paying students

| Year | Fee |
| --- | --- |
| 2017 | \$3660 |

International fee-paying students

| Year | Fee |
| --- | --- |
| 2017 | \$4878 |

Note: Fee information is for the current year only.

## Offerings, Dates and Class Summary Links

ANU utilises MyTimetable to enable students to view the timetable for their enrolled courses, browse, then self-allocate to small teaching activities / tutorials so they can better plan their time. Find out more on the Timetable webpage.

There are no current offerings for this course.