• Offered by Research School of Computer Science
  • ANU College ANU College of Engineering and Computer Science
  • Classification Advanced
  • Course subject Computer Science
  • Academic career PGRD
  • Course convener
    • Prof Peter Christen
  • Mode of delivery In Person
  • Co-taught Course
  • Offered in Summer Session 2020
    Second Semester 2020
    Spring Session 2020

Note: Non-DADAN/MADAN students wanting to enrol in the non-standard session offerings are required to seek approval from their Program Convener.

Real-world data are commonly messy, distributed, and heterogeneous. This course introduces core concepts of data cleaning, standardisation, and integration, which aim to convert and map raw data into formats that allow more efficient and convenient use and analysis. The course also discusses data quality, management, and storage issues relevant to data analytics.

Learning Outcomes

Upon successful completion, students will have the knowledge and skills to:

  1. Critically reflect upon different data sources, types, formats, and structures,
  2. Research, justify and apply data cleaning, preprocessing, and standardisation for data analytics,
  3. Apply data integration concepts and techniques to heterogeneous and distributed data,
  4. Interpret, assess and discuss data quality measurements,
  5. Research and justify advanced data wrangling, data integration, and database techniques as relevant to data analytics.

Indicative Assessment

  1. Written and practical assignments (40%) [LO 1,2,3,4]
  2. Oral presentation and report (20%) [LO 5]
  3. Final examination (40%) [LO 1,2,3,4,5]

In response to COVID-19: Please note that Semester 2 Class Summary information (available under the classes tab) is as up to date as possible. Changes to Class Summaries not captured by this publication will be available to enrolled students via Wattle. 

The ANU uses Turnitin to enhance student citation and referencing techniques, and to assess assignment submissions as a component of the University's approach to managing Academic Integrity. While the use of Turnitin is not mandatory, the ANU highly recommends Turnitin is used by both teaching staff and students. For additional information regarding Turnitin please visit the ANU Online website.

Workload

The workload for the course is around 130 hours, including reading, the viewing of online course material, participation in face-to-face lectures, practical labs and tutorials, and preparation for assessments.

Inherent Requirements

Not applicable

Requisite and Incompatibility

To enrol in this course you must have completed one of COMP7230, COMP6730 or COMP6710, and one of COMP7240 or COMP6240. Incompatible with COMP3430.

Prescribed Texts

None

Preliminary Reading

Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution and Duplicate Detection,
Peter Christen, Springer, 2012.
For more information see: http://users.cecs.anu.edu.au/~christen/data-matching-book-2012.html

Fees

Tuition fees are for the academic year indicated at the top of the page.  

If you are a domestic graduate coursework or international student you will be required to pay tuition fees. Tuition fees are indexed annually. Further information for domestic and international students about tuition and other fees can be found at Fees.

Student Contribution Band:
2
Unit value:
6 units

If you are an undergraduate student and have been offered a Commonwealth supported place, your fees are set by the Australian Government for each course. At ANU 1 EFTSL is 48 units (normally 8 x 6-unit courses). You can find your student contribution amount for each course at Fees.  Where there is a unit range displayed for this course, not all unit options below may be available.

Units EFTSL
6.00 0.12500
Domestic fee paying students
Year Fee
2020 $4320
International fee paying students
Year Fee
2020 $5760
Note: Fee information is for the current year only.

Offerings, Dates and Class Summary Links

The list of offerings for future years is indicative only.
Class summaries, if available, can be accessed by clicking on the View link for the relevant class number.

Summer Session

Class number | Class start date | Last day to enrol | Census date | Class end date | Mode Of Delivery | Class Summary
1613 | 13 Jan 2020 | 24 Jan 2020 | 01 Feb 2020 | 13 Mar 2020 | In Person | N/A

Second Semester

Class number | Class start date | Last day to enrol | Census date | Class end date | Mode Of Delivery | Class Summary
9522 | 27 Jul 2020 | 03 Aug 2020 | 31 Aug 2020 | 30 Oct 2020 | In Person | N/A

Spring Session

Class number | Class start date | Last day to enrol | Census date | Class end date | Mode Of Delivery | Class Summary
GCDE Intensive
9628 | 28 Sep 2020 | 04 Oct 2020 | 09 Oct 2020 | 06 Nov 2020 | Online | N/A
