  • Offered by: Research School of Computer Science
  • ANU College: ANU College of Engineering and Computer Science
  • Classification: Advanced
  • Course subject: Computer Science

Real-world data are commonly messy, distributed, and heterogeneous. This course introduces core concepts of data cleaning, standardisation, and integration, which aim to convert and map raw data into formats that allow more efficient and convenient use and analysis. The course also discusses data quality, management, and storage issues as relevant to data analytics.

Learning Outcomes

  1. Critically reflect upon different data sources, types, formats, and structures.
  2. Research, justify and apply data cleaning, preprocessing, and standardisation for data analytics.
  3. Apply data integration concepts and techniques to heterogeneous and distributed data.
  4. Interpret, assess and discuss data quality measurements.
  5. Research and justify advanced data wrangling, data integration, and database techniques as relevant to data analytics.

Other Information

If you believe you meet the pre-requisites for this course through alternative means, please contact dataanalytics.cecs@anu.edu.au to apply for an exemption.

Indicative Assessment

  1. Written and practical assignments (40% each) - LO 1 to 4
  2. Oral presentation and report (20%) - LO 5
  3. Final examination (40%) - LO 1 to 5
      The ANU uses Turnitin to enhance student citation and referencing techniques, and to assess assignment submissions as a component of the University's approach to managing Academic Integrity. While the use of Turnitin is not mandatory, the ANU highly recommends Turnitin is used by both teaching staff and students. For additional information regarding Turnitin please visit the ANU Online website.

      Requisite and Incompatibility

      To enrol in this course you must either have completed COMP7230, COMP1730 or COMP6730, and also COMP7240, COMP6240, COMP2400 or COMP3430; or be enrolled in the MSc Quantitative Biology and Bioinformatics or its Advanced version.

      You will need to contact the Research School of Computer Science to request a permission code to enrol in this course.

      Indicative Reading List

      Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection,
      Peter Christen, Springer, 2012.
      For more information see: http://users.cecs.anu.edu.au/~christen/data-matching-book-2012.html


      Tuition fees are for the academic year indicated at the top of the page.  

      If you are a domestic graduate coursework or international student you will be required to pay tuition fees. Tuition fees are indexed annually. Further information for domestic and international students about tuition and other fees can be found at Fees.

      Student Contribution Band: Band 2
      Unit value: 6 units

      If you are an undergraduate student and have been offered a Commonwealth supported place, your fees are set by the Australian Government for each course. At ANU 1 EFTSL is 48 units (normally 8 x 6-unit courses). You can find your student contribution amount for each course at Fees.  Where there is a unit range displayed for this course, not all unit options below may be available.

      Units   EFTSL
      6.00    0.12500

      Domestic fee paying students
      Year    Fee
      2018    $4080

      International fee paying students
      Year    Fee
      2018    $5400
      Note: Fee information is for the current year only.

      Offerings and Dates

      The list of offerings for future years is indicative only.

      Second Semester

      Class number: 9922
      Class start date: 23 Jul 2018
      Last day to enrol: 30 Jul 2018
      Census date: 31 Aug 2018
      Class end date: 26 Oct 2018
      Mode of delivery: In Person
