- Class Number 5780
- Term Code 3260
- Class Info
- Unit Value 6 units
- Mode of Delivery In Person
- Dr Dawei Chen
- Class Dates
- Class Start Date 25/07/2022
- Class End Date 28/10/2022
- Census Date 31/08/2022
- Last Date to Enrol 01/08/2022
Processing of semi-structured documents such as internet pages, RSS feeds and their accompanying news items, and PDF brochures is considered from the perspective of interpreting the content. This course considers the "document" and its various genres as a fundamental object for business, government and community. For this, the course covers four broad areas: (A) information retrieval, (B) natural language processing, (C) machine learning for documents, and (D) relevant tools for the Web. Basic tasks covered include content collection and extraction, formal and informal natural language processing, information extraction, information retrieval, classification and analysis. Fundamental probabilistic techniques for performing these tasks, and some common software systems, will be covered, though no area will be covered in any depth.
Upon successful completion of the course, the student will have an understanding of the role documents play in business and the community, and of the various digital resources available for document analysis. Moreover, the student will have the background theory and practical knowledge necessary to plan and execute a basic document analysis project. The student will be able to:
- differentiate between the basic probabilistic theories of language and document structure, information retrieval, and classification, clustering and document feature engineering.
- identify the basic algorithms and software available for probabilistic theories of language and be proficient at using common libraries for natural language processing to perform basic analysis tasks.
- index a document collection for use in an information retrieval system.
- demonstrate advanced knowledge of basic theories and algorithms for large-scale named-entity matching and standardization of names within a collection.
- perform automated classification using probabilistic theories.
A laptop or desktop with a reliable internet connection is required for accessing the course material on Wattle and for completing the practicals and assignments. Python and Jupyter Notebook will be used extensively in this course, so you will need to be able to install freely available software. Alternatively, you can use a laptop or desktop on which the appropriate software is already installed.
Further details on software used, and instructions, can be found on the Wattle site for the course.
The following textbooks will help you to better understand the course material and broaden your understanding. They are both provided online for free by the authors and are highly recommended. The first book covers information retrieval in a very approachable way, but it goes into much more depth than we will cover in this course. The second book is a currently evolving new edition of one of the best Natural Language Processing (NLP) textbooks. This book is up to date with the latest approaches and covers many topics in much greater depth than we will cover in this course.
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press. 2008.
Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft). 2022.
Staff Feedback
Students will be given feedback in the following forms in this course:
- Written comments
- Verbal comments
- Feedback to the whole class, to groups, to individuals, focus groups
Student Feedback
ANU is committed to the demonstration of educational excellence and regularly seeks feedback from students. Students are encouraged to offer feedback directly to their Course Convener or through their College and Course representatives (if applicable). Feedback given in the University's student surveys is anonymous and provides the Colleges, University Education Committee and Academic Board with opportunities to recognise excellent teaching, and opportunities for improvement. The Surveys and Evaluation website provides more information on student surveys at ANU and reports on the feedback provided on ANU courses.
|Week/Session|Summary of Activities|Assessment|
|---|---|---|
|1|Course Introduction and Boolean Retrieval|Practice Quiz (Ungraded); Assignment (IR) released|
|2|Ranked Retrieval and Evaluation in Information Retrieval (IR)|Quiz for week 2|
|3|Web Search and Introduction to Machine Learning|Quiz for week 3|
|4|Presentation and Deep Neural Networks (DNN)|Quiz for week 4; Assignment (IR) due|
|5|DNN in Practice and DNN for Structured Data|Quiz for week 5; Assignment (ML) released|
|6|Attention and Transformers|Quiz for week 6|
|7|Neural Language Models and Clustering|Quiz for week 7; Assignment (ML) due|
|8|Semantics and Reference Resolution in NLP; Constituency Parsing|Quiz for week 8; Assignment (NLP) released|
|9|Dependency Parsing|Quiz for week 9|
|10|Sequence Labelling and Parsing; NLP Tasks|Quiz for week 10|
|11|Language Models and Evaluation in NLP|Quiz for week 11; Assignment (NLP) due|
|12|Multilingual and Low Resource NLP||
|Assessment task|Value|Learning Outcomes|
|---|---|---|
|Quizzes (5%)|5%|1, 2, 3, 4, 5, 6 (details available on Wattle)|
|IR Assignment (15%)|15%|1, 2, 5|
|ML Assignment (15%)|15%|1, 2, 3, 4|
|NLP Assignment (15%)|15%|1, 2, 6|
|Final Exam (50%)|50%|1, 2, 3, 4, 5, 6|
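The weights in the assessment table combine into the overall course mark out of 100. The sketch below shows that arithmetic under stated assumptions: each task's result is expressed as a fraction between 0 and 1, the task keys (`quizzes`, `ir_assignment`, and so on) are hypothetical labels, and moderation or any other rule published on Wattle is not modelled.

```python
from typing import Mapping

# Weights from the assessment table (percent of the overall course mark).
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "quizzes": 5,
    "ir_assignment": 15,
    "ml_assignment": 15,
    "nlp_assignment": 15,
    "final_exam": 50,
}


def course_mark(fractions: Mapping[str, float]) -> float:
    """Combine per-task results (each a fraction in [0, 1]) into a mark out of 100.

    Assumption: marks are provisional and may still be moderated, which this
    sketch does not model.
    """
    missing = WEIGHTS.keys() - fractions.keys()
    if missing:
        raise ValueError(f"missing results for: {sorted(missing)}")
    return sum(weight * fractions[task] for task, weight in WEIGHTS.items())


# 80% on every task gives 80 out of 100 overall.
print(course_mark({task: 0.8 for task in WEIGHTS}))  # 80.0
```

Because the weights sum to 100, a uniform result on every task carries straight through to the course mark; the final exam alone accounts for half of it.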
Policies
ANU has educational policies, procedures and guidelines, which are designed to ensure that staff and students are aware of the University’s academic standards, and implement them. Students are expected to have read the Academic Misconduct Rule before the commencement of their course. Other key policies and guidelines include:
Assessment Requirements
The ANU is using Turnitin to enhance student citation and referencing techniques, and to assess assignment submissions as a component of the University's approach to managing Academic Integrity. For additional information regarding Turnitin, please visit the ANU Online website. Students may choose not to submit assessment items through Turnitin. In this instance, you will be required to submit, alongside the assessment item itself, hard copies of all references included in the assessment item.
Moderation of Assessment
Marks that are allocated during the semester are to be considered provisional until formalised by the College examiners meeting at the end of each semester. If appropriate, some moderation of marks may be applied before final results are released.
Assessment Task 1
Learning Outcomes: 1, 2, 3, 4, 5, 6 (details available on Wattle)
Quizzes (5%)
Online Wattle quizzes will be offered weekly. There are 11 quizzes in total. Marks for all quizzes (except the ungraded quiz in week 1) will be totalled and scaled to contribute 5% to the overall course mark, and all graded quizzes contribute the same amount. Quizzes are primarily intended for self-learning. Two attempts are permitted for each quiz, and the questions in the two attempts may differ. Automated feedback on correct answers is given after each attempt. Submission deadlines are provided in Wattle.
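The quiz rule above (ten graded quizzes, equal contribution, scaled to 5% of the course mark) can be sketched as follows. This is an illustration under stated assumptions, not the official marking procedure: only the graded quizzes for weeks 2 to 11 are passed in, and each quiz is first normalised to a fraction of its own maximum so that every quiz carries the same weight. When all quizzes have the same maximum, this coincides with totalling the raw marks and scaling. The function name `quiz_component` is hypothetical.

```python
from typing import Sequence

# Assumption: the ungraded week-1 practice quiz is excluded before calling,
# so only the 10 graded quizzes (weeks 2-11) are passed in.
QUIZ_WEIGHT = 5.0  # quizzes contribute 5% of the overall course mark


def quiz_component(scores: Sequence[float], max_scores: Sequence[float]) -> float:
    """Return the quiz contribution (out of QUIZ_WEIGHT) to the course mark.

    Hypothetical sketch: each quiz is converted to a fraction of its own
    maximum so that every graded quiz carries the same weight, then the
    fractions are averaged and scaled to QUIZ_WEIGHT.
    """
    if len(scores) != len(max_scores) or not scores:
        raise ValueError("need one maximum per score and at least one quiz")
    fractions = [s / m for s, m in zip(scores, max_scores)]
    return QUIZ_WEIGHT * sum(fractions) / len(fractions)


# Nine full marks and one half mark across ten graded quizzes.
print(quiz_component([1.0] * 9 + [0.5], [1.0] * 10))  # 4.75
```

In this example, losing half a mark on one quiz out of ten costs 0.25 of the 5 available course marks.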
Assessment Task 2
Learning Outcomes: 1, 2, 5
IR Assignment (15%)
A programming assignment that also requires you to provide written answers to questions, covering the Information Retrieval material in the course. Details and the submission deadline are provided in Wattle, and submission is through Wattle.
Assessment Task 3
Learning Outcomes: 1, 2, 3, 4
ML Assignment (15%)
A programming assignment that also requires you to provide written answers to questions, covering the Machine Learning material in the course. Details and the submission deadline are provided in Wattle, and submission is through Wattle.
Assessment Task 4
Learning Outcomes: 1, 2, 6
NLP Assignment (15%)
A programming assignment that also requires you to provide written answers to questions, covering the Natural Language Processing material in the course. Details and the submission deadline are provided in Wattle, and submission is through Wattle.
Assessment Task 5
Learning Outcomes: 1, 2, 3, 4, 5, 6
Final Exam (50%)
The final exam will be a 3-hour open-book exam. Detailed information will be provided via the Wattle course site.
Academic Integrity
Academic integrity is a core part of our culture as a community of scholars. At its heart, academic integrity is about behaving ethically. This means that all members of the community commit to honest and responsible scholarly practice and to upholding these values with respect and fairness. The Australian National University commits to embedding the values of academic integrity in our teaching and learning. We ensure that all members of our community understand how to engage in academic work in ways that are consistent with, and actively support, academic integrity. The ANU expects staff and students to uphold high standards of academic integrity and act ethically and honestly, to ensure the quality and value of the qualification that you will graduate with. The University has policies and procedures in place to promote academic integrity and manage academic misconduct. Visit the Academic honesty & plagiarism website for more information about academic integrity and what the ANU considers academic misconduct. The ANU offers a number of services to assist students with their assignments, examinations, and other learning activities. The Academic Skills and Learning Centre offers a number of workshops and seminars that you may find useful for your studies.
Online Submission
The ANU uses Turnitin to enhance student citation and referencing techniques, and to assess assignment submissions as a component of the University's approach to managing Academic Integrity. While the use of Turnitin is not mandatory, the ANU highly recommends Turnitin is used by both teaching staff and students. For additional information regarding Turnitin please visit the ANU Online website.
Hardcopy Submission
For some forms of assessment (handwritten assignments, art works, laboratory notes, etc.), hard copy submission is appropriate when approved by the Associate Dean (Education). Hard copy submissions must utilise the Assignment Cover Sheet. Please keep a copy of tasks completed for your records.
Late submission of assessment tasks without an extension is penalised at the rate of 5% of the possible marks available per working day or part thereof. Late submissions are not accepted more than 1 working day after the due date, or on or after the date specified in the course outline for the return of the assessment item. Late submission is not accepted for take-home examinations.
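The late-submission rule above can be read as a small calculation. The sketch below is hypothetical and makes several assumptions: lateness is already measured in working days (converting calendar dates would need a holiday calendar), "part thereof" is modelled by rounding up to whole days, and extensions, the return-of-assessment cut-off and take-home examinations are not modelled. The function name `late_penalised_mark` is illustrative only.

```python
import math
from typing import Optional

PENALTY_PERCENT_PER_DAY = 5  # 5% of the possible marks per working day or part thereof
MAX_WORKING_DAYS_LATE = 1    # not accepted more than 1 working day after the due date


def late_penalised_mark(raw_mark: float, possible_marks: float,
                        working_days_late: float) -> Optional[float]:
    """Hypothetical sketch of the late-submission rule.

    Returns the penalised mark, or None if the submission is no longer accepted.
    Assumes lateness is given in working days and that no extension applies.
    """
    if working_days_late <= 0:
        return raw_mark  # submitted on time: no penalty
    if working_days_late > MAX_WORKING_DAYS_LATE:
        return None  # past the acceptance window
    days_charged = math.ceil(working_days_late)  # a part day counts as a full day
    penalty = possible_marks * PENALTY_PERCENT_PER_DAY * days_charged / 100
    return max(raw_mark - penalty, 0.0)  # a mark never drops below zero


# An assignment marked 80/100 submitted a quarter of a working day late.
print(late_penalised_mark(80, 100, 0.25))  # 75.0
```

Because the acceptance window is a single working day, any accepted late submission incurs exactly one 5% deduction, whether it is a few minutes or a full working day late.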
Referencing Requirements
Accepted academic practice for referencing sources that you use in presentations can be found via the links on the Wattle site, under the file named “ANU and College Policies, Program Information, Student Support Services and Assessment”. Alternatively, you can seek help through the Students Learning Development website.
Extensions and Penalties
Extensions and late submission of assessment pieces are covered by the Student Assessment (Coursework) Policy and Procedure. The Course Convener may grant extensions for assessment pieces that are not examinations or take-home examinations. If you need an extension, you must request an extension in writing on or before the due date. If you have documented and appropriate medical evidence that demonstrates you were not able to request an extension on or before the due date, you may be able to request it after the due date.
Distribution of grades policy
The Academic Quality Assurance Committee monitors the performance of students (including attrition, further study and employment rates, and grade distribution) and College reports on quality assurance processes for assessment activities, including alignment with national and international disciplinary and interdisciplinary standards, as well as qualification type learning outcomes. Since first semester 1994, ANU has used a grading scale for all courses. This grading scale is used by all academic areas of the University.
Support for students
The University offers students support through several different services. You may contact the services listed below directly or seek advice from your Course Convener, Student Administrators, or your College and Course representatives (if applicable).
- ANU Health, safety & wellbeing for medical services, counselling, mental health and spiritual support
- ANU Diversity and inclusion for students with a disability or ongoing or chronic illness
- ANU Dean of Students for confidential, impartial advice and help to resolve problems between students and the academic or administrative areas of the University
- ANU Academic Skills and Learning Centre supports you to make your own decisions about how you learn and manage your workload.
- ANU Counselling Centre promotes, supports and enhances mental health and wellbeing within the University student community.
- ANUSA supports and represents undergraduate and ANU College students
- PARSA supports and represents postgraduate and research students