Michael A. Covington
CSCI/LING 8570 Natural Language Processing Techniques
About this course
View syllabus
This course is designed for students in the M.S. program in Artificial Intelligence but is also
open to other graduate students. It is usual to take CSCI 6540 (Symbolic Programming)
the preceding semester.
In 2010, this course was taught using Python and the NLTK.
In 2011, it reverted to being Prolog-based, although new material was
introduced. It will remain Prolog-based.
This is a course in the hard parts of natural language processing, such as
parsing and semantic modeling.
There are plenty of shallow statistical methods that you can learn out of books, and
I will acquaint you with them, but they aren't the main focus of the course.
The main focus of the course is to equip you to implement sophisticated algorithms in
Prolog (which is uniquely suitable for some of them), to understand parsing (so that
you can look at the output of a parser and judge whether it is correct, and build special-purpose
parsers for your own needs), and to understand semantic modeling (how to get from language
to knowledge representation).
The historical context is that, from about 1997 to 2007, the whole field shifted toward
shallow statistical methods, but now, with IBM's "Watson" and other developments, the "hard parts"
are in demand again, as everyone has learned that shallow methods can only go so far.
That is the rationale for using an older textbook together with a lot of new supplementary material.
Students taking this course should know how to program a computer
in Prolog and also in a general-purpose programming language of their own choosing
such as Java, C#, or Python. If you do not already know Prolog, you are not prepared
— no one has ever successfully "picked up Prolog" while taking this course.
Arrangements will be made for students who took the Python-based version and want to
take the Prolog-based version for separate credit as ARTI 8800. Contact me if you
want to do this.
Online journals and other literature
Some of this material is accessible only from on campus because it depends
on UGA library subscriptions.
ACL publications (Computational Linguistics, ACL Conference Proceedings, etc.)
ACL Computational Linguistics Wiki (reference information and data)
Computational Linguistics
Natural Language Engineering
Literary and Linguistic Computing
Manning, Raghavan, and Schütze, Introduction to Information Retrieval (full text)
Natural Language Toolkit (NLTK) (Python-based)
International Journal of Computer Processing of Oriental Languages
ACM Transactions on Asian Language Information Processing (TALIP)
Index of online journals in UGA libraries
Index of books and printed journals in UGA libraries
Supplemental material for this course
Textbook corrections for Natural Language Processing for Prolog Programmers
Overview:
Some terms and resources
Terminology
Pragmatics
Prolog i/o predicates (to supplement Chapter 2 of Prolog Programming in Depth)
Text statistics
Text classification
Syntax: How do we know which of 2 trees is the right one?
Tagging:
General information
"Cheat sheet" based on Chapter 4
Lemmatization (How to lemmatize English)
The Penn Treebank:
General information
Our local Prolog-adapted version
ProNTo (Prolog Natural Language Tools, mostly student projects from 8570)
Latent Semantic Indexing:
Linear algebra refresher for 8570
Worked example of Latent Semantic Indexing
The R statistical software package:
Download your own copy of R
Using R to Compare Groups
Using R to Detect Changes in Individuals
Using R to Find Correlations
Files and Recordkeeping With R
The remainder of this page will continue like a blog, with material added day by day.
Jan. 6, 2012 — A suggested linguistics book
If you have not had a linguistics course, I suggest reading a general linguistics book
for background. You can do this very cheaply by buying an older edition of
a textbook such as Fromkin and Rodman.
Click here for some useful listings.
Jan. 17, 2012 — Meet the LINGUIST List
At linguistlist.org you can see, and subscribe to,
LINGUIST List, which is a mailing list that often announces conferences and job openings
that are of interest to us.