The Graduate Certificate in Natural Language Processing (NLP) consists of 9 units.
The following two courses are required:
The elective slot can be filled with any of the electives listed below.
For students with little background looking for a gentler introduction, we recommend the following sequence:
This course introduces the key concepts underlying statistical natural language processing. Students will learn a variety of techniques for the computational modeling of natural language, including n-gram models, smoothing, hidden Markov models, Bayesian inference, expectation maximization, the Viterbi algorithm, the inside-outside algorithm for probabilistic context-free grammars, and higher-order language models. Graduate-level requirements include assignments of greater scope than undergraduate assignments. In addition to being more in depth, graduate assignments are typically longer and require additional readings.
This course focuses on statistical approaches to pattern classification and applications of natural language processing to real-world problems.
This class serves as an introduction to human language technology (HLT), an emerging interdisciplinary field that encompasses most subdisciplines of linguistics, as well as computational linguistics, natural language processing, computer science, artificial intelligence, psychology, philosophy, mathematics, and statistics. Content includes a combination of theoretical and applied topics such as (but not limited to) tokenization across languages, n-grams, word representations, basic probability theory, introductory programming, and version control.
This intermediate-level course is a continuation of LING 529 and covers a combination of theoretical and applied topics such as (but not limited to) unsupervised learning (clustering), decision tree classifiers, and the basics of information retrieval.
Fundamentals of formal language theory; syntactic and semantic processing; the place of world knowledge in natural language processing. Graduate-level requirements include a greater number of assignments and a higher level of performance.
This course provides a hands-on project-based approach to particular problems and issues in computational linguistics.
Topics include speech synthesis, speech recognition, and other speech technologies. This course gives students background for a career in the speech technology industry. Graduate students will complete extra readings and assignments and give an extra presentation. Their final project must constitute original work in a speech technology.
Students are introduced to computer programming as it pertains to collecting and analyzing linguistic data. The particular programming language is chosen at the discretion of the instructor. Graduate-level requirements include more challenging exams; 50% greater contribution to their respective group projects; 9 instead of 6 assignments; additional readings from the primary literature.
NOTE: The version offered in the online program is closer to “advanced programming techniques for computational linguists.”
The development and exchange of scholarly information, usually in a small group setting with an in-depth investigation of computational linguistics theory and application. The scope of work shall consist of research by course registrants, with the exchange of the results of such research through discussion, reports, and/or papers.
NOTE: The topic/focus may vary from offering to offering. The next online offering will be structured as a follow-up to the Speech Technology course (LING 578).
This course will introduce students to the concepts and techniques of data mining for knowledge discovery. It includes methods developed in the fields of statistics, large-scale data analytics, machine learning, pattern recognition, database technology and artificial intelligence for automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns. Topics include understanding varieties of data, data preprocessing, classification, association and correlation rule analysis, cluster analysis, outlier detection, and data mining trends and research frontiers. We will use software packages for data mining, explaining the underlying algorithms and their use and limitations. The course includes laboratory exercises, with data mining case studies using data from many different resources such as social networks, linguistics, geo-spatial applications, marketing and/or psychology.
This course focuses on the use of modern data science methods to help learners make socially responsible decisions and mitigate harm that arises from issues like bias, discrimination, and threats to one’s personal privacy. More and more individuals need to make data-driven decisions in a wide variety of contexts including non-governmental organizations, not-for-profit industries, human services, environmental organizations, refugee camps, and more. Students in this class will thus learn about data science and how it can be utilized in contexts where socially-good decisions are desired and emphasized. This active learning class is designed for students who have an interest in the topic but who may have little to no previous experience with data science or programming.
Most web data today consists of unstructured text. The fact that this data exists is irrelevant unless it is made available in a way that lets users quickly find information relevant to their needs. This course will cover the fundamental knowledge necessary to build these systems, such as web crawling, index construction and compression, Boolean, vector-based, and probabilistic retrieval models, text classification and clustering, link analysis algorithms such as PageRank, learning to rank, and computational advertising. Students will also complete one programming project, in which they will construct one complex application that combines multiple algorithms into a system that solves real-world problems.
This course covers important algorithms useful for natural language processing (NLP), including distributional similarity algorithms such as word embeddings, recurrent and recursive neural networks (NN), probabilistic graphical models useful for sequence prediction, and parsing algorithms such as shift-reduce. This course will focus on the algorithms that underlie NLP, rather than the application of NLP to various problem domains.
Neural networks are a branch of machine learning that combines a large number of simple computational units to allow computers to learn from and generalize over complex patterns in data. Students in this course will learn how to train and optimize feed forward, convolutional, and recurrent neural networks for tasks such as text classification, image recognition, and game playing.