Computational Linguistics

Academic year 2014, first semester; Thursdays, periods 3-4 (10:45-12:15)
Lecture room: G311

Schedule

No. Date Topics Assignments
1 April 10 Introduction to this lecture.
Tagging with HMM. slides
- *install Python on your laptop computer.
- *learn the basics of Python if you are a novice.
- read the note on HMMs by Michael Collins.
- implement your HMM-based POS tagger (a minimal sketch follows this row).
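As a starting point for the tagger assignment, here is a minimal sketch of Viterbi decoding for an HMM tagger. It assumes you have already estimated log-probability tables from a tagged corpus; the table layout, the start symbol, and the -1e9 stand-in for log 0 are illustrative choices, and smoothing is left to you.

    def viterbi(words, tags, log_trans, log_emit, start="<s>"):
        """Find the most probable tag sequence for `words`.
        log_trans[(prev_tag, tag)] and log_emit[(tag, word)] are
        log-probabilities estimated from a tagged corpus (assumed given)."""
        best = [{} for _ in words]  # best[i][t]: best log-prob ending in tag t
        back = [{} for _ in words]  # back[i][t]: predecessor tag on that path
        for t in tags:
            best[0][t] = (log_trans.get((start, t), -1e9)
                          + log_emit.get((t, words[0]), -1e9))
        for i in range(1, len(words)):
            for t in tags:
                emit = log_emit.get((t, words[i]), -1e9)
                score, prev = max(
                    (best[i - 1][p] + log_trans.get((p, t), -1e9) + emit, p)
                    for p in tags)
                best[i][t], back[i][t] = score, prev
        # follow back-pointers from the best final tag
        tag = max(best[-1], key=best[-1].get)
        seq = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = back[i][tag]
            seq.append(tag)
        return list(reversed(seq))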
2 April 17 Text classification with naive Bayes classifiers slides - read "A Comparison of Event Models for Naive Bayes Text Classification" by McCallum and Nigam.
- install MeCab, and try to find sentences that MeCab cannot analyze correctly.
3 April 24 The method of Lagrange multipliers.
Maximum likelihood estimation.
Maximum a posteriori estimation. slides
- read Sections 1 and 2 of the tutorial on Lagrange multipliers by Dan Klein.
Try to give an intuitive explanation of this method when the solution space is 3-dimensional.
- implement a naive Bayes classifier (a minimal sketch follows this row). Train it on this file, and test it on this file.
Each line of these files consists of a class label (+1 or -1) and a segmented sentence.
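A minimal sketch of a multinomial naive Bayes classifier for the file format described above (one line = a class label followed by a space-separated segmented sentence). Add-one smoothing and the file name train.txt are illustrative assumptions.

    import math
    from collections import Counter, defaultdict

    class NaiveBayes:
        """Multinomial naive Bayes with add-one (Laplace) smoothing."""
        def train(self, lines):
            self.word_count = defaultdict(Counter)  # class -> word counts
            self.doc_count = Counter()              # class -> number of documents
            for line in lines:
                label, *words = line.split()
                self.doc_count[label] += 1
                self.word_count[label].update(words)
            self.vocab = {w for c in self.word_count.values() for w in c}

        def classify(self, words):
            n_docs = sum(self.doc_count.values())
            best_label, best_score = None, float("-inf")
            for label in self.doc_count:
                total = sum(self.word_count[label].values())
                score = math.log(self.doc_count[label] / n_docs)  # log prior
                for w in words:
                    # add-one smoothed log likelihood of each word
                    score += math.log((self.word_count[label][w] + 1)
                                      / (total + len(self.vocab)))
                if score > best_score:
                    best_label, best_score = label, score
            return best_label

    nb = NaiveBayes()
    nb.train(open("train.txt"))  # hypothetical name for the training file

Test accuracy can then be measured by comparing classify() against the label in the first field of each test line.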
4 May 1 Maximum likelihood estimation.
Maximum a posteriori estimation.
Bag-of-words representation of documents.
SVM.
slides
- MAP estimation of the multinomial model of naive Bayes classifiers
- derive the dual problem of the soft-margin SVM optimization problem
- Use an SVM tool (e.g., TinySVM) to train a model on this file, and test it on this file. You need to write a script that converts those files into the input format of the tool (a minimal conversion sketch follows this row).
- Read Section 2.1 of this tutorial.
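A minimal sketch of such a conversion script, assuming TinySVM's SVM-light-style input of "label index:value" pairs with feature indices in increasing order; the binary bag-of-words features and the file names are illustrative assumptions.

    def convert(in_path, out_path, vocab=None):
        """Convert 'label segmented-sentence' lines into TinySVM/SVM-light
        style 'label idx:val' lines with binary bag-of-words features."""
        lines = [l.split() for l in open(in_path) if l.strip()]
        if vocab is None:  # build the feature index from the training file
            vocab = {}
            for label, *words in lines:
                for w in words:
                    vocab.setdefault(w, len(vocab) + 1)  # indices start at 1
        with open(out_path, "w") as out:
            for label, *words in lines:
                # unseen test words are simply dropped
                idxs = sorted({vocab[w] for w in words if w in vocab})
                feats = " ".join(f"{i}:1" for i in idxs)
                out.write(f"{label} {feats}\n")
        return vocab

    vocab = convert("train.txt", "train.svm")     # hypothetical file names
    convert("test.txt", "test.svm", vocab=vocab)  # reuse the training vocabulary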
-- May 8 NO LECTURE
5 May 15 Named-Entity Extraction
Dependency Analysis
slides
- read Section 3 and Section 5.1 of the "CaboCha" paper: "Japanese Dependency Analysis using Cascaded Chunking", CoNLL 2002,
and answer the following questions:
-- which static features are used?
-- which dynamic features are used?
-- are dynamic features effective? If so, in what situation?
-- which kernel function is used?
-- what benefit does the use of this kernel function have? (this is not written in the paper; think about it yourself)
6 May 22 Log-linear Model
Conditional Random Fields (CRF)
slides
- read Sections 1, 2, and 3 of the tutorial on CRF
- read Section 6.3 of a book (in Japanese) to review CRF, and try to understand the forward-backward algorithm (a minimal sketch follows this row).
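A minimal sketch of the forward-backward recursions for a linear chain, written with plain HMM-style probabilities rather than CRF potentials to keep it short; the rescaling (or log-space arithmetic) that real implementations need for long sequences is omitted.

    def forward_backward(obs, states, trans, emit, init):
        """Compute per-position state marginals P(y_i = s | obs).
        trans[s][t], emit[s][o], init[s] are probabilities (assumed given)."""
        n = len(obs)
        # forward: alpha[i][s] = P(obs[:i+1], y_i = s)
        alpha = [{s: init[s] * emit[s][obs[0]] for s in states}]
        for i in range(1, n):
            alpha.append({t: emit[t][obs[i]] *
                             sum(alpha[-1][s] * trans[s][t] for s in states)
                          for t in states})
        # backward: beta[i][s] = P(obs[i+1:] | y_i = s)
        beta = [dict.fromkeys(states, 1.0) for _ in range(n)]
        for i in range(n - 2, -1, -1):
            for s in states:
                beta[i][s] = sum(trans[s][t] * emit[t][obs[i + 1]] * beta[i + 1][t]
                                 for t in states)
        z = sum(alpha[-1][s] for s in states)  # marginal likelihood P(obs)
        return [{s: alpha[i][s] * beta[i][s] / z for s in states}
                for i in range(n)]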
7 May 29 Forward-backward algorithm
Text summarization
slides
Read the following paper and learn how the weights on words are calculated in their work:
Yih et al., 2007
8 June 5 Text summarization slides Take a rest.
9 June 12 k-means clustering, EM, PLSI slides - derive the update equations for the product model.
- Answer the following questions with reference to Hofmann's paper.
  * how is "document" integrated into the model?
  * what is tempered EM? What is the update equation for PLSI when tempered EM is used?
  * what is folding-in? What kind of calculation is needed for folding-in?
- Implement PLSI, train it on this file, and calculate the perplexity of this file (a minimal EM sketch follows this row).
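A minimal sketch of EM for PLSI in the parameterization P(d,w) = Σ_z P(z) P(d|z) P(w|z); the random initialization, iteration count, and dense (K, D, W) posterior array are illustrative simplifications, and tempered EM is left out.

    import numpy as np

    def plsi(counts, K, iters=100, seed=0):
        """counts: (D, W) array of word counts n(d, w).
        Returns P(z), P(d|z), P(w|z) after EM."""
        rng = np.random.default_rng(seed)
        D, W = counts.shape
        pz = np.full(K, 1.0 / K)
        pd_z = rng.random((K, D)); pd_z /= pd_z.sum(axis=1, keepdims=True)
        pw_z = rng.random((K, W)); pw_z /= pw_z.sum(axis=1, keepdims=True)
        for _ in range(iters):
            # E-step: P(z|d,w) proportional to P(z) P(d|z) P(w|z)
            post = pz[:, None, None] * pd_z[:, :, None] * pw_z[:, None, :]
            post /= post.sum(axis=0, keepdims=True) + 1e-12  # shape (K, D, W)
            # M-step: re-estimate from expected counts n(d,w) P(z|d,w)
            nzdw = counts[None, :, :] * post
            pz = nzdw.sum(axis=(1, 2)); pz /= pz.sum()
            pd_z = nzdw.sum(axis=2); pd_z /= pd_z.sum(axis=1, keepdims=True)
            pw_z = nzdw.sum(axis=1); pw_z /= pw_z.sum(axis=1, keepdims=True)
        return pz, pd_z, pw_z

Perplexity is then exp(-Σ_{d,w} n(d,w) log P(w|d) / Σ_{d,w} n(d,w)) with P(w|d) computed from the trained model; for unseen test documents you also need folding-in to estimate P(d|z), as asked above.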
10 June 19 LDA
slides
implement Gibbs sampling for LDA (a minimal sketch follows this row).
Train it on this file. Each line of this file corresponds to a document, which is represented as a set of nouns, verbs, adverbs, and adjectives that appear in the document.
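A minimal sketch of collapsed Gibbs sampling for LDA, assuming the documents have already been read into lists of integer word ids; the hyperparameters, iteration count, and plain-list count tables are illustrative choices.

    import random

    def lda_gibbs(docs, V, K, alpha=0.1, beta=0.01, iters=500, seed=0):
        """Collapsed Gibbs sampling for LDA.
        docs: list of documents, each a list of word ids in [0, V).
        Returns the final topic assignments z and the count tables."""
        rng = random.Random(seed)
        n_dk = [[0] * K for _ in docs]      # topic counts per document
        n_kw = [[0] * V for _ in range(K)]  # word counts per topic
        n_k = [0] * K                       # total words per topic
        z = [[rng.randrange(K) for _ in d] for d in docs]
        for d, doc in enumerate(docs):      # initialize counts from z
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]  # remove the current assignment
                    n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                    # P(z_i = k | rest) propto (n_dk+alpha)(n_kw+beta)/(n_k+V*beta)
                    weights = [(n_dk[d][k2] + alpha) *
                               (n_kw[k2][w] + beta) / (n_k[k2] + V * beta)
                               for k2 in range(K)]
                    k = rng.choices(range(K), weights=weights)[0]
                    z[d][i] = k
                    n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
        return z, n_dk, n_kw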
-- June 26 NO LECTURE
11 July 3 Check LDA code.
slides
No assignment. But see the slides for details on the report submission (GRADING 1).
12 July 10 Derivation of the update equations for LDA's Gibbs sampling (the final update equation is reproduced after this row for reference).
Sentiment analysis.
slides, survey by Kaji-san
Watch this video (10 minutes).

GRADING 2: Read the submission, write the review form, and send it to me by July 23rd.
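For reference while following the derivation, the collapsed Gibbs update it arrives at is usually written as follows (in LaTeX; n_{d,k} = tokens in document d assigned to topic k, n_{k,w} = tokens of word w assigned to topic k, n_k = Σ_w n_{k,w}, V = vocabulary size, and the superscript ¬i means counts computed without token i):

    P(z_i = k \mid \mathbf{z}_{\neg i}, \mathbf{w})
      \propto \left( n_{d_i,k}^{\neg i} + \alpha \right)
              \frac{n_{k,w_i}^{\neg i} + \beta}{n_k^{\neg i} + V\beta}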
13 July 17 Linguistic resources,
Conference presentations
slides
14 July 24 NO LECTURE

Grading

Basically, grading will be based on the following two things:
1. code for LDA: you are to write and submit code for LDA (due on July 16).
2. review of a research paper: you are to read a research paper and write a review of it (due on July 23).

TAKAMURA, Hiroya (高村大也)
4259 Nagatsuta-cho, Midori-ku, Yokohama, Kanagawa 226-8503, Mail-box R2-7
Precision and Intelligence Laboratory, Tokyo Institute of Technology
phone & fax: 045-924-5295