Additional explanation for Formal run evaluation
(Last updated on Dec. 28, 2000)

Evaluation methods for each subtask are described in the Task Description; however, we add the following notes with regard to the Formal run evaluation. We used almost the same evaluation methods as in the Dryrun evaluation.

Subtask A-1

No additional explanation.

Subtask A-2 (content-based evaluation)

Both the human-produced and the system-produced summaries are morphologically analyzed with Juman, and only the content words are extracted. The distance between the word-frequency vector of a human summary and the word-frequency vector of a system summary is then computed, and we use this distance to measure how close the two summaries are in terms of their content words.
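The announcement does not specify which vector distance is used, so the following is only an illustrative sketch: it assumes cosine distance between word-frequency vectors of content words, with English words standing in for the content words that Juman would extract from Japanese text.

```python
from collections import Counter
import math

def content_word_vector(words):
    """Build a word-frequency vector (a Counter) from a list of content words."""
    return Counter(words)

def cosine_distance(v1, v2):
    """Cosine distance between two word-frequency vectors.

    Returns a value in [0, 1]; 0 means identical direction (closest),
    1 means no content words in common (or an empty vector).
    """
    dot = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 1.0
    return 1.0 - dot / (norm1 * norm2)

# Hypothetical content words from a human summary and a system summary.
human = content_word_vector(["economy", "growth", "policy", "growth"])
system = content_word_vector(["economy", "policy", "trade"])
print(round(cosine_distance(human, system), 3))
```

A smaller distance indicates that the system summary is closer to the human summary in terms of its content words.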

<Conditions>

We have two kinds of human-produced summaries for Subtask A-2.

  1. Freely summarized texts
  2. Summaries produced by selecting important parts of the sentences in the text

Both kinds were used for the Formal run.

Subtask A-2 (subjective evaluation)

The following four kinds of summaries as well as the original texts are prepared.

  1. Summaries produced by selecting important parts of the sentences in the text
  2. Freely summarized texts
  3. Summaries produced by a system
  4. Summaries produced by a tf-based (term-frequency-based) method

First, the evaluator (one person) reads the original text and its four kinds of summaries. He or she then evaluates and scores each summary in terms of how readable it is and how well it conveys the content of the original text. Each score is 1, 2, 3, or 4, where 1 is the best and 4 is the worst; that is, the lower the score, the better the evaluation.

Subtask B


Co-chairs of the Text Summarization Task
Manabu Okumura : oku@pi.titech.ac.jp
Takahiro Fukusima : fukusima@res.otemon.ac.jp
Complaints and advice: tsc-admin@recall.jaist.ac.jp