Additional explanation for Dryrun evaluation
(Last updated on Oct. 30, 2000)

Evaluation methods for each subtask are described in the Task Description; here we add the following explanations regarding the Dryrun evaluation conducted in September. Almost the same evaluation methods will be used for the formal run evaluation.

Subtask A-1

No additional explanation.

Subtask A-2 (content-based evaluation)

Both the human-produced and the system-produced summaries are morphologically analyzed with Juman, and only the content words are extracted. The distance between the word-frequency vector of the human summary and that of the system summary is then computed; this distance indicates how close the two summaries are in terms of their content words.
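
Since the task description does not fix the exact distance measure, the following is a minimal sketch of the comparison, assuming cosine similarity over content-word frequency vectors; the actual Dryrun metric may differ, and the example words are hypothetical.

    import math
    from collections import Counter

    def cosine_similarity(words_a, words_b):
        # Build word-frequency vectors from the extracted content words.
        va, vb = Counter(words_a), Counter(words_b)
        # Dot product over the shared vocabulary.
        dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
        norm_a = math.sqrt(sum(c * c for c in va.values()))
        norm_b = math.sqrt(sum(c * c for c in vb.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    # Hypothetical content words extracted (e.g. by Juman) from each summary.
    human_words = ["economy", "recovery", "government", "policy", "economy"]
    system_words = ["government", "policy", "announcement"]
    print(cosine_similarity(human_words, system_words))  # closer to 1.0 = more similar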

<Conditions>

We have two kinds of human-produced summaries for Subtask A-2.

  1. Freely summarized texts
  2. Summaries produced by selecting important parts of the sentences in the text

The content-based evaluation at the Dryrun compares system summaries with the latter kind.

Both kinds will be used for the formal run.

Subtask A-2 (subjective evaluation)

The following four kinds of summaries as well as the original texts are prepared.

  1. Summaries produced by selecting important parts of the sentences in the text
  2. Freely summarized texts
  3. Summaries produced by a system
  4. Summaries produced by the lead method (see the sketch after this list)
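
The lead method's cutoff is not specified here, so the following is a minimal sketch assuming the baseline keeps leading sentences until a target character-length ratio is reached; the actual Dryrun baseline may use a different criterion.

    def lead_summary(sentences, ratio):
        # Keep sentences from the top until the summary reaches the
        # target fraction of the original text's character length.
        target = ratio * sum(len(s) for s in sentences)
        out, length = [], 0
        for s in sentences:
            if length >= target:
                break
            out.append(s)
            length += len(s)
        return out

    # Example: a 30% lead summary of a five-sentence text.
    print(lead_summary(["S1.", "S2.", "S3.", "S4.", "S5."], 0.3))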

First, the evaluator (one person) reads the original text and its four summaries. The evaluator then scores each summary in terms of how readable it is and how well it conveys the content of the original text. Each score is an integer from 1 (best) to 4 (worst); the lower the score, the better the evaluation.

Subtask B


Co-chairs of the Text Summarization Task
Manabu Okumura: oku@pi.titech.ac.jp
Takahiro Fukusima: fukusima@res.otemon.ac.jp
Please send complaints and advice to tsc-admin@recall.jaist.ac.jp