TSC TASK DESCRIPTION
NTCIR-2 Automatic Text Summarization Task/ TSC: Text Summarization Challenge
(Last updated on Oct. 23, 2000; ver. 20001023)

Changes from ver. 2000.09.04: in 3-3-4 and 3-4-4, supplementary notes on the output format of summaries have been added.
Changes from ver. 2000.08.30: in 3-4-2, negative expressions are now specified by enclosing them in <NEG></NEG> tags inside the <NARRATIVE></NARRATIVE> tags.
Changes from ver. 2000.07.27: in the BNF descriptions of the file formats, the number in <DOCNO>number</DOCNO> was described as an 8-digit number, but it is in fact a 9-digit number, so we revised the descriptions.

The details of the Automatic Text Summarization task (Text Summarization Challenge) are explained below. Please keep in mind that this page may not be final in its content, since some of the details still need to be settled. Additions and updates will be announced on this page, so please check it from time to time.
1) Three (3) subtasks in TSC task for a participating system
2) Evaluation methods for each subtask
3) Input and Output formats for each subtask
4) Details of the newspaper data used for TSC
5) Schedule, including dry run


1) Three subtasks of TSC task

Participants may take part in one or more of the three subtasks. You will be asked later which subtask(s) you will participate in.

Subtask A.

In this subtask, you are given:

You should submit the results of your summarization system according to the above specification. More than one summarization rate will be given. There are two types of summaries in this subtask:

"A-1" for extracting important sentences,
"A-2" for producing summaries to be compared with human-prepared summaries
(human-prepared summaries here means "free" summaries where human annotators summarized the articles freely without worrying about keeping original sentences)

Subtask A-1 (Extraction of Important Sentences)

You should submit summaries in which the important sentences, as many as specified by the summarization rate, are marked.
The summarization rate is given as the ratio of the number of chosen sentences to the total number of sentences in the article. The maximum number of sentences you may choose is given for each article. If a submitted summary marks more sentences than the maximum number, only the marked sentences from the beginning of the article up to the maximum number are recognized for the evaluation.
E.g. if the maximum number is five but your system marks seven sentences, only the first five marked sentences, counting from the beginning of the article, are recognized, and the rest (the last two sentences in this example) are discarded for the evaluation.
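For illustration, the truncation rule might be applied as in the following sketch (a hypothetical helper, not the official scorer); marked sentences are identified here by their positions in the article:

# Sketch of the Subtask A-1 truncation rule (not the official scorer).
# marked_positions: positions (article order) of the sentences marked by a system.
def truncate_marked_sentences(marked_positions, max_sentences):
    # Keep only the first max_sentences marked sentences,
    # counting from the beginning of the article; discard the rest.
    return sorted(marked_positions)[:max_sentences]

# E.g. seven sentences were marked but the maximum is five:
print(truncate_marked_sentences([2, 3, 5, 8, 9, 11, 14], 5))   # -> [2, 3, 5, 8, 9]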

A sentence is a string of characters enclosed in <SENTENCE></SENTENCE> tags inserted by the tagging tool that will be distributed by TSC (see section 3)).

Subtask A-2 (Summaries to be compared with human-prepared summaries)

You should submit summaries in plain text. The summarization rate is the ratio of the number of characters in the summary to the total number of characters in the original article. The maximum number of characters for the summary is given for each article. If a submitted summary contains more characters than the maximum number, only the characters from the beginning of the summary up to the maximum number are recognized for the evaluation.
E.g. if the maximum number is fifty but your system produces seventy characters, only the first fifty characters, counting from the beginning of the summary, are recognized, and the rest (the last twenty characters in this example) are discarded for the evaluation.

Please note that carriage returns are not counted as characters. We will first check whether the submitted results are indeed in plain text, and then evaluate them.

Please also note that any result is acceptable as long as it is in plain text and within the maximum number of characters; in particular, the result format can be the same as for Subtask A-1 with the tags removed from your A-1 results.
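The character-counting and truncation rules above might look like this in code (a minimal sketch under the stated rules, not the official evaluation tool):

# Sketch of the Subtask A-2 length rules (not the official evaluation tool).
def count_summary_characters(summary_text):
    # Carriage returns / line breaks are not counted as characters.
    return len(summary_text.replace("\r", "").replace("\n", ""))

def truncate_summary(summary_text, max_chars):
    # Keep only the first max_chars counted characters of the summary;
    # line breaks pass through without being counted.
    kept, counted = [], 0
    for ch in summary_text:
        if ch in "\r\n":
            kept.append(ch)
            continue
        if counted == max_chars:
            break
        kept.append(ch)
        counted += 1
    return "".join(kept)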

Subtask B (summaries for IR task)

Given queries and the documents retrieved for them, you submit summaries of the retrieved documents. The length of the summaries is not limited; however, the summaries should be in plain text. We will first check whether the submitted results are indeed in plain text, and then evaluate them.

You should make one summary for each document (not a summary of multiple documents). The retrieved documents may include documents that are irrelevant to the queries.

2) Evaluation methods for each subtask

Please see also,
Additional explanation for Dryrun evaluation page.

- Intrinsic evaluation

We use summaries prepared by humans for the evaluation. For Subtask A-1, we use the following measures.

Recall = the number of correct sentences marked by the system / the total number of correct sentences marked by humans
Precision = the number of correct sentences marked by the system / the total number of sentences marked by the system
F-measure = 2 x Recall x Precision / (Recall + Precision)

After calculating these scores for each article, we compute their average over all articles and use it as the final score.
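As an illustration of the computation above, a minimal sketch (not the official scorer) could be:

# Sketch of the Subtask A-1 intrinsic evaluation (not the official scorer).
def article_scores(system_marked, human_marked):
    # system_marked / human_marked: sets of sentence positions marked as important.
    hits = len(set(system_marked) & set(human_marked))
    recall = hits / len(human_marked) if human_marked else 0.0
    precision = hits / len(system_marked) if system_marked else 0.0
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f

def final_scores(per_article):
    # per_article: list of (recall, precision, f) tuples, one per article.
    n = len(per_article)
    return tuple(sum(values) / n for values in zip(*per_article))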

For Subtask A-2, the evaluation is not as automatic as for A-1 above. We compare the system results with human-prepared free summaries as follows; the participants will be informed of the evaluation results, which will also be presented at the NTCIR-2 workshop.

A-2-1.

Morphological analysis will be applied to the system results and the human summaries, and only content words (keitaiso) will be selected. Then, the distance between the content-word frequency vectors of the human summary and the system result will be computed, showing how close the two summaries are in terms of their content words. Please refer to the following paper for the details.

@inproceedings{donaway:00:a,
  author = "Donaway, R.L. and Drummey, K.W. and Mather, L.A.",
  title = "A Comparison of Rankings Produced by Summarization
  Evaluation Measures",
  pages = "69--78",
  booktitle = "Proc. of the ANLP/NAACL2000 Workshop on Automatic Summarization",
  year = 2000
}
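As a rough illustration of the idea (the evaluation's actual distance measure follows the paper above), the following sketch compares two lists of content words, assumed to come from a morphological analyzer, using cosine similarity between their frequency vectors:

# Illustration only: cosine similarity between content-word frequency vectors.
# The content-word lists are assumed to come from a morphological analyzer;
# the actual measure used in the evaluation follows Donaway et al. (2000).
from collections import Counter
import math

def cosine_similarity(content_words_a, content_words_b):
    va, vb = Counter(content_words_a), Counter(content_words_b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0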

A-2-2.

For A-2-2, we ask human judges who are experienced in producing summaries to evaluate and rank the system summaries in terms of

- how much the system summary covers the important content of the original article.
- how readable the system summary is.

- Extrinsic evaluation

Subtask B.
Evaluation is based on an information retrieval task. Human subjects are given queries and summaries of the retrieved documents. They read the summaries and judge whether the documents are relevant. The evaluation method is basically the same as in SUMMAC. The measures for evaluation are recall, precision, and F-measure, together with the time taken to carry out the subtask; they indicate how well and how quickly the subtask is done.

Recall = the number of documents judged relevant correctly by human subjects / the total number of relevant documents
Precision = the number of documents judged relevant correctly by human subjects / the total number of documents judged relevant by subjects
F-measure = 2 x Recall x Precision / (Recall + Precision)

This evaluation method assumes a situation in which you can judge whether a document is relevant or not by reading its summary alone.

For the details of the evaluation method, please refer to http://www.itl.nist.gov/div894/894.02/related_projects/tipster_summac/index.html

3) Input and Output formats for each subtask

3-1) Common parts for all the subtasks

3-1-1) Text format

Participants should use mai2sgml.pl provided by IREX and use the data transformed for the IREX IR task. Any information added to the data after the transformation may be used freely by the participants. However, information such as keywords that exists only in the original data must not be used.

TSC also provides tscsgml.pl, which modifies the output of mai2sgml.pl. Participants in A-1 (extraction of important sentences) therefore use the data output by tscsgml.pl (which in turn takes the output of mai2sgml.pl as its input). For participants in the other subtasks, it is up to you whether to use tscsgml.pl or not.

(We will ask later, in a questionnaire, which parts of the information you used.)

The data format produced by tscsgml.pl is that of the mai2sgml.pl output (the data format for the IREX IR task) with the following TSC tags added inside the existing tags.

<PARAGRAPH></PARAGRAPH> ... paragraph
<SENTENCE></SENTENCE>   ... sentence
==BNF===

file		:=doc*

doc		:=<DOC>doc-contents</DOC>

doc-contents	:=doc-id section ae words headline text*

doc-id		:=<DOCNO>number</DOCNO>
{9 digit number (unique to each article)}

section		:=<SECTION>space information</SECTION>
{2-byte characters, e.g. 「1面」}
	
ae		:=<AE>有|無</AE>
{pictures/figures: 有 if included,
                   無 if not included}

words		:=<WORDS>number</WORDS>
{number of characters}

headline	:=<HEADLINE>EUC string</HEADLINE>

text		:=<TEXT>paragraph*</TEXT>

paragraph	:=<PARAGRAPH>sentence*</PARAGRAPH>

sentence	:=<SENTENCE>EUC string</SENTENCE>
{tag for defining a sentence. A string of characters
 enclosed in the tags is a legitimate sentence}
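For example, the document IDs and sentences in a file of this format might be extracted with a short sketch like the one below (illustrative only; assumes the data is EUC-JP encoded, as distributed):

# Sketch: read tscsgml.pl-style output and collect the sentences per document
# (illustrative only; not a distributed TSC tool).
import re

def parse_tsc_documents(path):
    text = open(path, "rb").read().decode("euc-jp")
    docs = {}
    for doc in re.findall(r"<DOC>(.*?)</DOC>", text, re.S):
        docno = re.search(r"<DOCNO>(\d+)</DOCNO>", doc).group(1)
        docs[docno] = re.findall(r"<SENTENCE>(.*?)</SENTENCE>", doc, re.S)
    return docs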

3-1-2) Bugs in newspaper database

Please be aware that there are duplicate article ID numbers in the data for August 23 and 24, 1995; we do not include those articles in the evaluation.
As for the 1998 data, we will check it during the evaluation, and if there are any bugs, we will exclude such data.

3-2) Subtask A-1 (Extraction of important sentences)

3-2-1) Text Format

As described in 3-1-1), the text format is the data transformed by mai2sgml.pl. In addition, you can use the output of tscsgml.pl, which takes the output of mai2sgml.pl as its input.

3-2-2) Data and its format provided by TSC to participants

TSC provides each participant in Subtask A-1 with their own ID number and the following data.

==BNF==

file		:=doc* 

doc		:=<DOC>doc-contents</DOC>

doc-contents	:=doc-id num-of-sens*

doc-id		:=<DOCNO>number</DOCNO>
{document ID for summarization task}

num-of-sens	:=<SUMLENGTH-S>number</SUMLENGTH-S>
{the number of sentences to be marked for summary}

E.g.
	<DOC>
	<DOCNO>980101002</DOCNO>
	<SUMLENGTH-S>10</SUMLENGTH-S>
	<SUMLENGTH-S>15</SUMLENGTH-S>
	....
	</DOC>
	<DOC>
	<DOCNO>950102002</DOCNO>
	<SUMLENGTH-S>13</SUMLENGTH-S>
	<SUMLENGTH-S>19</SUMLENGTH-S>
	..
	</DOC>
	...

3-2-3) Formats for submission (from participants to TSC)

The participants should submit their results in the following formats.

==BNF==

file		:=system-id sum-result*

system-id	:=<SYSTEM-ID>number</SYSTEM-ID>
{Participant ID provided by TSC}

sum-result	:=<SUM-RESULT>doc-id num-of-sens sum-sentence*</SUM-RESULT>

doc-id		:=<DOCNO>number</DOCNO>

num-of-sens	:=<SUMLENGTH-S>number</SUMLENGTH-S>

sum-sentence	:=<SENTENCE>EUC string</SENTENCE>
{output produced by the tool distributed by TSC}

E.g.
	<SYSTEM-ID>01010001</SYSTEM-ID>	
	<SUM-RESULT>
	<DOCNO>980101002</DOCNO>
	<SUMLENGTH-S>10</SUMLENGTH-S>
	<SENTENCE>TSCという,テキスト自動要約の新しい試みが始まった.</SENTENCE>
	<SENTENCE>TSCでは,現在参加者を募っている.</SENTENCE>
	...
	</SUM-RESULT>
	<SUM-RESULT>
	<DOCNO>980101002</DOCNO>
	<SUMLENGTH-S>15</SUMLENGTH-S>
	..
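A submission file in this format could be generated, for instance, with the sketch below (hypothetical helper names; it assumes the output should be EUC-JP encoded like the distributed data):

# Sketch: write a Subtask A-1 submission file (illustrative only).
def write_a1_submission(path, system_id, results):
    # results: list of (docno, num_of_sentences, marked_sentence_strings)
    lines = ["<SYSTEM-ID>%s</SYSTEM-ID>" % system_id]
    for docno, length, sentences in results:
        lines.append("<SUM-RESULT>")
        lines.append("<DOCNO>%s</DOCNO>" % docno)
        lines.append("<SUMLENGTH-S>%d</SUMLENGTH-S>" % length)
        lines.extend("<SENTENCE>%s</SENTENCE>" % s for s in sentences)
        lines.append("</SUM-RESULT>")
    open(path, "wb").write(("\n".join(lines) + "\n").encode("euc-jp"))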

3-2-4) Other notes

Please bear in mind that legitimate sentence units are given by tscsgml.pl (the TSC-distributed tagging tool); if participants use their own sentence units, the scorer may not work correctly.

3-3) Subtask A-2 (summaries to be compared with human-prepared summaries)

3-3-1) Text format

As described in 3-1-1), the text format is the data transformed by mai2sgml.pl. In addition, you can use the output of tscsgml.pl, which takes the output of mai2sgml.pl as its input.

3-3-2) Data and its format provided by TSC to participants

TSC provides each participant in Subtask A-2 with their own ID number and the following data.
==BNF==

file		:=doc* 

doc		:=<DOC>doc-contents</DOC>

doc-contents	:=doc-id sum-length*

doc-id		:=<DOCNO>number</DOCNO>
{document ID for summarization task}

sum-length	:=<SUMLENGTH-C>number</SUMLENGTH-C>
{the number of characters for summary, not including carriage return}

E.g.
	<DOC>
	<DOCNO>980101002</DOCNO>
	<SUMLENGTH-C>150</SUMLENGTH-C>
	<SUMLENGTH-C>300</SUMLENGTH-C>
	....
	</DOC>
	<DOC>
	<DOCNO>950102002</DOCNO>
	<SUMLENGTH-C>120</SUMLENGTH-C>
	<SUMLENGTH-C>230</SUMLENGTH-C>
	...
	</DOC>

3-3-3) Format for submission (from participant to TSC)

The participants should submit their results in the following formats.

==BNF==

file		:=system-id sum-result*

system-id	:=<SYSTEM-ID>number</SYSTEM-ID>
{Participant ID provided by TSC}

sum-result	:=<SUM-RESULT>doc-id sum-length sum-text</SUM-RESULT>

doc-id		:=<DOCNO>number</DOCNO>

sum-length	:=<SUMLENGTH-C>number</SUMLENGTH-C>

sum-text	:=<SUMTEXT>EUC string</SUMTEXT>
{summary in plain text, the number of characters should be less than or
equal to the specified number.}

E.g.
	<SYSTEM-ID>01020001</SYSTEM-ID>	
	<SUM-RESULT>
	<DOCNO>980101002</DOCNO>
	<SUMLENGTH-C>150</SUMLENGTH-C>
	<SUMTEXT>TSCという,テキスト自動要約の新しい試みが始まり,現在
	参加者を募っている.TSCが開催されることにより,日本におけるテキ
	スト自動要約技術の一層の発展が期待されている.</SUMTEXT>
	</SUM-RESULT>
	<SUM-RESULT>
	<DOCNO>980101002</DOCNO>
	<SUMLENGTH-C>300</SUMLENGTH-C>
	...

3-3-4) Other notes

If the summary has more than the specified number of characters, only the characters from the beginning of the summary up to the specified number are recognized for the evaluation.

Summary text in <SUMTEXT> tags should be in plain text. Thus, an example like the one below is not accepted.

<SUMTEXT><FONT COLOR=AA0022>TSC</FONT>という ... </SUMTEXT>

We remove such erroneous tags from the submitted results with a tool, and then evaluate the results.

Since the summary only needs to be in plain text and to satisfy the length requirement, participants in Subtask A-1 may reuse their A-1 results for Subtask A-2 simply by removing the <SENTENCE></SENTENCE> tags.

Additional note:
The following symbols are allowed: white space, carriage return, and special symbols such as `…'. Except for carriage return, these are counted as characters when computing the number of characters.
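As noted above, an A-1 result can be reused here by stripping the <SENTENCE></SENTENCE> tags; a minimal sketch (illustrative only, applying the character rules of this subtask) might be:

# Sketch: turn A-1 sentence-marked output into an A-2 plain-text summary
# (illustrative only; carriage returns are not counted toward the limit).
import re

def to_plain_summary(marked_sentences, max_chars):
    text = "".join(re.sub(r"</?SENTENCE>", "", s) for s in marked_sentences)
    counted = "".join(ch for ch in text if ch not in "\r\n")
    return counted[:max_chars]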

3-4) Subtask B (summary for IR task)

3-4-1) Text format

As described in 3-1-1), the text format is the data transformed by mai2sgml.pl. In addition, you can use the output of tscsgml.pl, which takes the output of mai2sgml.pl as its input.

3-4-2) Data and its format provided by TSC to participants

TSC provides each participant in Subtask B with their own ID number and the following data.

In task description ver. 2000.09.04, NEG tags, which mark negative expressions, were added: inside the <NARRATIVE></NARRATIVE> tags, negative expressions are specified by enclosing them in <NEG></NEG> tags.
The data format of NARRATIVE changed as follows:

version 2000.08.30:
narrative      	:= <NARRATIVE>EUC-string</NARRATIVE>
{narrative of query}

version 2000.09.04:
narrative := <NARRATIVE>EUC-string[<NEG>EUC-string</NEG>]*</NARRATIVE>
{narrative of query, NEG tags express negative expressions}
==BNF==

file           	:= topic*

topic          	:= <TOPIC>topic-contents</TOPIC>

topic-contents 	:= topic-id description narrative ir-result

topic-id       	:= <TOPIC-ID>number</TOPIC-ID>
{query ID number}

description    	:= <DESCRIPTION>EUC string</DESCRIPTION>
{simple description of query}

narrative := <NARRATIVE>EUC-string[<NEG>EUC-string</NEG>]*</NARRATIVE>
{narrative of query, NEG tags express negative expressions}

ir-result	:=<IR-RESULT>doc-id*</IR-RESULT>

doc-id		:=<DOCNO>number</DOCNO>
{retrieved document ID number, documents for Subtask B}

E.g.
	<TOPIC>
	<TOPIC-ID>0001</TOPIC-ID>
	<DESCRIPTION>自動要約研究の新しい試み</DESCRIPTION>
	<NARRATIVE>記事には,テキスト自動要約研究の新しい試みについて述べ
	られており,..............(略)</NARRATIVE>
	<IR-RESULT>
	<DOCNO>980101002</DOCNO>
	<DOCNO>950101008</DOCNO>
	...
	</IR-RESULT>
	</TOPIC>
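A topic file in this format might be parsed as follows (illustrative only); the <NEG></NEG> spans can be separated from the rest of the narrative:

# Sketch: parse a Subtask B topic file, separating the NEG (negative) spans
# from the narrative text (illustrative only).
import re

def parse_topics(path):
    text = open(path, "rb").read().decode("euc-jp")
    topics = []
    for t in re.findall(r"<TOPIC>(.*?)</TOPIC>", text, re.S):
        narrative = re.search(r"<NARRATIVE>(.*?)</NARRATIVE>", t, re.S).group(1)
        topics.append({
            "id": re.search(r"<TOPIC-ID>(\d+)</TOPIC-ID>", t).group(1),
            "description": re.search(r"<DESCRIPTION>(.*?)</DESCRIPTION>", t, re.S).group(1),
            "negatives": re.findall(r"<NEG>(.*?)</NEG>", narrative, re.S),
            "narrative": re.sub(r"<NEG>.*?</NEG>", "", narrative, flags=re.S),
            "docnos": re.findall(r"<DOCNO>(\d+)</DOCNO>", t),
        })
    return topics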

3-4-3) Format for submission (from participant to TSC)

The participants should submit their results in the following formats.

==BNF==

file		:=system-id topic*

system-id	:=<SYSTEM-ID>number</SYSTEM-ID>
{Participant ID provided by TSC}

topic		:=<TOPIC>topic-id sum-result*</TOPIC>

topic-id       	:= <TOPIC-ID>number</TOPIC-ID>

sum-result	:=<SUM-RESULT>doc-id sum-text</SUM-RESULT>

doc-id		:=<DOCNO>number</DOCNO>

sum-text	:=<SUMTEXT>EUC string</SUMTEXT>
{summary in plain text}

E.g.
	<SYSTEM-ID>02010001</SYSTEM-ID>	
	<TOPIC>
	<TOPIC-ID>0001</TOPIC-ID>
	<SUM-RESULT>
	<DOCNO>980101002</DOCNO>
	<SUMTEXT>TSCという,テキスト自動要約の新しい試みが始まった.現在
	参加者を募っている.TSCが開催されることにより,日本におけるテキ
	スト自動要約技術の一層の発展が期待されている.</SUMTEXT>
	</SUM-RESULT>
	<SUM-RESULT>
	<DOCNO>950101008</DOCNO>
	...
	</SUM-RESULT>
	</TOPIC>

3-4-4) Other notes

Summary text in <SUMTEXT> tags should be in plain text. Thus, an example like the one below is not accepted.

<SUMTEXT><FONT COLOR=AA0022>TSC</FONT>という... </SUMTEXT>

We remove such erroneous tags from the submitted results with a tool, and then evaluate the results.

We do not specify summarization rate for Subtask B.

Additional note:
The following symbols are allowed: white space, carriage return, and special symbols such as `…'.

4) Newspaper data used for TSC

We use newspaper data (the Mainichi newspaper database) from 1994, 1995, and 1998.
If you have not obtained the data yet, please contact the Mainichi and purchase the license to use the database yourself (if you have any questions about how to obtain the data, please contact the chairs).
We use only the 1994 and 1995 data for the dry run.

5) Schedule

Dryrun:
August 20-25 : We ask which subtask(s) you are interested
               in taking part in for the dry run
September 4  : Dry run details for each subtask are announced
September 8  : Submission deadline for all the subtasks
September 30 : Evaluation results will be announced

Formal Run:
November 9-15 : We ask which subtask(s) you are interested
                in taking part in for the evaluation
November 27   : Evaluation details for each subtask are announced
December 1    : Submission deadline for all the subtasks
                (23:59 Japan time)
December 27   : Evaluation results will be announced


Co-chairs of the Text Summarization Task
Manabu Okumura : oku@pi.titech.ac.jp
Takahiro Fukusima : fukusima@res.otemon.ac.jp
Please send complaints and advice to tsc-admin@recall.jaist.ac.jp