The details of the Automatic Text Summarization task (Text
Summarization Challenge, TSC) are explained below.
Please keep in mind that these pages may not be final in their content, since
some of the details still need to be finalized. Additions and updates
will be announced on this page, so please check it from time to time.
1) Three (3) subtasks in the TSC task for a participating system
2) Evaluation methods for each subtask
3) Input and Output formats for each subtask
4) Details of the newspaper data used for TSC
5) Schedule, including dry run
1) Three (3) subtasks in the TSC task for a participating system

Participants may take part in one or more of the three subtasks. You will be asked later which subtask(s) you will participate in.
You should submit the results of your summarization system according to the specifications described here. More than one summarization rate will be given. There are two types of summaries in Subtask A, as described below:
Subtask A-1 (extraction of important sentences):
You should submit summaries in which the important sentences specified
by the summarization rate are marked.
The summarization rate is given as the ratio between the number of
chosen sentences and the total number of sentences in the
article. The maximum number of sentences you may choose is
given for each article. If a submitted summary marks more sentences
than the maximum number, we use only the marked sentences from
the beginning of the article up to the maximum number
for the evaluation.
E.g. if the maximum number is five but your system marks seven
sentences, we use only the first five marked sentences counting
from the beginning of the article and discard the rest
(the last two sentences in this example) for the evaluation.
A sentence is a string of characters enclosed in <SENTENCE></SENTENCE> tags inserted by the tagging tool distributed by TSC (see section 3)).
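As a rough illustration of this truncation rule, here is a minimal Python sketch (the function name and the sentence-index representation are ours, not part of the TSC specification; the official scorer is distributed by TSC):

    def truncate_marked_sentences(marked_positions, max_sentences):
        """Keep only the first `max_sentences` marked sentences,
        counted from the beginning of the article; the rest are discarded."""
        return sorted(marked_positions)[:max_sentences]

    # E.g. seven sentences were marked but the maximum is five:
    print(truncate_marked_sentences([1, 2, 4, 6, 8, 10, 11], 5))  # -> [1, 2, 4, 6, 8]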
Subtask A-2 (summaries to be compared with human-prepared summaries):
You should submit summaries in plain text.
The summarization rate is the ratio between the number of characters in
the summary and the total number of characters in the original article.
The maximum number of characters for the summary is given for each
article. If a submitted summary contains more characters than the
maximum number, we use only the characters from
the beginning of the summary up to the maximum number
for the evaluation.
E.g. if the maximum number is fifty but your summary contains seventy
characters, we use only the first fifty characters counting
from the beginning of the summary and discard the rest
(the last twenty characters in this example) for the evaluation.
Please note that carriage returns are not counted as characters. We will first check that the submitted results are indeed in plain text, and then evaluate them.
Please also note that as long as the result is in plain
text and within the maximum number of characters,
it is acceptable; the result format can therefore be the same as for
Subtask A-1 with the <SENTENCE></SENTENCE> tags removed.
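The character-counting rule above can be sketched as follows in Python (a sketch only; we assume, as stated above, that carriage returns and newlines are kept but not counted, and the function names are ours):

    def summary_length(text):
        """Count the characters of a summary, ignoring carriage returns and newlines."""
        return len(text.replace("\r", "").replace("\n", ""))

    def truncate_to_limit(text, max_chars):
        """Keep only the first `max_chars` counted characters for the evaluation."""
        kept, count = [], 0
        for ch in text:
            if count >= max_chars:
                break
            kept.append(ch)
            if ch not in "\r\n":      # line breaks are kept but not counted
                count += 1
        return "".join(kept)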
Subtask B (summaries for an IR task):
Given queries and documents retrieved for those queries, you submit summaries. The length of the summaries is not limited; however, the summaries should be in plain text. We will first check that the submitted results are indeed in plain text, and then evaluate them.
You should make one summary for each document (not one summary from multiple documents). The retrieved documents may include documents that are irrelevant to the queries.
2) Evaluation methods for each subtask

We use summaries prepared by humans for the evaluation.
For Subtask A-1, we use the following measures.
For Subtask A-2, the evaluation is not as automatic as for A-1 above. We compare the system results with human-prepared free summaries as follows. We will inform the participants of the evaluation results, and the results will also be presented at the NTCIR-2 workshop.
For A-2-1, morphological analysis will be applied to the system results and the human summaries, and only content words (keitaiso) will be selected. Then the distance between the word-frequency vector of the human summary and that of the system result will be computed, and we see how close the two summaries are based on the content words. Please refer to the following paper for the details.
@inproceedings{donaway:00:a,
  author = "Donaway, R. L. and Drummey, K. W. and Mather, L. A.",
  title = "A Comparison of Rankings Produced by Summarization Evaluation Measures",
  pages = "69--78",
  booktitle = "Proc. of the ANLP/NAACL 2000 Workshop on Automatic Summarization",
  year = 2000
}
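As a rough illustration of the idea (not necessarily the exact measure used; see the paper above for the details), content-word frequency vectors can be compared with cosine similarity. The Python sketch below assumes the morphological analysis has already produced the content-word lists:

    import math
    from collections import Counter

    def cosine_similarity(words_a, words_b):
        """Cosine of the angle between two content-word frequency vectors."""
        va, vb = Counter(words_a), Counter(words_b)
        dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
        norm = math.sqrt(sum(c * c for c in va.values())) * \
               math.sqrt(sum(c * c for c in vb.values()))
        return dot / norm if norm else 0.0

    # toy example: content words from a human summary and a system summary
    human  = ["要約", "研究", "試み", "参加者"]
    system = ["要約", "試み", "参加者", "募集"]
    print(cosine_similarity(human, system))   # -> 0.75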
For A-2-2, we ask human judges who are experienced in producing summaries to evaluate and rank the system summaries in terms of
For Subtask B, the evaluation is based on an information retrieval (IR) task.
Human subjects are given queries and the summaries of the retrieved
documents. They read the summaries and judge how relevant the
documents are. The evaluation method is basically the same as in
SUMMAC. The evaluation measures are recall, precision, and
F-measure, which indicate how well the subtask is done, together with
the time it takes to carry out the subtask.
This evaluation method targets the situation where the relevance (or irrelevance) of a document can be judged by reading its summary alone.
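For reference, recall, precision, and F-measure over the subjects' relevance judgements can be computed as in the following Python sketch (the variable names are ours):

    def precision_recall_f(judged_relevant, truly_relevant):
        """Standard precision / recall / F-measure over sets of document IDs."""
        judged, truth = set(judged_relevant), set(truly_relevant)
        hits = len(judged & truth)
        precision = hits / len(judged) if judged else 0.0
        recall = hits / len(truth) if truth else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f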
For the details of the evaluation method, please refer to
http://www.itl.nist.gov/div894/894.02/related_projects/tipster_summac/index.html

3) Input and Output formats for each subtask

Participants should use mai2sgml.pl provided by IREX and use the data transformed for the IREX IR task. The information added to the data by this transformation can be used in any way by the participants. However, information such as keywords that exists only in the original data should not be used.
TSC also provides tscsgml.pl, which modifies the output of mai2sgml.pl. Participants in A-1 (extraction of important sentences) therefore use the data output by tscsgml.pl (which in turn takes the output of mai2sgml.pl as its input). For participants in the other subtasks, it is up to you whether you use tscsgml.pl or not. (We will ask you later in a questionnaire which parts of the information you used.)
The data format produced by tscsgml.pl is the output of mai2sgml.pl with the
TSC tags for paragraphs and sentences added (see 3-1-1) below).
Please be aware that a legitimate sentence is the unit given by tscsgml.pl; if participants use their own sentence unit, the scorer may not work correctly.
Subtask A-1:
As described in 3-1-1), the text format is the data transformed by
mai2sgml.pl. In addition, you can use the output of
tscsgml.pl, which takes the output of mai2sgml.pl as its input.
TSC provides each participant in Subtask A-1 with their own ID number and
the data described in 3-2-2) below.
The participants should submit their results in the formats described in 3-2-3) below.
Please bear in mind that a legitimate sentence is the unit given by tscsgml.pl
(the TSC-distributed tagging tool); if participants use their own
sentence unit, the scorer may not work correctly.
Subtask A-2:
As described in 3-1-1), the text format is the data transformed by
mai2sgml.pl. In addition, you can use the output of
tscsgml.pl, which takes the output of mai2sgml.pl as its input.
The participants should submit their results in the formats described in 3-3-3) below.
If the summary has more than the specified number of characters, only the
characters from the beginning of the summary up to the specified number
are used for the evaluation.
For additional notes, see 3-3-4) below.
Subtask B:
As described in 3-1-1), the text format is the data transformed by
mai2sgml.pl. In addition, you can use the output of
tscsgml.pl, which takes the output of mai2sgml.pl as its input.
TSC provides each participant in Subtask B with their own ID number and the
data described in 3-4-2) below.
In task description ver. 2000.09.04, NEG tags, which mark negative
expressions, were added: within <NARRATIVE></NARRATIVE> tags,
negative expressions are enclosed in <NEG></NEG> tags.
The participants should submit their results in the formats described in 3-4-3) below.
Summary text in <SUMTEXT> tags should be in plain text; see the additional
notes in 3-4-4) below for an example of what is not accepted.
3-1-1) Text format
We use newspaper data (the Mainichi newspaper database) of 1994, 1995, and 1998.
The following TSC tags are added by tscsgml.pl:
<PARAGRAPH></PARAGRAPH> ... paragraph
<SENTENCE></SENTENCE> ... sentence
==BNF==
file := doc*
doc := <DOC>doc-contents</DOC>
doc-contents := doc-id section ae words headline text*
doc-id := <DOCNO>number</DOCNO>
{9-digit number (unique to each article)}
section := <SECTION>section information</SECTION>
{2-byte characters, e.g. 「1面」 (front page)}
ae := <AE>有|無</AE>
{有 if the article includes pictures or figures, 無 if it does not}
words := <WORDS>number</WORDS>
{number of characters}
headline := <HEADLINE>EUC string</HEADLINE>
text := <TEXT>paragraph*</TEXT>
paragraph := <PARAGRAPH>sentence*</PARAGRAPH>
sentence := <SENTENCE>EUC string</SENTENCE>
{tag defining a sentence; a string of characters enclosed in these tags is a legitimate sentence}
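For illustration, sentences and paragraphs can be pulled out of this format with a simple regular expression (a sketch only; the input is assumed to be already decoded from EUC-JP, and the TSC-distributed tools should be used for the official processing):

    import re

    def sentences_of(doc_text):
        """Return the list of <SENTENCE>...</SENTENCE> strings of one <DOC>."""
        return re.findall(r"<SENTENCE>(.*?)</SENTENCE>", doc_text, re.S)

    def paragraphs_of(doc_text):
        """Return, for each <PARAGRAPH>, its list of sentences."""
        return [sentences_of(p)
                for p in re.findall(r"<PARAGRAPH>(.*?)</PARAGRAPH>", doc_text, re.S)]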
3-1-2) Bugs in the newspaper database
There are identical ID numbers in the data for August 23 and 24, 1995,
and we do not include those articles in the evaluation.
As for the data for 1998, we will check them by the time of the evaluation,
and if there are any bugs, we will exclude such data.
3-2) Subtask A-1 (Extraction of important sentences)
3-2-1) Text Format
3-2-2) Data and its format provided by TSC to participants
==BNF==
file :=doc*
doc :=<DOC>doc-contents</DOC>
doc-contents :=doc-id num-of-sens*
doc-id :=<DOCNO>number</DOCNO>
{document ID for summarization task}
num-of-sens :=<SUMLENGTH-S>number</SUMLENGTH-S>
{the number of sentences to be marked for summary}
E.g.
<DOC>
<DOCNO>980101002</DOCNO>
<SUMLENGTH-S>10</SUMLENGTH-S>
<SUMLENGTH-S>15</SUMLENGTH-S>
....
</DOC>
<DOC>
<DOCNO>950102002</DOCNO>
<SUMLENGTH-S>13</SUMLENGTH-S>
<SUMLENGTH-S>19</SUMLENGTH-S>
..
</DOC>
...
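A minimal Python sketch for reading this file into a mapping from document ID to the requested sentence counts (the function name and the EUC-JP decoding assumption are ours):

    import re

    def read_sumlengths(path):
        """Map each DOCNO to its list of SUMLENGTH-S values."""
        lengths = {}
        with open(path, encoding="euc_jp") as f:
            for doc in re.findall(r"<DOC>(.*?)</DOC>", f.read(), re.S):
                docno = re.search(r"<DOCNO>(\d+)</DOCNO>", doc).group(1)
                lengths[docno] = [int(n) for n in
                                  re.findall(r"<SUMLENGTH-S>(\d+)</SUMLENGTH-S>", doc)]
        return lengths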
3-2-3) Formats for submission (from participants to TSC)
==BNF==
file :=system-id sum-result*
system-id :=<SYSTEM-ID>number</SYSTEM-ID>
{Participant ID provided by TSC}
sum-result :=<SUM-RESULT>doc-id num-of-sens sum-sentence*</SUM-RESULT>
doc-id :=<DOCNO>number</DOCNO>
num-of-sens :=<SUMLENGTH-S>number</SUMLENGTH-S>
sum-sentence :=<SENTENCE>EUC string</SENTENCE>
{output produced by the tool distributed by TSC}
E.g.
<SYSTEM-ID>01010001</SYSTEM-ID>
<SUM-RESULT>
<DOCNO>980101002</DOCNO>
<SUMLENGTH-S>10</SUMLENGTH-S>
<SENTENCE>TSCという,テキスト自動要約の新しい試みが始まった.</SENTENCE>
<SENTENCE>TSCでは,現在参加者を募っている.</SENTENCE>
...
</SUM-RESULT>
<SUM-RESULT>
<DOCNO>980101002</DOCNO>
<SUMLENGTH-S>15</SUMLENGTH-S>
..
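For illustration, one <SUM-RESULT> block in the format above could be produced as follows (a Python sketch; it assumes the selected sentences are already available as plain strings taken from the tscsgml.pl output):

    def format_sum_result(docno, sumlength, sentences):
        """Build one <SUM-RESULT> block for a Subtask A-1 submission."""
        lines = ["<SUM-RESULT>",
                 "<DOCNO>%s</DOCNO>" % docno,
                 "<SUMLENGTH-S>%d</SUMLENGTH-S>" % sumlength]
        lines += ["<SENTENCE>%s</SENTENCE>" % s for s in sentences[:sumlength]]
        lines.append("</SUM-RESULT>")
        return "\n".join(lines)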
3-2-4) Other notes
3-3) Subtask A-2 (summaries to be compared with human-prepared
summaries)
3-3-1) Text format
3-3-2) Data and its format provided by TSC to participants
TSC provides each participant in Subtask A-2 with their own ID number and
the following data.
==BNF==
file :=doc*
doc :=<DOC>doc-contents</DOC>
doc-contents :=doc-id sum-length*
doc-id :=<DOCNO>number</DOCNO>
{document ID for summarization task}
sum-length :=<SUMLENGTH-C>number</SUMLENGTH-C>
{the number of characters for summary, not including carriage return}
E.g.
<DOC>
<DOCNO>980101002</DOCNO>
<SUMLENGTH-C>150</SUMLENGTH-C>
<SUMLENGTH-C>300</SUMLENGTH-C>
....
</DOC>
<DOC>
<DOCNO>950102002</DOCNO>
<SUMLENGTH-C>120</SUMLENGTH-C>
<SUMLENGTH-C>230</SUMLENGTH-C>
...
</DOC>
3-3-3) Format for submission (from participant to TSC)
==BNF==
file :=system-id sum-result*
system-id :=<SYSTEM-ID>number</SYSTEM-ID>
{Participant ID provided by TSC}
sum-result :=<SUM-RESULT>doc-id sum-length sum-text</SUM-RESULT>
doc-id :=<DOCNO>number</DOCNO>
sum-length :=<SUMLENGTH-C>number</SUMLENGTH-C>
sum-text :=<SUMTEXT>EUC string</SUMTEXT>
{summary in plain text, the number of characters should be less than or
equal to the specified number.}
E.g.
<SYSTEM-ID>01020001</SYSTEM-ID>
<SUM-RESULT>
<DOCNO>980101002</DOCNO>
<SUMLENGTH-C>150</SUMLENGTH-C>
<SUMTEXT>TSCという,テキスト自動要約の新しい試みが始まり,現在
参加者を募っている.TSCが開催されることにより,日本におけるテキ
スト自動要約技術の一層の発展が期待されている.</SUMTEXT>
</SUM-RESULT>
<SUM-RESULT>
<DOCNO>980101002</DOCNO>
<SUMLENGTH-C>300</SUMLENGTH-C>
...
3-3-4) Other notes
Summary text in <SUMTEXT> tags should be in plain text. Thus, an
example like the one below is not accepted.
<SUMTEXT><FONT COLOR=AA0022>TSC</FONT>という ... </SUMTEXT>
We remove illegal tags from the submitted results with a tool, and then evaluate
the results.
Since the summary is in plain text, participants in A-1 may re-use their
results for Subtask A-2, as long as the requirement on the length of the
summary is satisfied, by removing the <SENTENCE></SENTENCE> tags.
The following symbols are allowed to be used:
white space, carriage returns, and special symbols such as `…'.
Except for carriage returns, they are counted as characters when computing
the number of characters.
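As noted above, a Subtask A-1 style result can be turned into an A-2 summary by removing the <SENTENCE></SENTENCE> tags; a minimal Python sketch (character counting ignores carriage returns, as specified, and the function name is ours):

    import re

    def a1_to_sumtext(marked_sentences, max_chars):
        """Strip the <SENTENCE> tags, join the sentences, and check the length limit."""
        text = "".join(re.sub(r"</?SENTENCE>", "", s) for s in marked_sentences)
        assert len(text.replace("\r", "").replace("\n", "")) <= max_chars, "summary too long"
        return "<SUMTEXT>%s</SUMTEXT>" % text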
3-4) Subtask B (summary for IR task)
3-4-1) Text format
3-4-2) Data and its format provided by TSC to participants
The data format of NARRATIVE has been changed as follows.
Version 2000.08.30:
narrative := <NARRATIVE>EUC-string</NARRATIVE>
{narrative of query}
version 2000.09.04:
narrative := <NARRATIVE>EUC-string[<NEG>EUC-string</NEG>]*</NARRATIVE>
{narrative of query, NEG tags express negative expressions}
==BNF==
file := topic*
topic := <TOPIC>topic-contents</TOPIC>
topic-contents := topic-id description narrative ir-result
topic-id := <TOPIC-ID>number</TOPIC-ID>
{query ID number}
description := <DESCRIPTION>EUC string</DESCRIPTION>
{simple description of query}
narrative := <NARRATIVE>EUC-string[<NEG>EUC-string</NEG>]*</NARRATIVE>
{narrative of query, NEG tags express negative expressions}
ir-result :=<IR-RESULT>doc-id*</IR-RESULT>
doc-id :=<DOCNO>number</DOCNO>
{retrieved document ID number, documents for Subtask B}
E.g.
<TOPIC>
<TOPIC-ID>0001</TOPIC-ID>
<DESCRIPTION>自動要約研究の新しい試み</DESCRIPTION>
<NARRATIVE>記事には,テキスト自動要約研究の新しい試みについて述べ
られており,..............(略)</NARRATIVE>
<IR-RESULT>
<DOCNO>980101002</DOCNO>
<DOCNO>950101008</DOCNO>
...
</IR-RESULT>
</TOPIC>
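If needed, the <NEG> spans of a narrative can be separated from the rest of the text with a small helper like the following (a Python sketch; the names are ours):

    import re

    def split_narrative(narrative):
        """Return (positive_text, [negative_expressions]) for one <NARRATIVE> body."""
        negatives = re.findall(r"<NEG>(.*?)</NEG>", narrative, re.S)
        positive = re.sub(r"<NEG>.*?</NEG>", "", narrative, flags=re.S)
        return positive, negatives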
3-4-3) Format for submission (from participant to TSC)
==BNF==
file :=system-id topic*
system-id :=<SYSTEM-ID>number</SYSTEM-ID>
{Participant ID provided by TSC}
topic :=<TOPIC>topic-id sum-result*</TOPIC>
topic-id := <TOPIC-ID>number</TOPIC-ID>
sum-result :=<SUM-RESULT>doc-id sum-text</SUM-RESULT>
doc-id :=<DOCNO>number</DOCNO>
sum-text :=<SUMTEXT>EUC string</SUMTEXT>
{summary in plain text}
E.g.
<SYSTEM-ID>02010001</SYSTEM-ID>
<TOPIC>
<TOPIC-ID>0001</TOPIC-ID>
<SUM-RESULT>
<DOCNO>980101002</DOCNO>
<SUMTEXT>TSCという,テキスト自動要約の新しい試みが始まった.現在
参加者を募っている.TSCが開催されることにより,日本におけるテキ
スト自動要約技術の一層の発展が期待されている.</SUMTEXT>
</SUM-RESULT>
<SUM-RESULT>
<DOCNO>950101008</DOCNO>
...
</SUM-RESULT>
</TOPIC>
3-4-4) Other notes
As in Subtask A-2, summary text in <SUMTEXT> tags should be in plain text;
thus, an example like the one below is not accepted.
<SUMTEXT><FONT COLOR=AA0022>TSC</FONT>という... </SUMTEXT>
We remove illegal tags from the submitted results with a tool, and then evaluate
the results.
We do not specify summarization rate for Subtask B.
The following symbols are allowed to be used:
white space, carriage returns, and special symbols such as `…'.
4) Newspaper data used for TSC
If you have not obtained the Mainichi newspaper data yet, please contact the Mainichi and purchase
a license to use the database yourself (if you have any questions
regarding how to obtain the data, please contact the chairs).
We use only the data of 1994 and 1995 for the dry run.
5) Schedule
Dry run:
August 20-25 : We ask you which subtask(s) you are interested
in taking part in for the dry run
September 4 : Dry run details for each task are announced
September 8 : Submission deadline for all the subtasks
September 30 : Evaluation results will be announced
Formal run:
November 9-15 : We ask you which subtask(s) you are interested in taking part in for the evaluation
November 27 : Evaluation details for each task are announced
December 1 : Submission deadline for all the subtasks (23:59 Japan time)
December 27 : Evaluation results will be announced