Publication
Generating Extended and Multilingual Summaries with Pre-trained Transformers
Rémi Calizzano; Malte Ostendorff; Qian Ruan; Georg Rehm
In: Nicoletta Calzolari; Frédéric Béchet; Philippe Blache; Christopher Cieri; Khalid Choukri; Thierry Declerck; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Jan Odijk; Stelios Piperidis (eds.). Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022). International Conference on Language Resources and Evaluation (LREC-2022), Marseille, France, Pages 1640-1650, European Language Resources Association (ELRA), 6/2022.
Abstract
Almost all summarisation methods and datasets focus on a single language and short summaries. We introduce a new dataset called WikinewsSum for English, German, French, Spanish, Portuguese, Polish, and Italian summarisation, tailored for extended summaries of approx. 11 sentences. The dataset comprises 39,626 summaries (news articles from Wikinews) paired with their source documents. We compare three multilingual transformer models on the extractive summarisation task and three training scenarios in which we fine-tune mT5 to perform abstractive summarisation. This results in strong baselines for both extractive and abstractive summarisation on WikinewsSum. We also show how the combination of an extractive model with an abstractive one can be used to create extended abstractive summaries from long input documents. Finally, our results show that fine-tuning mT5 on all the languages combined significantly improves the summarisation performance on low-resource languages.
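The sketch below illustrates the extract-then-abstract combination mentioned in the abstract: an extractive step first shortens the long input, and an abstractive mT5 model then generates the extended summary. It is a minimal illustration only, not the paper's implementation: TF-IDF sentence scoring stands in for the transformer-based extractive models compared in the paper, the public "google/mt5-small" checkpoint stands in for an mT5 model fine-tuned on WikinewsSum, and the input file name, the "summarize:" prefix, and all hyper-parameters are assumptions.

```python
# Sketch of an extract-then-abstract summarisation pipeline (not the paper's code).
# Assumptions: TF-IDF scoring replaces the paper's extractive transformer models,
# and "google/mt5-small" replaces a checkpoint fine-tuned on WikinewsSum.

from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import MT5ForConditionalGeneration, MT5Tokenizer


def extract_sentences(document: str, top_k: int = 20) -> str:
    """Keep the top_k sentences most similar to the whole document (TF-IDF),
    preserving their original order; a simple placeholder for the extractive step."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    if len(sentences) <= top_k:
        return ". ".join(sentences) + "."
    vectorizer = TfidfVectorizer()
    sentence_vecs = vectorizer.fit_transform(sentences)
    doc_vec = vectorizer.transform([document])
    scores = (sentence_vecs @ doc_vec.T).toarray().ravel()
    keep = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:top_k])
    return ". ".join(sentences[i] for i in keep) + "."


def abstractive_summary(text: str, model_name: str = "google/mt5-small") -> str:
    """Generate an extended abstractive summary with an mT5 model;
    in practice a checkpoint fine-tuned on WikinewsSum would be loaded here."""
    tokenizer = MT5Tokenizer.from_pretrained(model_name)
    model = MT5ForConditionalGeneration.from_pretrained(model_name)
    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       truncation=True, max_length=1024)
    output_ids = model.generate(**inputs, max_length=400, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # "source_articles.txt" is a hypothetical file holding the concatenated source articles.
    long_document = open("source_articles.txt", encoding="utf-8").read()
    shortened = extract_sentences(long_document)   # extractive pre-selection
    print(abstractive_summary(shortened))          # extended abstractive summary
```

The two-step design mirrors the idea in the abstract: the extractive step keeps the input within the abstractive model's context window, so extended summaries can still be generated from documents far longer than mT5 could process directly.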