
ToTTo: A Controlled Table-To-Text Generation Dataset

Uploaded 2021-01-22 01:20:04, PDF file, 655.49 KB



We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.
