ToTTo: A Controlled Table-To-Text Generation Dataset
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.
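
To make the controlled generation task concrete, the following is a minimal sketch of how one example (a table, a set of highlighted cells, and a one-sentence target) might be represented and flattened into a model input. The class `TableExample`, its field names, the function `linearize_highlighted`, and the toy table are all hypothetical illustrations; they are not the dataset's actual schema, nor a preprocessing method described in the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TableExample:
    """Hypothetical container for one example; field names are assumptions."""
    page_title: str
    header: List[str]
    rows: List[List[str]]
    highlighted: List[Tuple[int, int]]  # (row index, column index) pairs
    target: str  # the one-sentence description to be generated


def linearize_highlighted(example: TableExample) -> str:
    """Flatten only the highlighted cells into one string.

    Each highlighted value is paired with its column header so that a
    text-to-text model can see what the value refers to.
    """
    parts = [f"title: {example.page_title}"]
    for row_idx, col_idx in example.highlighted:
        column = example.header[col_idx]
        value = example.rows[row_idx][col_idx]
        parts.append(f"{column}: {value}")
    return " | ".join(parts)


# Made-up toy data, not taken from the dataset.
toy = TableExample(
    page_title="Example Athlete",
    header=["Year", "Competition", "Position"],
    rows=[
        ["2001", "City Marathon", "1st"],
        ["2002", "Regional Games", "3rd"],
    ],
    highlighted=[(0, 0), (0, 1), (0, 2)],
    target="In 2001, Example Athlete finished first in the City Marathon.",
)

print(linearize_highlighted(toy))
```

Restricting the input to the highlighted cells is one simple way to reflect the "controlled" aspect of the task: the highlighted cells determine what the sentence should cover, and a faithful output should not state facts that the table does not support.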