Corpora for model testing

* PPDB

http://www.cis.upenn.edu/~ccb/ppdb/

* MSRP: Microsoft Research Paraphrase Corpus

* WRPA: Wikipedia

* P4P: paraphrase for plagiarism detection

* Quora Question Pairs Dataset

* SemEval-2015

* SICK: Sentences Involving Compositional Knowledge


1. PPDB

* Mostly lexical paraphrases; each entry is one line in the format:

LHS ||| SOURCE ||| TARGET ||| (FEATURE=VALUE )* ||| ALIGNMENT
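To make the field layout concrete, here is a minimal parsing sketch that assumes exactly the layout shown above: fields separated by `|||`, features as space-separated `FEATURE=VALUE` pairs, and the alignment as space-separated `i-j` word-index pairs. The `PPDBEntry` class and `parse_ppdb_line` function are hypothetical names for illustration, not part of any PPDB tooling, and any fields after the fifth are simply ignored (an assumption, in case a release appends extra columns).

```python
from dataclasses import dataclass, field


@dataclass
class PPDBEntry:
    lhs: str                                        # LHS label, e.g. "[NN]"
    source: str                                     # original phrase
    target: str                                     # paraphrase
    features: dict = field(default_factory=dict)    # FEATURE -> value (float if numeric)
    alignment: list = field(default_factory=list)   # (source_idx, target_idx) pairs


def parse_ppdb_line(line: str) -> PPDBEntry:
    """Parse one 'LHS ||| SOURCE ||| TARGET ||| (FEATURE=VALUE )* ||| ALIGNMENT' line.

    Assumption: fields beyond the fifth, if present, are ignored.
    """
    parts = [p.strip() for p in line.rstrip("\n").split("|||")]
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 '|||'-separated fields, got {len(parts)}")
    lhs, source, target, feat_str, align_str = parts[:5]

    # Features: whitespace-separated NAME=VALUE pairs; split at the first '='.
    features = {}
    for item in feat_str.split():
        name, _, value = item.partition("=")
        try:
            features[name] = float(value)
        except ValueError:
            features[name] = value  # keep non-numeric values as raw strings

    # Alignment: whitespace-separated "i-j" pairs of word indices.
    alignment = []
    for pair in align_str.split():
        s, _, t = pair.partition("-")
        alignment.append((int(s), int(t)))

    return PPDBEntry(lhs, source, target, features, alignment)
```

For example, `parse_ppdb_line("[NN] ||| car ||| automobile ||| featA=0.5 featB=3.2 ||| 0-0")` yields source `car`, target `automobile` and alignment `[(0, 0)]`; the feature names in this sample line are made up, not real PPDB feature names.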

2. MSRP


3. P4P

http://www.uni-weimar.de/medien/webis/events/pan-14/pan14-web/plagiarism-detection.html

* Plagiarism-detection corpus

4. SemEval-2015


5. Quora data (Kaggle)

https://www.kaggle.com/c/quora-question-pairs/data

* Task: decide whether two questions are duplicates (i.e. ask the same thing)
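As a rough illustration of the task, the sketch below scores a question pair by word-overlap (Jaccard) similarity and predicts "duplicate" above a threshold. This is a crude baseline chosen for clarity, not a method from these notes; the column names `question1`, `question2` and `is_duplicate` are assumed from the Kaggle `train.csv`, and the 0.5 threshold is arbitrary.

```python
import csv
import io
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric word tokens; deliberately crude for a baseline."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap |A & B| / |A | B| of the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty questions are treated as identical
    return len(ta & tb) / len(ta | tb)


def evaluate(csv_file, threshold: float = 0.5) -> float:
    """Predict is_duplicate = 1 when overlap >= threshold; return accuracy.

    Assumes the columns question1, question2, is_duplicate are present.
    """
    correct = total = 0
    for row in csv.DictReader(csv_file):
        pred = int(jaccard(row["question1"], row["question2"]) >= threshold)
        correct += int(pred == int(row["is_duplicate"]))
        total += 1
    return correct / total if total else 0.0


# Tiny hypothetical sample in the assumed column layout.
sample = io.StringIO(
    "id,qid1,qid2,question1,question2,is_duplicate\n"
    '0,1,2,"How do I learn Python?","How can I learn Python?",1\n'
    '1,3,4,"What is the capital of France?","How do I cook rice?",0\n'
)
print(evaluate(sample))
```

On the real file, open it with `open("train.csv", newline="", encoding="utf-8")` so the `csv` module handles quoted commas inside questions. Word overlap misses paraphrases with different vocabulary, which is exactly what learned models on this dataset aim to capture.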
