Corpora for model evaluation
* PPDB
http://www.cis.upenn.edu/~ccb/ppdb/
* MSRP: Microsoft Research Paraphrase Corpus
* WRPA: paraphrases acquired from Wikipedia
* P4P: paraphrase for plagiarism detection
* Quora Question Pairs Dataset
* SemEval-2015
* SICK data: Sentences Involving Compositional Knowledge
1. PPDB
* Mostly lexical (word-level) entries, in the following format:
LHS ||| SOURCE ||| TARGET ||| (FEATURE=VALUE )* ||| ALIGNMENT
2. MSRP
3. P4P
http://www.uni-weimar.de/medien/webis/events/pan-14/pan14-web/plagiarism-detection.html
- Plagiarism-detection corpus
4. SemEval-2015
5. Quora data (Kaggle)
https://www.kaggle.com/c/quora-question-pairs/data
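
The PPDB entry format listed under item 1 (`LHS ||| SOURCE ||| TARGET ||| (FEATURE=VALUE )* ||| ALIGNMENT`) can be split with a few lines of Python. This is only a minimal sketch: the names `PPDBEntry` and `parse_ppdb_line` are hypothetical, the sample line is made up for illustration rather than taken from the corpus, and the alignment field is assumed to be space-separated `i-j` index pairs. Released PPDB files may carry extra fields, which this sketch ignores.

```python
from typing import Dict, List, NamedTuple, Tuple


class PPDBEntry(NamedTuple):
    """One parsed line (hypothetical container, not part of any PPDB tool)."""
    lhs: str
    source: str
    target: str
    features: Dict[str, str]
    alignment: List[Tuple[int, int]]


def parse_ppdb_line(line: str) -> PPDBEntry:
    """Split a line on '|||' and decode the feature and alignment fields."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    lhs, source, target, raw_features, raw_alignment = fields[:5]

    # Features are whitespace-separated FEATURE=VALUE tokens.
    # Values are kept as strings because not every value need be numeric.
    features: Dict[str, str] = {}
    for token in raw_features.split():
        key, sep, value = token.partition("=")
        if sep:
            features[key] = value

    # Assumption: alignment is a list of "i-j" source/target index pairs.
    alignment: List[Tuple[int, int]] = []
    for pair in raw_alignment.split():
        src, _, tgt = pair.partition("-")
        alignment.append((int(src), int(tgt)))

    return PPDBEntry(lhs, source, target, features, alignment)


# Made-up sample line, for illustration only.
SAMPLE = "[NN] ||| car ||| automobile ||| Score=3.5 Abstract=0 ||| 0-0"
entry = parse_ppdb_line(SAMPLE)
```

After this runs, `entry.source` and `entry.target` hold the paraphrase pair, and `entry.features` maps each feature name to its raw string value, so numeric features can be converted with `float(...)` where needed.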