Datasets
From Knowitall
Revision as of 02:58, 18 July 2013 by Jstn
Below is a list of potentially useful NLP datasets.
Facts
- Freebase Data Dumps: all Freebase triples, updated weekly, as well as a one-time dump of triples that were removed (2013)
Entities
- Freebase Annotations of the ClueWeb Corpora: entity-linked ClueWeb09 and ClueWeb12 (2013)
- Learning from Big Data: 40 Million Entities in Context: named entities with document context (2013)
Relations
- 50,000 Lessons on How to Read: a Relation Extraction Corpus: human-annotated judgments for the "place of birth" and "attended an institution" relations (2013)
Paraphrasing
- PPDB: The Paraphrase Database: paraphrases obtained from multi-lingual corpora (2013)