Difference between revisions of "Datasets"

From Knowitall
 
(2 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
== Corpora ==
 
 
* [http://googleresearch.blogspot.com/2013/05/syntactic-ngrams-over-time.html Syntactic Ngrams over Time] (2013)
 
 +
* [http://www.yelp.com/dataset_challenge/ Yelp Dataset Challenge]: sample of Yelp data from Phoenix, AZ (2013)
 
* [http://storage.googleapis.com/books/ngrams/books/datasetsv2.html Google N-grams] N-grams from a large corpus of books (2010)
 
  
 
== Knowledge bases ==
 
 
* [https://developers.google.com/freebase/data Freebase Data Dumps]: all Freebase triples, updated weekly, as well as a one-time dump of triples that were removed (2013)
 
−
* [http://googleresearch.blogspot.com/2013/05/distributing-edit-history-of-wikipedia.html Distributing the Edit History of Wikipedia Infoboxes]: edit history of Wikipedia info boxes.
+
* [http://googleresearch.blogspot.com/2013/05/distributing-edit-history-of-wikipedia.html Distributing the Edit History of Wikipedia Infoboxes]: edit history of Wikipedia info boxes (2013)
 +
* [http://wiki.dbpedia.org/Datasets DBpedia Dataset]
  
 
== Entities ==
 
Line 19: Line 21:
 
== Paraphrasing ==
 
 
* [http://www.cs.jhu.edu/~ccb/ppdb/ PPDB: The Paraphrase Database]: paraphrases obtained from multi-lingual corpora (2013)
 
 +
 +
== Other resources ==
 +
* [http://research.microsoft.com/apps/catalog/default.aspx?t=downloads Microsoft Research Downloads]
 +
* [http://www.cse.unt.edu/~rada/downloads.html Rada Mihalcea's page]

Latest revision as of 03:25, 18 July 2013

Below is a list of potentially useful NLP datasets.

Corpora

Knowledge bases

Entities

Relations

Paraphrasing

Other resources