Difference between revisions of "Datasets"

From Knowitall
(Created page with "Below is a list of potentially useful NLP datasets. == Facts == * [https://developers.google.com/freebase/data Freebase Data Dumps] All Freebase triples, updated weekly, as well...")
 
Line 1:
 
Below is a list of potentially useful NLP datasets.
 
- == Facts ==
+ == Corpora ==
- * [https://developers.google.com/freebase/data Freebase Data Dumps] All Freebase triples, updated weekly, as well as a one-time dump of triples that were removed (2013)
+ * [http://googleresearch.blogspot.com/2013/05/syntactic-ngrams-over-time.html Syntactic Ngrams over Time] (2013)
+ * [http://storage.googleapis.com/books/ngrams/books/datasetsv2.html Google N-grams] N-grams from a large corpus of books (2010)
+
+ == Knowledge bases ==
+ * [https://developers.google.com/freebase/data Freebase Data Dumps]: all Freebase triples, updated weekly, as well as a one-time dump of triples that were removed (2013)
+ * [http://googleresearch.blogspot.com/2013/05/distributing-edit-history-of-wikipedia.html Distributing the Edit History of Wikipedia Infoboxes]: edit history of Wikipedia info boxes.
  
 
== Entities ==
 
* [http://googleresearch.blogspot.com/2013/07/11-billion-clues-in-800-million.html Freebase Annotations of the ClueWeb Corpora]: entity-linked ClueWeb09 and ClueWeb12 (2013)
 
* [http://googleresearch.blogspot.com/2013/03/learning-from-big-data-40-million.html Learning from Big Data: 40 Million Entities in Context]: named entities with document context (2013)
 
+ * [http://googleresearch.blogspot.com/2012/05/from-words-to-concepts-and-back.html From Words to Concepts and Back: Dictionaries for Linking Text, Entities and Ideas]: probabilistic mappings from strings to entities (2012)
  
 
== Relations ==
 

Revision as of 03:10, 18 July 2013

Below is a list of potentially useful NLP datasets.

Corpora

Knowledge bases

Entities

Relations

Paraphrasing