Difference between revisions of "Scoobi"

From Knowitall
(Added info on how to run a Scoobi job.)
 
Revision as of 20:27, 17 January 2013

== Background ==

Scoobi ([https://github.com/NICTA/scoobi Github]) is a Scala library for Hadoop. The browser-hadoop project contains a number of Scoobi jobs under the package edu.washington.cs.knowitall.browser.hadoop.scoobi.

== Running ==

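To run a Scoobi job, set the main class in the browser-hadoop pom.xml and build the jar with mvn clean compile assembly:single. You can then test the job locally by running java -jar myjob.jar [args], or run it on Hadoop with a command like this: hadoop jar myjob.jar -Dmapred.task.timeout=1200000 -Dmapred.child.java.opts=-Xmx4G [args] -- scoobi nolibjars

In that command, mapred.task.timeout is given in milliseconds (1200000 is 20 minutes), and -Xmx4G gives each child task JVM a 4 GB maximum heap. The -D options go before the job's own arguments, and -- scoobi nolibjars goes at the very end.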

If you're getting an error like java.lang.ClassNotFoundException: com.nicta.scoobi.impl.exec.MscrMapper, it's probably because you forgot to add -- scoobi nolibjars to the end of the command.
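
One way to avoid forgetting the trailing -- scoobi nolibjars is to wrap the Hadoop invocation in a small shell function. The following is only a sketch: the function name run_scoobi_job and the DRY_RUN variable are made up for illustration and are not part of Scoobi or Hadoop. The option values are the ones from the command above.

```shell
#!/bin/sh
# Sketch only: run_scoobi_job and DRY_RUN are hypothetical names,
# not part of Scoobi or Hadoop.

# Usage: run_scoobi_job <jar> [job args...]
run_scoobi_job() {
  jar="$1"
  shift
  # Rebuild the argument list as the full hadoop command:
  # Hadoop -D options first, then the job's own arguments,
  # and "-- scoobi nolibjars" always last.
  set -- hadoop jar "$jar" \
    -Dmapred.task.timeout=1200000 \
    -Dmapred.child.java.opts=-Xmx4G \
    "$@" -- scoobi nolibjars
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

After building the jar with mvn clean compile assembly:single, the function could be called as run_scoobi_job myjob.jar [args]. Setting DRY_RUN=1 first prints the command without submitting anything, which is a cheap way to check the argument order.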