Scoobi

From Knowitall
Revision as of 20:27, 17 January 2013 by Jstn (talk | contribs)

Scoobi (GitHub) is a Scala library for writing Hadoop jobs. We have a number of Scoobi jobs in the browser-hadoop project, under edu.washington.cs.knowitall.browser.hadoop.scoobi.

Running

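To run a Scoobi job, set its main class in the browser-hadoop pom.xml, then build the jar with mvn clean compile assembly:single.

You can test the job locally by running java -jar myjob.jar [args].

To run it on Hadoop, use a command like this: hadoop jar myjob.jar -Dmapred.task.timeout=1200000 -Dmapred.child.java.opts=-Xmx4G [args] -- scoobi nolibjars

In that command, mapred.task.timeout is given in milliseconds (1200000 is 20 minutes), and -Xmx4G gives each child task JVM a 4 GB maximum heap. The -D options go before the job's own arguments, and -- scoobi nolibjars goes at the very end.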
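The argument order of the Hadoop command above is easy to get wrong, so here is a minimal sketch of a shell helper that assembles it. The function name scoobi_hadoop_cmd and the arguments input and output are hypothetical, chosen for illustration only. The jar name and -D values are the ones from the example above. The helper only prints the command and does not run it.

```shell
#!/bin/sh
# Sketch only: scoobi_hadoop_cmd is a hypothetical helper, not part of
# Scoobi or Hadoop. It prints the command line instead of executing it.
scoobi_hadoop_cmd() {
    jar="$1"
    shift
    # -D options come before the job's own arguments;
    # "-- scoobi nolibjars" always comes last.
    echo "hadoop jar $jar -Dmapred.task.timeout=1200000 -Dmapred.child.java.opts=-Xmx4G $* -- scoobi nolibjars"
}

# Example: "input" and "output" are placeholder job arguments.
scoobi_hadoop_cmd myjob.jar input output
```

Once the printed command looks right, it can be pasted into a shell on a machine with Hadoop installed.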

If you get an error like java.lang.ClassNotFoundException: com.nicta.scoobi.impl.exec.MscrMapper, you probably forgot to add -- scoobi nolibjars to the end of the command.
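One way to catch this mistake before submitting a job is a small pre-flight check. The sketch below assumes a hypothetical function has_nolibjars that tests whether a command line ends with -- scoobi nolibjars. It is not part of Scoobi or Hadoop, and it inspects only the text of the command.

```shell
#!/bin/sh
# Sketch only: has_nolibjars is a hypothetical helper.
# Succeeds if the given command line ends with "-- scoobi nolibjars".
has_nolibjars() {
    case "$*" in
        *"-- scoobi nolibjars") return 0 ;;
        *) return 1 ;;
    esac
}

# Example: this command line is missing the trailing arguments.
if has_nolibjars hadoop jar myjob.jar input output; then
    echo "ok"
else
    echo "missing '-- scoobi nolibjars' at the end of the command"
fi
```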