Solr

From Knowitall

Open IE demo

To create your own instance of the Open IE demo, you will need to do three things:

  1. Copy the Solr installation and start the new instances
  2. Import the data
  3. Run the Open IE demo

Additional information can be found on the SolrCloud wiki page.

Copy the Solr installation

  1. Copy the folder rv-n16:/scratch/usr/schmmd/disk1 to your own directory, e.g., rv-n15:/scratch/usr/jstn/disk1.
  2. Create 3 more folders on the same machine: /scratch/usr/jstn/disk2, /scratch2/usr/jstn/disk3, /scratch2/usr/jstn/disk4
  3. Check if there's already a Solr instance running on that machine by going to http://rv-n15.cs.washington.edu:8983/solr. If there is, then you need to change the port number. To do so:
    1. Find an unused port number by going to a candidate URL such as http://rv-n15.cs.washington.edu:8985/solr and checking that nothing responds. We'll use 8985 as our example throughout.
    2. Open distribute.sh and, in the first screen command, add this argument to the java command: -Djetty.port=8985.
    3. Change the curl commands to point to the new port.
    4. In the for loop, find the screen command and set the ZooKeeper address, e.g., -DzkHost=localhost:9985. The correct port is not confirmed here, but it is likely 99xx, where xx is the last two digits of your new port. That pattern matches Solr's embedded ZooKeeper, which by default listens on the Solr port plus 1000, if that is what the master runs. To find out for sure:
      1. Open up the screen session for the master: screen -r solr
      2. Enter screen's copy mode, which uses vi-style navigation, by pressing Ctrl-A ESC
      3. Type 'gg' to go to the top of the scrollback buffer, and then /99 to search for the string "99". Somewhere it should say what the ZooKeeper port is (e.g., 9985).
  4. In distribute.sh, modify the servers line to point to the folders you created.
  5. Run distribute.sh. It will take a bit of time, as it copies the solr folder to each directory. It also starts screen sessions for each Solr instance.
  6. Check that it's working by going to http://rv-n15.cs.washington.edu:8983/solr, changing the port if necessary. You should see the Solr dashboard. Click "Cloud" on the left and check that shard1 is on rv-n15:8983 (or rv-n15:8985 if you changed the port), and the other shards are on rv-n15:7002, rv-n15:7003, and rv-n15:7004.
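
The port check and the ZooKeeper port guess from step 3 can be sketched as a few shell functions. This is a minimal sketch, not part of distribute.sh: the function names are hypothetical, it assumes curl is installed, and the "plus 1000" rule assumes the master runs Solr's embedded ZooKeeper. A port that does not answer HTTP is only probably free, which is the same limitation as checking in a browser.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not taken from distribute.sh.

# Succeeds (exit 0) if something answers HTTP at http://HOST:PORT/solr.
# Usage: port_answers HOST PORT
port_answers() {
    curl --silent --max-time 5 --output /dev/null "http://$1:$2/solr"
}

# Prints the ZooKeeper port that likely pairs with a Solr port.
# Assumption: embedded ZooKeeper, which defaults to the Solr port plus 1000.
# Usage: zk_port_for SOLR_PORT
zk_port_for() {
    echo $(( $1 + 1000 ))
}

# Prints the first candidate port that does not answer on HOST.
# Returns non-zero if every candidate answered.
# Usage: first_free_port HOST PORT [PORT...]
first_free_port() {
    candidate_host=$1
    shift
    for candidate_port in "$@"; do
        if ! port_answers "$candidate_host" "$candidate_port"; then
            echo "$candidate_port"
            return 0
        fi
    done
    return 1
}
```

After sourcing the file, `first_free_port rv-n15.cs.washington.edu 8983 8985 8987` prints the first port with no response, and `zk_port_for 8985` prints 9985. It is still worth confirming the ZooKeeper port in the master's screen session as described in step 3.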

Import the data

You now need to import the data. Expect this to take a day or two, because the data is indexed as it is loaded.

  1. git clone the openie-backend project.
  2. Run sbt 'project openie-populator' 'run-main edu.knowitall.browser.solr.SolrLoader http://rv-n15.cs.washington.edu:8983/solr stdin' < myfile, adjusting the args as necessary.
  3. Commit the data using curl http://localhost:8983/solr/update --data '<commit/>' -H 'Content-type:text/xml; charset=utf-8', adjusting args as necessary.
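
The commit in step 3 can be wrapped so that the host and port are easy to adjust, together with a quick check of how many documents Solr reports. This is a minimal sketch under assumptions: the function names are hypothetical, and the count query assumes the default collection answers at /solr/select, mirroring the /solr/update path used above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Prints the base URL of a Solr instance.
# Usage: solr_base_url HOST PORT
solr_base_url() {
    printf 'http://%s:%s/solr\n' "$1" "$2"
}

# Sends the same commit request as step 3 to the given instance.
# Usage: commit_index HOST PORT
commit_index() {
    curl "$(solr_base_url "$1" "$2")/update" \
        --data '<commit/>' \
        -H 'Content-type:text/xml; charset=utf-8'
}

# Queries for all documents but asks for zero rows, so the response
# carries only the header and the numFound count.
# Usage: count_docs HOST PORT
count_docs() {
    curl --silent "$(solr_base_url "$1" "$2")/select?q=*:*&rows=0&wt=json"
}
```

For example, `commit_index localhost 8985` followed by `count_docs localhost 8985` commits and then returns a JSON response whose numFound field should be non-zero once the import has loaded data.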

Run the demo

  1. Check out the solr branch of openie-demo (if it hasn't been merged).
  2. In openie-demo, edit the file app/controllers/Executor.scala, and change the HttpSolrServer constructor parameter to the URL of your Solr instance.
  3. Run sbt run.
  4. Visit your site at http://rv-n15.cs.washington.edu:9000