Hadoop

From Knowitall
Revision as of 17:42, 28 November 2011 by Schmmd

Guidelines

We are using the FairScheduler with one pool per user. Additionally, there is a pool for background tasks. This pool has a weight of 0, so jobs in it will never start while other jobs are running.

If there are problems with scheduling, first check the scheduler page: http://rv-n11.cs.washington.edu:50030/scheduler. Second, you can make a hot configuration change to conf/fairscheduler.xml, such as adding a minMaps for a pool. These changes take effect within seconds. Go ahead and make changes if you need to, but please keep Michael informed (send an email to schmmd@cs.washington.edu). Otherwise it gets too confusing.
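As a rough illustration of what such a change looks like, the sketch below builds a pool entry in the style of a fair scheduler allocation file. The element names (allocations, pool, minMaps, weight) are assumed from the Hadoop fair scheduler's allocation file format; the pool name "alice" and the value 10 are hypothetical, and the function name is made up for this example. Check the existing conf/fairscheduler.xml before copying anything in.

```python
import xml.etree.ElementTree as ET


def pool_allocation(pool_name, min_maps, weight=None):
    """Build an <allocations> tree holding a single <pool> entry.

    Element names are assumed from the fair scheduler allocation file
    format; pool_name and the numbers passed in are illustrative only.
    """
    root = ET.Element("allocations")
    pool = ET.SubElement(root, "pool", name=pool_name)
    # Guaranteed minimum number of map slots for this pool.
    ET.SubElement(pool, "minMaps").text = str(min_maps)
    if weight is not None:
        # Relative share of the cluster; the background pool uses 0.
        ET.SubElement(pool, "weight").text = str(weight)
    return root


if __name__ == "__main__":
    # Hypothetical pool "alice" with a minimum of 10 map slots.
    print(ET.tostring(pool_allocation("alice", 10), encoding="unicode"))
```

The printed element is only a fragment; in practice the pool entry would be merged by hand into the existing allocations in conf/fairscheduler.xml.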

A number of people have suggested characteristics for good jobs. The following defines a "nice" job.

  1. Mappers finish in more than 60 seconds (preferably some minutes) and less than an hour (preferably much less).
  2. There are no more than 9 reducers.
  3. The entire job finishes within 48 hours.

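You can reduce the number of mappers by raising "mapred.min.split.size", and you can increase the number of mappers by decreasing the block size of your file ("dfs.block.size"). Block size is fixed when a file is written, so an existing file has to be rewritten for a new block size to apply. When we upgrade (next quarter), we will be able to specify per-pool reducer limits.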
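To see how these two settings move the mapper count, here is a back-of-the-envelope estimate. It assumes a single splittable input file and that each split is the larger of the minimum split size and the block size; the real input format applies further rules (for example, a small final split may be folded into the previous one), so treat the numbers as approximate. The function name and the file sizes are hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def estimate_mappers(file_bytes, block_bytes, min_split_bytes=1):
    """Approximate mapper count for one splittable input file.

    Assumes split size = max(min split size, block size), which is a
    simplification of what the input format actually computes.
    """
    split_bytes = max(min_split_bytes, block_bytes)
    # Integer ceiling division: a partial final split still needs a mapper.
    return -(-file_bytes // split_bytes)


if __name__ == "__main__":
    size = 10 * GIB  # hypothetical 10 GiB input file
    print(estimate_mappers(size, 64 * MIB))             # default-style 64 MiB blocks
    print(estimate_mappers(size, 64 * MIB, 256 * MIB))  # larger min split, fewer mappers
    print(estimate_mappers(size, 32 * MIB))             # smaller blocks, more mappers
```

Under these assumptions the three calls give 160, 40, and 320 mappers, which shows the direction each setting pushes the count.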

Jobs should not be run if they have any of the following characteristics. These are "pathological" jobs.

  1. Mappers finish in less than 30 seconds or more than 4 hours.
  2. There are more than 18 reducers. Exception: more reducers are allowed if they finish quickly (all reducers take less than 10 minutes using all slots).
  3. The job will take more than 7 days.
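The two lists can be turned into a quick self-check before submitting a job. The sketch below is one possible reading of the rules, not an official tool: the function name, its parameters, and the "in-between" label for jobs that are neither nice nor pathological are all made up for illustration, and reduce_phase_minutes is interpreted as the time all reducers take when using all slots.

```python
def classify_job(mapper_seconds, reducers, job_hours, reduce_phase_minutes=None):
    """Classify a job as "nice", "pathological", or "in-between".

    mapper_seconds: typical run time of one mapper.
    reducers: number of reduce tasks.
    job_hours: expected wall-clock time of the whole job.
    reduce_phase_minutes: time for all reducers using all slots, if known.
    """
    # The exception for many reducers applies only when they finish quickly.
    quick_reducers = reduce_phase_minutes is not None and reduce_phase_minutes < 10
    pathological = (
        mapper_seconds < 30
        or mapper_seconds > 4 * 3600
        or (reducers > 18 and not quick_reducers)
        or job_hours > 7 * 24
    )
    if pathological:
        return "pathological"

    nice = 60 < mapper_seconds < 3600 and reducers <= 9 and job_hours <= 48
    return "nice" if nice else "in-between"


if __name__ == "__main__":
    print(classify_job(300, 9, 12))                          # five-minute mappers, 9 reducers
    print(classify_job(20, 4, 1))                            # mappers shorter than 30 seconds
    print(classify_job(300, 40, 2, reduce_phase_minutes=5))  # many reducers, but quick
```

With these inputs the first job is classified as nice, the second as pathological, and the third as in-between: the quick-reducer exception keeps it from being pathological, but 40 reducers is still more than the 9 allowed for a nice job.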

If you need to run a pathological job, you must email the users of the cluster to explain why you are running it, and buy the group a pitcher of beer. If you see a pathological job that was not explained, contact the owner and ask them to kill it. If 30 minutes pass and you have not heard back, you may kill the job yourself.

We will not allow people to use the Hadoop cluster if they are under 21.