Scheduling
Updated Jan 01, 2022
By default, Hadoop uses FIFO scheduling for jobs. Two alternate schedulers are available: the Capacity Scheduler and the Fair Scheduler.
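In classic MapReduce (v1), the scheduler is selected by pointing the JobTracker at a scheduler class in mapred-site.xml. A minimal sketch, assuming the Hadoop 1.x property and class names:

```xml
<!-- mapred-site.xml: choose the JobTracker's task scheduler -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <!-- Fair Scheduler; for the Capacity Scheduler use
       org.apache.hadoop.mapred.CapacityTaskScheduler instead -->
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```

If no scheduler is configured, the default FIFO scheduler is used.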
Capacity Scheduler
- Jobs are submitted to queues
- Jobs can be prioritized
- Queues are allocated a fraction of the total resource capacity
- Free resources are allocated to queues beyond their total capacity
- No preemption once a job is running
Fair Scheduler
- Provides fast response times for small jobs
- Jobs are grouped into Pools
- Each pool assigned a guaranteed minimum share
- Excess capacity is split between jobs
- Uncategorized jobs go into a default pool
- Pools can specify a minimum number of map slots, a minimum number of reduce slots, and a limit on the number of running jobs