Performance Optimization I: Work Distribution and Scheduling

Previous | Next --- Slide 43 of 64

Claire

When stealing from another work queue, is the overhead of acquiring a lock and switching threads to do the stolen work less than if we were to dynamically allocate the tasks? Because I was thinking that if a thread was done and wanted to steal some more work, it seems like it would be costly to go to each other thread and check if they have more work and steal versus just checking a common queue if there was any total work left.

haiyuem

Another question that might be related: If there are multiple threads ongoing, how does a new thread know where to steal from?

a7hu

@Claire I don't know the all the answer to your questions. Let me give it a try

Consider the case of using dynamic assignment: - All 3 threads are out of work, and they need to fetch work form the common queue - The common queue is protected by a single mutex Then the three threads will take turn to enter the common queue protected by the single mutex. Assume at t0 all three threads are out of work and entering and modifying the critical section needs x time Then thread 0 will start running at t0 + x thread 1 will start running at t0 + 2x thread 2 will start running at t0 + 3x

Now consider the case of using per thread dequeue - All 3 threads are out of work, and they need to fetch/steal work from dequeue1 - Read access to dequeue1 is protected by a read lock - Modifying the top of the dequeue1 is protected by a write lock The read lock may be held simultaneously by multiple reader threads, so long as there are no writers. The write lock is exclusive. When thread 0 is out of work, it dequeues from the bottom of the work queue and requires no mutex since thread 0 is the only one accessing the bottom of the work queue Thread 1 and 2 acquires the read lock of thread 0 and decided to steal work from thread 0 Assume at t0 all three threads are out of work, write lock requires x, read lock requires y (y < x) Then thread 0 will start running at t0 thread 1 will start running at t0 + y + x thread 2 will start running at t0 + y + 2x

Therefore, stealing from thread 0 work queue has less contention than from the common queue

Please log in to leave a comment.