The Extreme And Subtle Cost Of Task Switching

In all development teams, consistently delivering valuable work is a struggle. In data-focused teams, doubly so. The why can be subtle, and change from team to team.

But one of the major temptations - and therefore common pitfalls - is having too many tasks in progress at once.

Tasks & in progress

We all have an intuitive feeling for what “task” and “in progress” mean, but it’s worth getting specific. For this purpose:

A task is any work whose completion is a net win. You’re better off than you were when you started.

In progress is everything between starting and finishing. Paused the work? It’s still in progress by this definition.

The subtle cost

It’s easy to believe that switching from one task to another should keep everyone fully occupied doing the most important work - and surely that’s going to mean better results, right?

In reality, it introduces a series of costs; each benign enough on its own, but dangerous in the aggregate.

The cost of switching from deep focus on one task to another is something all engineers have experienced. The incomplete task takes up space in your mind as attention residue, and getting focused on the new task can take a while.

This mental capacity is often your team’s most precious resource, not hours on the clock.

Communication overhead is also a natural consequence; the whole team wants to be up to speed with what’s happening. More tasks in progress either means that the depth of cross-pollination is missing, or it becomes a time drag to keep everyone in the know.

Most work creates mess by increasing complexity before it delivers value.

An S3 bucket created to house data might be a necessary step, but once it’s there, it adds to the number of things you’re managing. Your team’s life has become marginally more difficult; one more straw has been added to the proverbial camel’s back.

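At the end of the work - when you’ve ingested useful data there & granted access to a team of data scientists who are churning out useful insights - the added complexity is worth it. Until then, you’re carrying the mess without the payoff, and every extra task in progress adds to the pile.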

Pausing and returning is nowhere near as efficient as focusing on one effort until it’s done. Previous context goes stale, and your team’s short-term memory won’t help much after a few weeks.

It’s often better to record a set of clear ‘pause’ notes or a screen cast - but that’s clearly a cost you otherwise wouldn’t need to pay.

Finally, your team’s product owner and delivery lead need to pay a management overhead for each task in flight. More stakeholders to keep happy, progress reports to make, blockers to tackle and moving parts to organise means less time making the critical judgements and process improvements they’re there to make.

Causes and defences

Urgent work & breakages can sideline your team. If every engineer is occupied on a task already, you surely have to take them off what they’re doing to face down the new challenge?

If this keeps happening, you need more slack in your system. Consider having one engineer off ticket work to deal with the unexpected, & keep mentally fresh to deal with blockages. If you constantly have everyone on a ticket, your team’s velocity will always be fragile to external shocks.

Alternatively, question how urgent this work is, and whether it can be picked up after one of your current tasks is complete.

Friction cost really hampers data teams, because so many tasks take a long time. If this slow-down causes your teammates to “pick up a ticket on the side”, then you’re open to all of the above costs. Some of this is inherent - you can’t do the work without some waiting around. That much, you have to accept as the cost of doing business.

But so many other slow-downs are optional. If poorly optimised queries or ETLs regularly hamper progress, schedule in a task to address it. If test suites take forever, or are flaky, it’s worth reviewing if they really are testing the important things, or can otherwise be optimised.

Sometimes the friction comes from tools owned by external teams - e.g. build servers from a platform team. At times like these, I like to make a (polite) ruckus, and get my teammates to second my experiences.

Limited autonomy can cause tickets to be paused (and alternatives picked up “while we wait”). There tend to be two causes:

  • you or your teammates don’t have the confidence to make the change without the ‘nod’ from someone more experienced. Spending some time pairing old hands with junior team members can go a long way to giving them this confidence.
  • your team doesn’t have the permission, in which case it’s worth finding out if there’s any way to negotiate more autonomy. In site reliability engineering, for instance, teams are given huge freedom so long as they can demonstrate they’re doing the right thing. What’d your organisation be open to?

The external team hocket sees two collaborating teams sending partially-done tasks back and forth, each considering it ‘paused’ until the other side has done their part. A pull request, a few rounds of QA or a heavily choreographed sequence of releases can all lead to this. It feels efficient, but leads to most of the same costs as above.

If a lot of this is going on, I’d consider if the tasks in progress could be lowered by:

  • decoupling: is there any way the two teams can genuinely do all their work without being blocked by the other?
  • waiting to pair between teams. Yes, it can seem ridiculous, but getting the work through in two days rather than going back and forth for two weeks means a clearer environment & mental palate to focus on the task at hand.

If the number of tasks in progress is invisible, then your team will naturally fail to keep a handle on it. This happens for teams that use kanban for tickets that represent sub-tasks rather than proper tasks. If that’s you, find a way to make the number of tasks in progress visible, somewhere your team looks all the time.
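
If your board only tracks sub-task tickets, one way to surface the real number is to roll the tickets up to their parent task and count from there. The Python below is a rough sketch, not a recipe: the Ticket fields, the task names and the idea that each ticket records its parent task are all assumptions for illustration, and your ticketing tool will have its own shape. It uses the definition from earlier, so a paused task still counts as in progress.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    # Hypothetical fields; map these from whatever your board exports.
    task: str        # name of the parent task this sub-task belongs to
    started: bool    # has anyone begun work on this ticket?
    finished: bool   # is this ticket done?


def tasks_in_progress(tickets):
    """Return the sorted names of tasks that are started but not finished.

    A task is in progress if at least one of its tickets has been started
    (or finished) and at least one is not yet finished. Paused work
    therefore still counts.
    """
    by_task = {}
    for ticket in tickets:
        by_task.setdefault(ticket.task, []).append(ticket)

    in_progress = []
    for name, group in by_task.items():
        any_started = any(t.started or t.finished for t in group)
        all_finished = all(t.finished for t in group)
        if any_started and not all_finished:
            in_progress.append(name)
    return sorted(in_progress)


if __name__ == "__main__":
    # Made-up example board.
    board = [
        Ticket("ingest-sales-data", started=True, finished=True),
        Ticket("ingest-sales-data", started=False, finished=False),
        Ticket("speed-up-etl", started=True, finished=False),
        Ticket("new-dashboard", started=False, finished=False),
    ]
    active = tasks_in_progress(board)
    print(f"{len(active)} tasks in progress: {', '.join(active)}")
```

Printing that one line somewhere the team already looks (a stand-up channel, say) is usually enough; the point is the count, not the tooling.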

Finally, make sure your designs are as incremental as possible. Lots of stakeholders (and some teammates!) will be tempted to argue you should bundle an extra feature into a release. Doing this results in bigger tasks, which are much more prone to derailing.

If your plans need to change (and they almost always need to, don’t they?), you often have to pause an over-stuffed task rather than finish it off.