Performance Tales: Tasks

Recently, a team was created to tackle major issues, often proactively. Some errors are logged without the user noticing anything, so by monitoring the logs the team can diagnose and fix these “silent” errors. The other thing they look out for is performance problems. Some performance problems go unnoticed because the slowdown is minor, inconsistent, or worsens gradually over time. I suspect some users don’t bother reporting slowness because it’s harder to quantify than an obvious crash.

However, one user had seen a gradual drop in performance from not dealing with their tasks, and it had reached the point where they could no longer log in: retrieving their tasks on login took longer than 30 seconds, so the call timed out (an error is thrown when retrieving the tasks takes more than 30 seconds).

“At the time of logging this bug, the user has 136,854 tasks in Tasks Management. The program’s performance will start to be negatively affected after 4,000 tasks. I have extended the timeout of the SQL call for TasksManagement.GetUserTaskCountSummary to 60 seconds, as this caused a login failure.”

Walter (Developer)

“Let’s be honest, the program’s performance will start to be negatively affected after 1 task.”

Mike (jestingly)

I think this is acceptable as a quick fix to allow the user to log in again, but is it really acceptable for the login process to take more than 30 seconds? I’d imagine it now takes around 40 seconds for this user.
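For context, the post never shows Walter’s change itself, but the 30-second figure lines up with ADO.NET’s default SqlCommand.CommandTimeout, which is measured in seconds (and where 0 means wait indefinitely – effectively Mark’s suggestion below). Here is a minimal sketch of what such a fix could look like, assuming the call goes through ADO.NET; the method shape, the @UserId parameter, and the scalar return are my assumptions, not the product’s code:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

public static class TaskCounts
{
    // Hypothetical wrapper around the GetUserTaskCountSummary stored procedure.
    // ADO.NET's CommandTimeout defaults to 30 seconds, which matches the
    // original login failure; a value of 0 would mean "wait indefinitely".
    public static int GetUserTaskCountSummary(string connectionString, int userId)
    {
        using var connection = new SqlConnection(connectionString);
        using var command = new SqlCommand("TasksManagement.GetUserTaskCountSummary", connection)
        {
            CommandType = CommandType.StoredProcedure,
            CommandTimeout = 60 // the quick fix: raised from the 30-second default
        };
        command.Parameters.AddWithValue("@UserId", userId); // parameter name assumed

        connection.Open();
        return (int)command.ExecuteScalar();
    }
}
```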

That’s the problem with this team: they just look for quick wins, even when it isn’t the right long-term solution and might even move the problem elsewhere.

What’s better than waiting 30 seconds? Waiting 60 seconds? Threading it off to defer the loading? Or Mark’s idea: no timeout at all.

What if the time taken still exceeds 60 seconds, say for some other user with a large volume of data? Can you set the timeout to infinity?

Mark


We had a customer with an unusually large count and it only ran for around 32–33 seconds. We are going to send out communications asking customers to keep these counts low through maintenance. The 60 seconds just adds an extra safety net if we get into this situation again. I don’t want to extend the timeout too far for this reason, as it is unlikely (if ever) to need to be longer than 60 seconds.

Walter


Why not a try/catch and a retry attempt for this? It should be a non-essential call for logging in; if it fails, you can catch the exception, log it, and show an error message. Should we not look at optimising this so that you can log in quicker?
Maybe run it on a background thread too?

Lee


I discussed this with Johnny; making changes to this stored procedure could result in worse performance for smaller datasets, so he advised against changing it. We’re going to tackle this through communications to sites. I thought the simplest and safest approach was just to extend the timeout slightly so that the practice does not suffer a system-down as a result of it – the timeout is only breached by a second or two.
Once the user logs in, their task counts are displayed, so I think it might be deemed essential (rather than showing them a loading wheel until the data is returned). And currently, if we did this, Tasks Management would just crash with an error when loading.

Walter


It would still crash on logging in if it takes over 60 seconds.
Why not make it non-critical to logging in?

  • Log in
  • Status bar “Loading…”
  • Completes – OK.
  • Fails – Show error and retry link.
Lee
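For what it’s worth, Lee’s flow maps onto a fairly standard pattern: kick the slow call off to a background task at login, and treat failure as a message plus a retry link rather than a crash. Here is a rough sketch with the UI abstracted behind delegates, since none of the product’s real UI code appears in the thread – every name here is mine:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class TaskCountLoader
{
    public static async Task LoadAsync(
        Func<int> fetchCount,        // e.g. () => TaskCounts.GetUserTaskCountSummary(...)
        Action<string> setStatus,    // updates the status bar
        Action<int> showCount,       // fills in the task count display
        Action<Action> offerRetry)   // shows an error with a retry link
    {
        setStatus("Loading…");
        try
        {
            // Run the slow SQL call off the UI thread so login completes immediately.
            int count = await Task.Run(fetchCount);
            showCount(count); // Completes – OK.
        }
        catch (SqlException ex)
        {
            // Fails – log it and offer a retry instead of blocking (or crashing) login.
            Console.Error.WriteLine($"Task count summary failed: {ex.Message}");
            setStatus("Couldn’t load task counts.");
            offerRetry(() => _ = LoadAsync(fetchCount, setStatus, showCount, offerRetry));
        }
    }
}
```

The key point is Lee’s: the slow call no longer gates login, and a timeout degrades to an error message and a retry link.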


This was the worst site on the estate and it was taking roughly 32 seconds. For it to take over 60 seconds, the task count would have to be unheard of.
Each time I ran that stored procedure manually, the results were the same, so I don’t think a retry is going to work here.
Even by making it non-critical to logging in, Tasks Management will still be busted when you try to load it. The timeout is on the SQL side, so that is the area we really need to resolve.

Walter

However, Johnny did advise against alternative solutions such as:
1. fine-tuning the stored procedure
2. adding indexes
3. removing task counts completely for some types of tasks

My View:

Walter had put more thought into it than I originally gave him credit for, but I still felt he was overlooking Lee’s suggestion. Yes, it would need more work to get right (loading text on the Tasks Count Bar, then a loading screen when launching the Tasks Management page), but it would significantly speed up logging in. If this user could log in 32 seconds quicker, what would the average user see?

If the other parts of the login process also take some time, then that’s a long total wait. If task counts are the bulk of that time, we could make login very fast by taking them out of the critical path. I would have thought users expect times of 5 seconds or less (that might not be achievable, but it’s the scale we should aim for). Walter talks as if users are more than happy to wait 30 seconds or more just to get to the home page. A long wait is better than not being able to log in at all, but surely anything more than several seconds in total is generally unacceptable. It’s one of the reasons users have grown more discontented over time.

When testing with smaller counts, for example 10k, the results are returned in a few seconds (2–3). This organisation had around 120k Appointments Tasks across all users, plus all of their other tasks, which resulted in a production duration of 32 seconds. The more they manage their tasks, the quicker their workflow will be – that’s always been the message we’ve tried to get across.

Walter
