This one is more of a diary entry than a criticism of anyone in particular. It’s an example of the tricky and infuriating side of being a software developer.
A bug was logged a few months back that we didn’t pay attention to, but the complaints have steadily piled up. A bunch of users were complaining about it on a Facebook group, so I think it got the CEO’s attention.
As usual, the bug came to me to fix, and the expectation seemed to be that I’d have it done in a few days.
Users basically have an Inbox with a list of tasks in it. Like an email application, they can see the tasks on the main-view but can also double-click to open one. On this open-view, they can click buttons to deal with the task, but they can also scroll to other tasks rather than closing the view.
On the open-view, users were clicking Approve but nothing happened. If they clicked it again, the task was successfully approved, yet the audit showed two approvals. The bug didn’t occur if you opened the task directly from the main-view and then approved it. It only happened after you switched to another task on the open-view.
I could easily recreate it, but when I was debugging, I couldn’t make sense of what was going on. The code seemed to have the correct data and it updated the correct property, then all of a sudden the data was wrong again. The code in that area was massively confusing and it was hard to work out how things were supposed to work.
After days of looking at it, I managed to persuade one of my work friends (and maybe one of the smartest developers) to come and help me out. He couldn’t understand the problem either, but came up with theories about caching problems. I hadn’t seen anything obvious in the code, but I was out of ideas, so we changed the config to disable all the caches. It didn’t make a difference.
The next day, I told my manager that there was no progress, and that we probably needed to get more nerds involved. So I managed to get a couple of hours with the smartest developer at the company.
He couldn’t understand it either. He was convinced that after the data was saved to the database, something on another thread was changing the data in the cache, so that the object on the client no longer matched the server cache.
The next day, we were losing hope. I could only get a couple of hours with the smartest developers but even they were confused, and now I was on my own again.
I looked at some recent changes to the files in question, and I saw someone had reduced the number of “refreshes” on these tasks. The code previously tried to reload the data a few times every time you selected a task. I reverted the changes to see if it made a difference… and it fixed the issue.
The thing is, I wasn’t going to just revert the changes, because the performance issues would come back. So then I was trying to work out why the refreshes had a bearing on the cache. I persuaded a senior developer on my team to look at it too, because I was convinced I was close. By this point, though, it was hard to think because I had spent 5 days on this same bug. I had been doing more than my 9-5: I was basically doing 8-6, then sometimes 10pm-12am. There had to be something I was overlooking.
It took him a couple of days, but he did point me in the right direction. It seemed we were holding on to a data object in a variable and then retrieving the data again on the server, which replaced the object references in the server cache. So when you clicked Approve, the code updated a different object from the one now in the cache, and the two were out of sync.
So I basically removed the cached variable, and only used the data once the server cache had been updated.
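For anyone who wants to picture that kind of bug, here is a minimal Python sketch of the general pattern. It is not our actual code: `TaskCache`, `BuggyOpenView` and `FixedOpenView` are made-up names, and the real system is a client and server rather than a couple of dictionaries. The only point is that a reference saved before a reload ends up pointing at an object the cache no longer holds.

```python
class TaskCache:
    """Hypothetical stand-in for the server cache of tasks."""

    def __init__(self):
        self._tasks = {}

    def reload(self, task_id):
        # Every reload builds a brand-new object and replaces the cached one.
        self._tasks[task_id] = {"id": task_id, "approved": False}
        return self._tasks[task_id]

    def get(self, task_id):
        return self._tasks[task_id]


class BuggyOpenView:
    """Keeps the task object in a variable (illustrative only)."""

    def __init__(self, cache):
        self.cache = cache
        self.current = None

    def select(self, task_id):
        self.current = self.cache.reload(task_id)  # reference saved here
        self.cache.reload(task_id)  # a later reload swaps the cached object

    def approve(self):
        self.current["approved"] = True  # updates the stale object
        # Report what the cache believes about the task.
        return self.cache.get(self.current["id"])["approved"]


class FixedOpenView:
    """Remembers only the id and reads the cache when it needs the data."""

    def __init__(self, cache):
        self.cache = cache
        self.current_id = None

    def select(self, task_id):
        self.cache.reload(task_id)
        self.cache.reload(task_id)
        self.current_id = task_id

    def approve(self):
        task = self.cache.get(self.current_id)  # always the live object
        task["approved"] = True
        return self.cache.get(self.current_id)["approved"]


def demo():
    buggy = BuggyOpenView(TaskCache())
    buggy.select(1)
    fixed = FixedOpenView(TaskCache())
    fixed.select(1)
    return buggy.approve(), fixed.approve()
```

Calling `demo()` returns `(False, True)`: in the buggy view the approval lands on an object the cache has already thrown away, while the fixed view updates the object the cache actually holds.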
A simple fix in the end, but it took all week.