The Weeklong Bug

This one is more of a diary blog, rather than criticising anyone in particular. It’s an example of the tricky and infuriating side of being a software developer.

There was a bug logged a few months back that we didn’t pay attention to, but it’s progressively got more complaints and there are a bunch of users complaining about it on a Facebook group, so I think it got attention from the CEO.

Like usual, the bug comes to me to fix, and it seemed the expectation was that I’d have it done in a few days.

Users basically have an Inbox with a list of tasks in there. Like an email application they can see the tasks on the main-view but can also double-click to open them. On this open-view, they can click buttons to deal with the task, but they can also scroll to new tasks rather than closing.

On the open-view, users were clicking Approve but nothing happened. If they clicked it again, then it was successfully approved. If you looked in the audit, it did show two approvals. This bug didn’t occur if you opened the task directly from the main-view, then Approved. It was only when you switched to another task on the open-view.

I could easily recreate it, but when I was debugging, I couldn’t make sense of what was going on. It seemed to have the correct data, it updated the correct property, then all of a sudden, it seemed wrong again. The code in that area was massively confusing and it was hard to work out how things should work.

After days of looking at it, I managed to persuade one of my work friends (and maybe one of the smartest developers) to come help me out. He couldn’t understand the problem, but came up with theories of caching problems. I didn’t see anything obvious in the code I saw, but I was out of ideas, so we changed the config to disable all the caches. Didn’t make a difference.

The next day, I told my manager that there’s no progress, and we probably need to get more nerds involved. So I managed to get a couple of hours with the smartest developer at the company.

He couldn’t understand it either. He was convinced that after the data is saved to the database, something on another Thread was changing the data in the cache, so that the object on the client doesn’t match the server cache. 

The next day, we were losing hope. I could only get a couple of hours with the smartest developers but even they were confused, and now I was on my own again.

I looked at some recent changes to the files in question, and I saw someone had reduced the amount of “refreshes” on these tasks. It was previously trying to reload the data a few times every time you selected a task. I reverted the changes to see if it made a difference…and it fixed the issue.

The thing is, I wasn’t going to just revert the changes, because the performance issues would be back. So then I was trying to work out why that had a bearing on the cache. I persuaded a Senior developer on my team to also look at it because I was convinced I was close, but by this point, it was hard to think because I had spent 5 days on this same bug. I had been doing more than my 9-5, I was basically doing 8-6, then sometimes 10pm-12am. There must be something I am overlooking.

It took him a couple of days but he did point me in the right direction. It seemed we were basically caching a data object in a variable, retrieving the data again on the server which changed the object references in the server cache. Then when you click Approve, it updates a different object reference, and now they are out of sync.

So I basically removed the cached variable, and only used the data once the server cache had been updated.

A simple fix in the end, but it took all week.

Missing Update

I received a review from Colin which was modifying an internal tool. I have never made any changes for this tool, but I have had a nosey at other changes to have a vague idea of the approach to modifying this.

I point out that his SQL patch was missing an “update” statement. (Ideally, maybe someone should add a database trigger, because manually adding this extra “update” statement to every patch is a bit silly; but hey, that’s the current process).

He replies back asking why he needs the update. Before I had even responded, he had made the change anyway. So he has no idea, but is happy to do what I say! 

He also said “I have never made a change for this tool before”. I knew that was a lie. I could look at the ‘Completed Pull Requests’ and find 1 change that he did a few months back. I really haven’t made a change to this before, and I know that this statement needs adding in, it’s not really an excuse.

I explain that if you look at all the other patches, they all have it in. I also linked a recent review which was done by the “Module Expert” where he flagged this as a problem, and has provided a basic explanation of what it was for. This Module Expert was on holiday for the week, so we couldn’t ask him, or get him to review it; so Colin has to make do with me reviewing it.

3 hours later, I get another Pull Request from Colin. It’s a separate change, but it’s another patch for the same tool. It doesn’t have the “update” statement in.

Instead of shaming him on the Pull Request. I send him a message on Slack asking how he managed to forget. He then tells me it isn’t needed because “it is a different area”. 

No idea what he is on about. If he really thinks it isn’t needed, then why did he change it as requested in the last review? If I am wrong, then he shouldn’t have made the change, and he should have responded telling me that. Of course, if he sends a review without this expected change in, then I’m going to question it again, aren’t I?

I Want To Do Some Enhancements

Many years ago, the development team was set-up in a way that “proper” developers worked on enhancements, whereas Juniors worked on fixing bugs. To be honest, in many situations, fixing bugs is the hardest scenario.

I always like a variety, because doing a particular thing becomes tedious and isn’t giving you a good range of skills.

Julian started moaning about how he wanted to “do some enhancements”. He was itching to prove himself and craved variety.

The managers finally caved in and gave him a set of 4 enhancements to do as a mini-project over the course of a couple of months. When the deadline approached, he had left the company.

When we looked at his changes, only 2 looked even close to finished. They were riddled with bugs. One of them looked like an absolute botched implementation, and when I looked at the code and pasted it into Google, I found the Stack Overflow posts it was taken from.

Julian had just copied and pasted code, tried to change a few things here and there, but couldn’t improve it. I imagine he realised he has made a fool out of himself, so handed in his notice before we uncovered what a mess it was.

In the end, we just scrapped his code completely because it was a waste of time looking at it any further. 

The Name

I was peeking at Code Reviews and I saw this code.

retrievalUser.Surname = Formatters.ToTitleCase(usersName.Item as string);
retrievalUser.GivenName = Formatters.ToTitleCase(usersName.Item as string);

Notice how the code for both Surname and GivenName is exactly the same.

How can that possibly work? If “usersName.Item” is the full name like Alan Taylor, then surely if the UI shows GivenName and Surname, then it will be displayed as “Alan Taylor Alan Taylor” which is obviously not their name.

If this is some temporary code, then why haven’t they added a code comment next to the code to highlight that this needs replacing? Something like “Currently, the API only returns the full name so we cannot determine the users Surname and GivenName until version 2.1”.

Also, is it even correct to Title Case someone’s name? If you have a name like McDonald, wouldn’t this become Mcdonald? There’s probably loads of other examples too. I think the Dutch “van” is lowercase like “Robin van Persie”.

So there’s not much right with these 2 lines of code.

Product Names

Let’s say that the company I work for is “ACME” (classic fake company name). Our existing product is called “ACME Pro”, and our upcoming product is ACME-One. 

In the past, we have had inconsistent branding, with the company name written as ACME, Acme, and acme. We have tried to be more consistent and have discussed the importance that the new product is only ever gonna be “ACME-One” and the hyphen is very important.

Colin has been doing some work in ACME Pro, but it has some kind of integration with ACME-One. He had a database change and was populating the new table with values “Acme Pro” and “ACME One”.

I point out that the casing is wrong with the first product, and he has missed the important hyphen in the second product.

It would take him a few seconds to change it, whereas leaving it like that means it will probably be wrong for a long time and annoy pedantic/perfectionist people like me.

He replies “this isn’t user facing text, so it is not an issue”.

He really doesn’t care about quality. Also, one of the database columns is literally called DisplayName. Who is the text displayed to? If it isn’t an actual end user, it’s going to be an employee using some kind of configuration tool; and that is still a type of user. 

It’s also annoying for writing SQL queries, because you may type “where DisplayName = “ACME-One” but it won’t find it because the stored data is missing the hyphen. So you have to be aware of the misspelling in order for data to be returned.

All Nighter

After my important bug fix was complete, it needed urgently testing, so a Tester worked overtime on Friday night and Saturday.

During testing, he found a bug, but it was completely unrelated to my fix. Still, he wasn’t to know that, and so when he raised it as an issue, my manager was panicking and trying to contact  some developers.

I didn’t see the message until Saturday 11am, but when I logged on and investigated it, I saw another message from my manager to liaise with Colin. Colin loves his overtime so had jumped at the chance to get involved.

When I managed to contact Colin at around 11:45am, he sounded like he was falling asleep. He then told me he was working 10pm to 4am to work out what was happening.

To be fair, he didn’t have the advantage of knowing what the original bug was or the impact of my fix, so I assume he spent a lot of time investigating that. The thing is, the only conclusion he had was that “it wasn’t caused by your change”. I worked that out within 30 mins; I didn’t need 6 hours.

The thing is, there’s no way you can work effectively that late at night; given that you have already done 9-5, and you have it in your mind that it’s now the weekend. To get back in “work-mode” and look at code through the early hours of the morning – it’s just madness.

The thing is, I worked on an important project with Colin a couple of years back. There were major problems with the code which were often caused by Colin. I fixed them during work hours. Colin was spending a few hours extra each night in overtime fixing other problems. He would then turn up late the next day, looking completely tired, and then wrote more crap code… then had to sort it out in overtime. Then it’s an endless cycle. If you want quality code, then overtime is not the answer.

Team Summary

The team is showing unwavering commitment, innovation, resilience, and drive to deliver continuously. I’ve never seen anything like it in the years I’ve been here.

Head of Development

I always say that managers often show a massive disconnect from reality. They seem to praise when the team has performed poorly, and I think this is another case of that. I went through our team’s changes in the last month. Merges are trivial and I don’t really count them as real work, but I’ve included them in the table anyway. 

Here is the analysis:

NameMergesFixesMy View
Me14I’ve been disappointed in my productivity. However, I haven’t done that bad when compared to everyone else. Additionally, I have got loads of changes in a branch, on hold; awaiting confirmation of actual requirements. 
Senior Dev111 (mainly small items)Decent work, but really only cherry picking “quick wins”
Colin4 (made a massive mess of one of these, made a bit of a mess with another. I think the other two were primarily to teach people how to merge [ironically].)5 (I believe 2 of these fixes were issues he created in the other 3)Colin wasted a lot of time by making a mess of a merge which introduced at least one bug, and he made a mess of one of his other changes. He claims he has been more productive whilst working at home.
Colin 203 (1 of which was removing an unused using statement)With only 2 real changes, one of which was small and it didn’t work properly; this is terrible performance.
Rob02Rob has been real quiet. I think he had a bit of time off work, but still, this is poor
Beavis01Beavis was primarily just making excuses why he can’t work. To be fair, there was one week where he was doing some other work for another team.
Total626
Our “drive to deliver

So in a month, we haven’t done much work. I’d be interested to calculate what the average amount of fixes developers do each month. It’s a bit meaningless because a fix could take an hour, and another fix could take a week. However, you would think it would average out over a period of time.

I’ve never seen anything like it. I might use that as a recurring phrase from now on.
My manager actually said to me that my efforts “haven’t gone unnoticed”. I didn’t bother to correct him. I think I need to play up the delusion to try and get a promotion. Honesty never got me anywhere before.

Press F5 to launch

I had a look at my old team’s work to see what progress they have made and to reassure myself I had made the correct decision to leave that project. 

Not much has changed in months. All the issues we had haven’t been fixed. There’s errors logged to the console window, there’s loads of buttons that don’t work. There’s a few small features but nothing works properly. 

What made me laugh was a button that said “Press F5 to launch”. It’s a website. Pressing F5 isn’t going to launch a dialog; it’s going to refresh the page in the browser.

Maybe there is a way of overriding the browser’s refresh behaviour but that would be very annoying to the user.

What I don’t get is: how someone came up with that requirement in the first place, someone else implemented it and somehow didn’t realise it wasn’t going to work. Although surely they did realise when they wrote some code to do it, then pressed F5 and the page refreshes…but then checked it in anyway.

Honestly, you just can’t get the staff.

HTML Encoding

A developer wrote some code that tries to extract text that looks like HTML tags. So if you have <h1>, <div> etc in your text, he was basically just deleting the HTML prior to saving it to the database. 

The thing is, the code just looked for a “<” then a “>” and removed everything between. So if you had a sentence like “the value is ❤ and >1”, then run that through his code, the returned value would be “the value is 1” which has a completely different meaning. You could have paragraphs of text between these characters, and his code would just delete it.

Not sure why he decided to hand-craft his own solution. There will be more official ways of encoding HTML.

Just Copying What Was Already There

My manager calls me and says there is a bug that might end up holding the release, and wondered if I can help. He said it was found when testing Colin’s work, but it might not actually be related to his changes. 

Colin had been given some other work and he was taking ages on it,so wasn’t available to complete this work himself.

So I call Colin and ask him what this bug is. He sent me the description from the tester (Becky) which was next to useless, he showed me his recent changes, and a specification from his project. Colin assured me that his changes should work, because he just “copied what was already there”.

It seemed like everything I needed, so I got digging into the code and testing it out.

It didn’t take me long before I realised what Colin had done. There were two scenarios that he needed to check. By copying and pasting what he saw in that file; he was just covering one of the scenarios. The other scenario was actually in another file, and he would have realised that if he actually tested it.

The thing is, my first thought was to add a unit test to cover the scenario. I checked for existing unit tests for that file, and there were some existing tests. Colin hadn’t bothered adding additional unit tests; which would have made him realise that his changes didn’t cover both scenarios.

Then days have gone by, the manual testers have finally got round to testing out his changes, and logged this issue just days before the release. 

Then I have to save the day once more.

It’s so annoying/stressful when you get given a bug to fix with strict deadlines, but luckily it was very easy to fix. If Colin had shown a bit of effort, then we wouldn’t have been in that situation.