Communications Fail: RCA

Another instance of poor communication:

Them: Hi

Them: (4 hours later). “Are you available? I need some clarification”

Me: “What about?”

Them: (20 mins later) “Did you fix this bug? <insert bug number>”

Me: “yes” (nervous about what is happening. Clearly it was fixed by me. I have a “Pull Request” linked against it, and I have added comments to the discussion and filled in the “Testing Details”)

Them: (45 mins later) “Can you add root-cause analysis?”

5 hours and 5 minutes to tell me to do something. I had already added the information to the “discussion” section; they could have just moved it to the correct section, then informed me that there was a dedicated section I was supposed to fill in.

So frustrating.

The Weeklong Bug

This one is more of a diary blog, rather than criticising anyone in particular. It’s an example of the tricky and infuriating side of being a software developer.

There was a bug logged a few months back that we didn’t pay attention to, but it has steadily attracted more complaints, and a bunch of users are complaining about it in a Facebook group, so I think it got the CEO’s attention.

Like usual, the bug comes to me to fix, and it seemed the expectation was that I’d have it done in a few days.

Users basically have an Inbox with a list of tasks in there. Like an email application they can see the tasks on the main-view but can also double-click to open them. On this open-view, they can click buttons to deal with the task, but they can also scroll to new tasks rather than closing.

On the open-view, users were clicking Approve but nothing happened. If they clicked it again, then it was successfully approved. If you looked in the audit, it did show two approvals. This bug didn’t occur if you opened the task directly from the main-view, then Approved. It was only when you switched to another task on the open-view.

I could easily recreate it, but when I was debugging, I couldn’t make sense of what was going on. It seemed to have the correct data, it updated the correct property, then all of a sudden, it seemed wrong again. The code in that area was massively confusing and it was hard to work out how things should work.

After days of looking at it, I managed to persuade one of my work friends (and maybe one of the smartest developers) to come help me out. He couldn’t understand the problem, but came up with theories of caching problems. I didn’t see anything obvious in the code I saw, but I was out of ideas, so we changed the config to disable all the caches. Didn’t make a difference.

The next day, I told my manager that there’s no progress, and we probably need to get more nerds involved. So I managed to get a couple of hours with the smartest developer at the company.

He couldn’t understand it either. He was convinced that after the data is saved to the database, something on another Thread was changing the data in the cache, so that the object on the client doesn’t match the server cache. 

The next day, we were losing hope. I could only get a couple of hours with the smartest developers but even they were confused, and now I was on my own again.

I looked at some recent changes to the files in question, and I saw someone had reduced the amount of “refreshes” on these tasks. It was previously trying to reload the data a few times every time you selected a task. I reverted the changes to see if it made a difference…and it fixed the issue.

The thing is, I wasn’t going to just revert the changes, because the performance issues would be back. So then I was trying to work out why that had a bearing on the cache. I persuaded a Senior developer on my team to also look at it because I was convinced I was close, but by this point, it was hard to think because I had spent 5 days on this same bug. I had been doing more than my 9-5, I was basically doing 8-6, then sometimes 10pm-12am. There must be something I am overlooking.

It took him a couple of days but he did point me in the right direction. It seemed we were basically caching a data object in a variable, retrieving the data again on the server which changed the object references in the server cache. Then when you click Approve, it updates a different object reference, and now they are out of sync.

So I basically removed the cached variable, and only used the data once the server cache had been updated.
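To make the problem concrete, here is a minimal sketch of the same kind of bug. It isn’t our real code: `Task`, `ServerCache` and both functions are made-up names, and I’m assuming that a reload swaps a brand-new object into the cache, which is roughly what seemed to be happening.

```python
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.approved = False


class ServerCache:
    """Hypothetical stand-in for the server cache: task_id -> Task object."""

    def __init__(self):
        self._tasks = {}

    def load(self, task_id):
        # Assumption: every (re)load stores a brand-new object, so any
        # reference handed out earlier no longer points at the cached entry.
        self._tasks[task_id] = Task(task_id)
        return self._tasks[task_id]

    def get(self, task_id):
        return self._tasks[task_id]


def approve_with_stale_reference(cache, task_id):
    held = cache.load(task_id)   # object kept in a variable
    cache.load(task_id)          # a later refresh replaces the cached object
    held.approved = True         # only the old object is updated
    return cache.get(task_id).approved


def approve_after_refresh(cache, task_id):
    cache.load(task_id)
    cache.load(task_id)                  # let the refresh finish first
    cache.get(task_id).approved = True   # then fetch the current object
    return cache.get(task_id).approved


buggy_result = approve_with_stale_reference(ServerCache(), 1)
fixed_result = approve_after_refresh(ServerCache(), 1)
print(buggy_result, fixed_result)  # False True
```

The first function “approves” an object the cache no longer holds, so the approval appears to do nothing. The second one mirrors the fix: no variable hanging on to the old object.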

A simple fix in the end, but it took all week.

How To Make Your Team Hate You #2

I was thinking about the colleagues I’ve legitimately despised (and it’s not just my opinion), and most of them have something in common: they have either been promoted, or told they will get promoted if they prove themselves. So here is story number 2.

We were in a newly formed team, and Rob was a newly promoted Test Manager. As developers, we were planning on tidying up the code base, formatting the code consistently, and removing unused code.

There wasn’t any real testing to do, given the code was unused, but Rob insisted we spend time investigating how to test the code.

It didn’t make sense – how can you test some code that isn’t used? Unless you prove it somehow is used, of course. But if the developer could tell the tester how to prove it is used, then the developer would know it is used, and wouldn’t remove it.

For the actual, real, development work, Rob demanded that we come up with a full test plan for every planned piece of work before we are allowed to make any code changes.

The thing is, even though I know the overall aim and could explain some test scenarios up front, I will always suggest more tests after I make the code changes. After I have made the changes, I am familiar with other areas of the system they impacted, or extra changes I made to refactor the code. So it would only be a temporary plan anyway.

Once we made that point, Rob then said we should analyse the code, plan out what code changes we would make, including refactoring, and document them. When all items are documented, we could then start work.

Who agreed in the team? No one. Not even other testers. Yet, he then runs off to higher ranking managers and persuades them that it is a good idea. I think they were just good friends with him, so they backed him.

We ended up spending ages working out what we were gonna change, but then not actually do it; because we had to plan every item first. 

When the time came to actually implement our plan, you then had to spend time re-familiarising yourself with the solution and then verifying that the proposed solution was actually accurate. The person that did the planning may not be the person to actually implement it.

We ended up being 2 months into the project and not a single line of code was changed. When we started doing the work, the “familiarising” process just annoyed people. From the start, we had stated that the entire process was dumb, we were annoyed at the extended planning time, then the actual development stage just reinforced how dumb the entire thing was.

I don’t know if anything happened behind the scenes, but Rob was moved off our team at this point, and it felt like a huge weight off our shoulders.

I don’t know what Rob aimed to achieve with his idea. I think he thought it was a process that enforced quality, but he was unanimously told it was a bad idea and yet still fought to get it implemented. Surely if the team is fighting against it, then dictating that they implement it is just going to cause so much friction that your position is going to be untenable. And it was.

The lesson here is that if you do get promoted, make sure it doesn’t go to your head. Yeah, you have to manage people, but having a new fancy job title doesn’t mean you have to go wild. You are working with people. You are still working with your old colleagues. Treat them with respect and be more understanding. I think a lot of managers gain respect by being nice and leading by example.

More work

A couple of months ago, I did tell my manager that we don’t have enough developers in the team. I end up with a bit of backlog of things to do, but I also need to dedicate time to train up my Apprentice. Quite often, we end up with conversations like this:

Manager: “Hey, what are you working on?”

Me: “Such-and-such a fix. After that, I need to return to the last bit of work I parked.”

Manager: “I have another task for you.”

What made me laugh is that the last time this conversation occurred: halfway through the day, he then tags me in a conversation about a bug and he says “this is tomorrow’s work for you”. So he gave me 2 more tasks to do that week.

I think this proves we don’t have enough developers. Colin’s been “preparing” for a project for a few weeks, Beavis has been making more excuses why he can’t work, Rob has been confused about his bug fix for weeks.

If we worked at pace (reference to a previous blog), then developers should be free to pick up this work.

Missing Update

I received a review from Colin which was modifying an internal tool. I have never made any changes for this tool, but I have had a nosey at other changes to have a vague idea of the approach to modifying this.

I point out that his SQL patch was missing an “update” statement. (Ideally, maybe someone should add a database trigger, because manually adding this extra “update” statement to every patch is a bit silly; but hey, that’s the current process).

He replies back asking why he needs the update. Before I had even responded, he had made the change anyway. So he has no idea, but is happy to do what I say! 

He also said “I have never made a change for this tool before”. I knew that was a lie. I could look at the ‘Completed Pull Requests’ and find 1 change that he did a few months back. I really haven’t made a change to this before, and I know that this statement needs adding in, so it’s not really an excuse.

I explain that if you look at all the other patches, they all have it in. I also linked a recent review which was done by the “Module Expert” where he flagged this as a problem, and has provided a basic explanation of what it was for. This Module Expert was on holiday for the week, so we couldn’t ask him, or get him to review it; so Colin has to make do with me reviewing it.

3 hours later, I get another Pull Request from Colin. It’s a separate change, but it’s another patch for the same tool. It doesn’t have the “update” statement in.

Instead of shaming him on the Pull Request, I send him a message on Slack asking how he managed to forget. He then tells me it isn’t needed because “it is a different area”.

No idea what he is on about. If he really thinks it isn’t needed, then why did he change it as requested in the last review? If I am wrong, then he shouldn’t have made the change, and he should have responded telling me that. Of course, if he sends a review without this expected change in, then I’m going to question it again, aren’t I?

I Want To Do Some Enhancements

Many years ago, the development team was set-up in a way that “proper” developers worked on enhancements, whereas Juniors worked on fixing bugs. To be honest, in many situations, fixing bugs is the hardest scenario.

I always like a variety, because doing a particular thing becomes tedious and isn’t giving you a good range of skills.

Julian started moaning about how he wanted to “do some enhancements”. He was itching to prove himself and craved variety.

The managers finally caved in and gave him a set of 4 enhancements to do as a mini-project over the course of a couple of months. When the deadline approached, he had left the company.

When we looked at his changes, only 2 looked even close to finished. They were riddled with bugs. One of them looked like an absolute botched implementation, and when I looked at the code and pasted it into Google, I found the Stack Overflow posts it was taken from.

Julian had just copied and pasted code, tried to change a few things here and there, but couldn’t improve it. I imagine he realised he had made a fool of himself, so handed in his notice before we uncovered what a mess it was.

In the end, we just scrapped his code completely because it was a waste of time looking at it any further. 

Poor Communication

I love writing stories about poor communication. This isn’t the worst by any means; given that the conversation lasted 7 minutes which is quick compared to some other conversations I have been enraged about. The thing is, it is still too slow. I get the question 4 minutes after him initiating the conversation by saying “Hi”. It took a further 3 minutes to get the actual problem from him.

Why couldn’t he have just said:

Hey dude, when you checked in your code, did you encounter any test failures at all? I am getting these failures <insert link to failing build>.

Ideal Conversation

Then I could have understood the problem and context, and viewed the build output to work out what was wrong.

Here is the actual conversation:

[Yesterday 5:38 PM] Andrew
Hi

[Yesterday 5:42 PM] Andrew
yesterday you checked in your code into the Main branch

[Yesterday 5:42 PM] Andrew
did you face any build errors

[Yesterday 5:43 PM] Me
No. It wouldn't have checked in otherwise. What's the problem you're having?

[Yesterday 5:43 PM] Andrew
okay..because we are getting some build errors

[Yesterday 5:44 PM] Andrew
just wanted to ask you

[Yesterday 5:44 PM] Me
What are the errors?

[Yesterday 5:45 PM] Andrew
unit cases are failing

<Shows screenshot of a message that says “There might be failed tests”. Pretty much useless>

[Yesterday 5:45 PM] Andrew
in Configuration Manager

It’s frustrating constantly receiving small messages with barely enough information. I shouldn’t have to ask him twice what the errors are after being told there are errors.

Turns out he had missed the post by a developer stating that everyone must merge Main into their branch. His branch was a whopping 2 months out of date, which is pretty bad practice.

Report API fail

A tester says they are running a Test Case where they need to generate a report using our software. This report utilises a third-party service. The report fails to generate.

There should be errors displayed when you view the details, but it’s completely blank. So if they have correct data, then it’s a bug because it doesn’t work. If they don’t have correct data, then it’s a bug because it doesn’t tell the user what the problem is. It’s a bug either way.

Becky, a Senior tester:

  • Asks him if he has the correct data. Completely misses the point.
  • Then she posts a follow-up question asking him if he has gone directly to the third-party website to get the information. Completely misses the point.

No, Becky, the functionality is that our software contacts the third-party API to retrieve this information. Manually going to the website and filling a form out by hand isn’t helpful, or acceptable. It isn’t going to make the test case pass, is it?

“ok, we will pass the test because the website works. Nothing to worry about.”

Becky’s mind

Sometimes I think she tries to be dumb on purpose.

Slack Analytics #2: September 2020

In Slack Analytics, I stated:

I have sent 1,700 messages for the entire year.

I was interested to see my output this month. I have been sending a lot of messages to my Apprentice, I’ve been engaging in conversations with managers and testers to discuss many of my bug fixes. Some new Apprentices joined and I have also been helping them. Also, since I don’t get to physically talk much, my Slack usage has gone up.

1,803 messages over 26 days.

In Slack Analytics, I also mentioned the highest number of messages sent by someone was “3,500 across the 18 days she was in the office”.

Again, she still leads the monthly charts, but this time has 3,709 over 21 days, so it’s basically stayed the same.
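The totals hide the daily rate a bit, so here is a quick sanity check using only the figures quoted above. `per_day` is just a throwaway helper name, and I’m assuming the “days” in each figure are counted the same way.

```python
def per_day(messages, days):
    """Average messages sent per day."""
    return messages / days


previous = per_day(3500, 18)   # her figure from the first Slack Analytics post
current = per_day(3709, 21)    # her figure this month
mine = per_day(1803, 26)       # my figure this month

print(f"{previous:.1f} {current:.1f} {mine:.1f}")  # 194.4 176.6 69.3
```

So her total crept up by about 6%, but her daily rate actually dropped by about 9%.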

Now this is interesting. How can you make everyone work at home, yet her message count hasn’t increased that much? I was expecting to see several people have counts that eclipse this figure. 

Slack only allows you to see Last 30 Days, or All Time, so I don’t think I can get access to see the change pre and post lockdown. You would imagine taking away face-to-face communication will increase everyone’s usage.

I guess there are two ways to actually slack-off work. Using Slack to send banter messages to your colleagues, or just not working. So message counts could go up because more people can get away with not actually working, or it can go down because they really are slacking-off work.

The new Apprentices have been pretty quiet so far, and I would have thought they would be constantly messaging people since they wouldn’t have a clue what was going on. There’s a developer who was really quiet when we were in the office and he never seems to publicly post anything on Slack. I tend to forget he exists.

Developer         Days active   Message Count
Apprentice A      22            286
Apprentice B      21            525
Quiet Developer   23            429
Slack Analytics

Biased Twitter Picture Algorithm

People are accusing Twitter’s algorithm of being racist, which is sometimes a problem when it comes to algorithms based on Machine Learning.

If you don’t train the algorithm with a representative data set, then it can fail. For example, if your data set only has pictures of males, then the algorithm will have problems when it is then tested out on pictures of females.

This sure is an interesting thread.

Originally, Dantley, who works at Twitter, states that the background is swaying the algorithm to choose the white guy. Then Graham has followed up with a set of tests.

In his reply, he has 4 pictures. If you click each one, you can see the original image. The image is very tall, which means Twitter has to crop it in order to display in the tweet. It seems that it crops the image around something interesting in the pictures. So it should detect 2 people and then needs to choose one to display in the preview. In each of the 4 examples, it is choosing the white man.
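I have no idea how Twitter actually implements this, but the general idea of “crop around the most interesting bit” can be sketched like this. Everything here is an assumption for illustration: I pretend some model has already given every row of pixels an “interest” score, and `best_crop_start` is a made-up function that slides a fixed-height window down the image to find the highest-scoring region.

```python
def best_crop_start(row_scores, crop_height):
    """Return the top row of the crop window with the highest total score."""
    if crop_height <= 0 or crop_height > len(row_scores):
        raise ValueError("crop_height must be between 1 and the image height")

    # Score the first window, then slide it down one row at a time.
    window = sum(row_scores[:crop_height])
    best_start, best_total = 0, window
    for start in range(1, len(row_scores) - crop_height + 1):
        window += row_scores[start + crop_height - 1] - row_scores[start - 1]
        if window > best_total:
            best_start, best_total = start, window
    return best_start


# Hypothetical tall image: one face near the top, one near the bottom,
# with the bottom face scored slightly higher by the (imaginary) model.
scores = [0, 5, 6, 0, 0, 0, 0, 7, 6, 0]
start = best_crop_start(scores, 3)
print(start)  # 6
```

The point is that only one window can win. If the scoring model rates one face slightly higher than the other for whatever reason, that person gets the preview every time.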

Dantley follows it up with an experiment of his own, by placing them in the same outfit, but he also removed their hands. This was probably just easier to edit, rather than swapping clothes and adding their hands back in. The black guy is then chosen.

I wonder what the outcome of this is going to be. Some people, including Dantley, suggest it shouldn’t crop the image, but I don’t want a massively tall image on my timeline. Maybe they have some other way of handling it that I’m not thinking of.