Why is my build failing?

A Senior Developer asked for help on Slack. His build failed, so he triggered it again and it failed with the same error. He had hoped it was just a random failure.

I pointed out that it had failed because of a failing unit test. His code change was a single line: he had changed a boolean value from false to true, but hadn’t updated the unit test.

After I pointed it out, he deleted his post asking for help. Destroyed.

The API

In our current project we need some user data to populate a screen with the user’s settings. I ask the team responsible for this data if they have an API we can call. It shouldn’t have been a surprise to them because we are replacing existing software and the requirements are the same – we just need to implement it using different technologies.

So I email them with the requirements. I want to pass them a UserId, and I want them to send me back the User’s Configuration in return. Easy.

A few days later they send me a meeting invite.

In the meeting they said they didn’t understand the requirement. I then showed them the UI designs, and told them that I need the data in order to populate a form for a particular user.

Two weeks later I get another invite. They still don’t understand what we want. So another team member explains to them.

The next day I get an email with their new design. I can call their API with an ID related to my application, then I have to do a separate API call but I can’t specify a user. The format it comes back in is JSON, but it’s not really conforming to the standard.

So you may expect some JSON about a film to look like this:

    {
        "title": "Gravity",
        "info": {
            "directors": ["Alfonso Cuaron"],
            "actors": [
                "Sandra Bullock",
                "George Clooney",
                "Ed Harris"
            ]
        }
    }

but instead they said they want to send this:

    {
        "title": "Gravity",
        "info": "{directors: [Alfonso Cuaron], actors:[Sandra Bullock, George Clooney, Ed Harris]}"
    }

So you lose the nested objects that JSON is designed for. Instead, you just have two fields, with everything in the second one smashed into a single text string. Why?
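To show what that means for the consumer, here is a minimal Python sketch using the two payloads above. The variable names are mine, and I am assuming the string really arrives exactly as written in their example (unquoted keys and values).

```python
import json

# The conventional payload: "info" is a nested object.
nested = json.loads("""
{
    "title": "Gravity",
    "info": {
        "directors": ["Alfonso Cuaron"],
        "actors": ["Sandra Bullock", "George Clooney", "Ed Harris"]
    }
}
""")

# The proposed payload: "info" is one flat string.
flattened = json.loads("""
{
    "title": "Gravity",
    "info": "{directors: [Alfonso Cuaron], actors:[Sandra Bullock, George Clooney, Ed Harris]}"
}
""")

# Nested data can be read directly after a single parse.
first_actor = nested["info"]["actors"][0]

# The flat version is only text as far as the parser is concerned.
info_is_string = isinstance(flattened["info"], str)

# The string is not valid JSON itself (nothing inside it is quoted),
# so parsing it a second time does not recover the structure either.
try:
    json.loads(flattened["info"])
    reparse_ok = True
except json.JSONDecodeError:
    reparse_ok = False

print(first_actor, info_is_string, reparse_ok)
```

With the second format, every consumer would have to write its own string-splitting code to get at the directors and actors.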

Ignoring that though, after 3 months, they finally email to say it is ready. So I look at their Swagger documentation, which shows all the API calls on a web page and even has a button that can trigger an API call so you can see an example response. “Brilliant”, I thought, I can understand what they have done without writing any code or using a tool like Postman to send calls. It is as simple as clicking a button. I click the button… but the message fails to send.

I then send an email, informing them that this doesn’t work. The developer says something along the lines of “we are in the process of changing the API”. What do you mean? I waited 3 months and you are saying it’s not ready – an hour after telling me it was ready!? He did get the Swagger documentation updated before the day was over, but I just don’t understand how it wasn’t updated as they were actually making the changes.

So after 3 months, we get an API that isn’t even suited to our needs. This is just madness because we will be the only consumer of at least one of the calls they implemented. Therefore, the API should be driven by our requirements. We shouldn’t be making two different calls, then doing some jiggery-pokery with the data, when it is easy for them just to send it back in the format we want.

Duplicate Builds

Recently, there have been quite a few odd things our team has done and I haven’t put that much thought into them. So I have either:

  1. approved their Pull Request, putting my trust in the developer that it was the correct thing to do
  2. casually skimmed over something someone else has approved.

Then, when I spend more time in that area, I think, “what the hell are we doing?”

I was making some tweaks to our build process and then thought “why do we have two builds for every check-in we do?”

So I look at the definitions and it’s basically this:

| Build 1 | Build 2 |
| --- | --- |
| Download and install | Download and install |
| Run linter | Run linter |
| Run tests | Run tests |
| | Produce report |

Build steps

So the second build is just to produce a report, which it needs a valid build to produce. But it could just use Build 1. The sensible thing to do is just to delete Build 1, since Build 2 has the full steps we actually want.

So I asked the developer who set up these builds and he said “it is quicker”.

So I’m like “what!? how?”

“it runs in parallel”

So? Both builds initially do the same thing, but one has an extra step, so Build 2 takes longer. We have to wait until both builds complete before the code can be checked in. If Build 1 fails for some reason (e.g. a unit test failure), then you are 99% guaranteed that Build 2 is gonna fail too, so it is a waste of time and money. The report can’t be generated unless there’s a valid build, so the report may as well just use the existing build. Since they run in parallel, the total time is just the time of the longest-running build, i.e. Build 2.
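The argument can be sketched with some arithmetic. The step durations below are made up for illustration (I don’t have the real figures to hand); the point is only that the wait is the maximum of the two builds while the cost is their sum.

```python
# Hypothetical step durations in minutes; the real figures are not known.
STEP_MINUTES = {
    "download_and_install": 3.0,
    "lint": 1.0,
    "tests": 5.0,
    "report": 2.0,
}

build_1 = ["download_and_install", "lint", "tests"]
build_2 = build_1 + ["report"]


def duration(steps):
    """Total minutes for a build that runs its steps one after another."""
    return sum(STEP_MINUTES[step] for step in steps)


# Current setup: both builds run in parallel.
# We wait for the slower build, but we pay for the minutes of both.
parallel_wait = max(duration(build_1), duration(build_2))
parallel_cost = duration(build_1) + duration(build_2)

# Proposed setup: Build 2 on its own.
single_wait = duration(build_2)
single_cost = duration(build_2)

print(f"parallel: wait {parallel_wait} min, pay for {parallel_cost} min")
print(f"single:   wait {single_wait} min, pay for {single_cost} min")
```

Whatever numbers you plug in, the wait is identical in both setups, and the parallel setup always pays for Build 1 on top.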

He didn’t even seem convinced.

GitHub Actions

Probably a common thing I’ll discuss on this blog is people using technology just because it’s apparently cool. In a way, it is similar to when people write programs using languages they know, rather than using an appropriate language for the task. It’s like the mentality is “I’ve learned this thing, so I’m gonna use it!”

Teams have been using AWS CodeBuild to build their applications. Then one team decides to be different and try some new technology: GitHub Actions (https://github.com/features/actions). When their source code is on GitHub, it does make sense to use the build tools there, but at the same time, every other team is using AWS, so why not be consistent? To be fair, I guess a good reason to use it is that you get 2,000 free minutes a month, so I guess it is cheaper for one team to use it.

Now, there are a couple of things I want to mention here.

A) This team that uses GitHub Actions has many people that I mentioned in the Anti-Microsoft blog (https://strangercodingtings.home.blog/2019/10/29/nerd-elitists-1-anti-microsoft/). Now, the funny thing is that Microsoft acquired GitHub back in June 2018 (https://techcrunch.com/2018/06/04/microsoft-has-acquired-github-for-7-5b-in-microsoft-stock/), and therefore, their servers are running on Microsoft’s Azure platform. I do wonder if those guys are aware of this.

“GitHub hosts Linux and Windows runners on Standard_DS2_v2 virtual machines in Microsoft Azure with the GitHub Actions runner application installed.”

https://help.github.com/en/actions/automating-your-workflow-with-github-actions/virtual-environments-for-github-hosted-runners#about-github-hosted-runners

B) In my team, we have 4 repositories, and 3 were set up to use AWS CodeBuild. I submitted a Pull Request to configure the fourth. I got a comment on my Pull Request saying “changes are OK, but I recommend using GitHub Actions instead”. No justification for it, he just thinks it’s cool, so wants me to undo a few hours’ work. I could understand if there was a strong advantage to using GitHub Actions over CodeBuild, and if there is, I would want all 4 repositories converting, not just 1. Otherwise it just adds to the confusion, and adds to the things you need to learn and maintain.

William

I’m gonna create a new character called William because I have a good feeling he will be a good source of stories, so don’t want to group his stories with Colin.

There was a time I was looking through the recent changes to our software and I saw someone had added some additional error logging. When people have done this in the past, it’s because there is a bug they cannot recreate, so they need to add additional logging code to gather extra information in order to diagnose the issue.

Anyway, I look at the code and realise that what they were logging wouldn’t be helpful at all. I asked the developer what he hoped to achieve and he said “William told me to do it.” So I was like “If William asked you to jump off a cliff would you do it?“. He replied “But William is a Senior“. I was like “so what? you need to consider things for yourself“.

Recently, William has got involved in our project and has been trying to help decide how to approach our architecture.

We were discussing how users could be updated with real-time changes to data. William stated we couldn’t use SignalR. I asked why and he gave me a shocked look as if it was obvious. “It doesn’t work with web browsers” he said. Weird, I thought, because a partner product uses SignalR for their website.

SignalR supports “server push” functionality, in which server code can call out to client code in the browser using Remote Procedure Calls (RPC), rather than the request-response model common on the web today.

https://docs.microsoft.com/en-us/aspnet/signalr/overview/getting-started/introduction-to-signalr

I was looking into a solution involving WebSockets and he came over and told me it had been ruled out because it is limited to 1 million connections. I was like “1 million connections? How many do we need?“. He was like “we need to think bigger than that, WebSockets isn’t scalable“.

I asked around to get an idea of our current userbase. Turns out we have ~180,000 users, but only around half will be active at any one time. So even if we did scale up, 1 million is more than enough. I was speaking to one of the Architects and he reckoned that the 1 million limit can be surpassed with certain implementations.

Anyway, it got me thinking that maybe you shouldn’t always assume people know what they are talking about. William states things with such confidence that it sounds like he knows what he is doing, but really, it’s just the first idea that popped into his head.
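For what it’s worth, here is the back-of-the-envelope check as a small Python sketch. It assumes one connection per active user and takes William’s 1 million figure at face value; both are assumptions rather than measured facts.

```python
total_users = 180_000         # approximate userbase, from asking around
active_fraction = 0.5         # "around half" active at any one time
connection_limit = 1_000_000  # the limit William quoted, taken at face value

# Assumes one connection per active user.
peak_connections = int(total_users * active_fraction)

# How much of the claimed limit we would use, and how far we could grow.
utilisation = peak_connections / connection_limit
growth_headroom = connection_limit / peak_connections

print(f"peak connections: {peak_connections}")
print(f"share of limit used: {utilisation:.0%}")
print(f"userbase could grow about {growth_headroom:.1f}x before hitting the limit")
```

So on these numbers we would use under a tenth of the claimed limit.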

“As a junior developer, hearing others say “I don’t know” is extremely reassuring and erases the common misconception that experienced developers know the answer to every question, and have a solution in the back of their mind for every type of problem. Admitting you don’t know something is nothing to be embarrassed of — instead, it’s a new learning opportunity for everyone involved! It may even lead to another occasion to pair programming or debug a problem together”

https://dev.to/httpspauline/5-ways-to-create-a-junior-developer-friendly-culture-3n4

So if William had admitted that he needed to do some research, we could have investigated various messaging technologies.

If I hadn’t questioned his statements, we could have rejected two perfectly viable options.

Website Bundle Size

I was looking through our application and removing redundant code. I was checking the size of our application and wondered if I could reduce the size. A smaller download size will give the user a better user experience.

How long will you wait for a website to load before you give up and go somewhere else? Ten seconds? Twenty seconds? Apparently, nearly half of us won’t wait even three seconds.

https://www.bbc.co.uk/news/business-37100091

In my research, I found this blog https://www.freecodecamp.org/news/3-easy-ways-to-boost-your-web-application-performance/ which mentions a few areas to look at: reducing image sizes, code-splitting/lazy loading, and reducing the bundle size.

For the third point, it gives an example of a JavaScript date library called “moment”, which is something my team was using. When I looked at our bundle stats, Moment was accounting for 30% of the size of our app! We only used it as part of a date picker control which we use once. The blog recommended a website called Bundlephobia, e.g. https://bundlephobia.com/result?p=moment@2.24.0, which shows you the size and recommends some alternative libraries. Moment is 231.7kB (65.9kB compressed), whereas “dayjs” is only 6.5kB (2.76kB compressed). If you look at their time stats, the site reckons it will take around 1 second longer to download Moment than dayjs.
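To put those figures side by side, here is a quick Python sketch. It uses only the sizes quoted above; the dictionary layout and function name are just for illustration.

```python
# Sizes in kB, as quoted above from Bundlephobia.
SIZES_KB = {
    "moment": {"minified": 231.7, "compressed": 65.9},
    "dayjs": {"minified": 6.5, "compressed": 2.76},
}


def saving(kind):
    """Return (kB saved, fraction saved) when swapping moment for dayjs."""
    before = SIZES_KB["moment"][kind]
    after = SIZES_KB["dayjs"][kind]
    return before - after, 1 - after / before


for kind in ("minified", "compressed"):
    saved_kb, saved_fraction = saving(kind)
    print(f"{kind}: save {saved_kb:.2f} kB ({saved_fraction:.1%})")
```

Either way you measure it, the swap removes well over 90% of the date library’s weight.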

According to research from digital performance measurement firm Dynatrace, just a half second difference in page load times can make a 10% difference in sales for an online retailer.

https://www.bbc.co.uk/news/business-37100091

GitHub user martinheidegger logged this very issue back in Aug 2016 https://github.com/moment/moment/issues/3376

It looks like development/maintenance has slowed down this year, but I’m surprised that the Moment library hasn’t been abandoned completely. Loads of people have highlighted the size is a problem, yet no one has actually sorted the issue out after all these years.

When the research highlights how important fast speeds are to users, you would think developers would put more importance on performance.

Full Regression

Several months ago, a project was completed (let’s call it Project X) that must have had the biggest impact on our software. What I mean is that its scope affected a lot of features, and therefore, to sign it off, the testers ran an insane amount of regression tests. The Project X team ended up running these tests with the help of other teams, and many hours of overtime were needed.

A release was being planned that involved more bug fixes than usual. I think one of the Test Managers had suggested that we run a full regression test to ensure the changes made by Project X hadn’t been affected. Marcus, a tester involved in Project X, stated that this was an “unreasonable request, and we need to do a more focussed, targeted test”.

The next day, a Test Manager came up to Marcus and said she had asked someone to give estimates of a full regression and she had been quoted 6 weeks. She said it was absurd and wanted Marcus to confirm it. Marcus stated once more that he had already highlighted it was an unreasonable ask. He explained that she is requesting that 4 people run the tests over a two week period, when the original team had three weeks and around 20 people were involved. This is why Marcus rejected the proposal in the first place.
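Marcus’s point is easy to check with rough person-week arithmetic. This little sketch assumes the effort scales linearly with headcount and ignores the overtime, so treat it as a ballpark only.

```python
# Original full regression: around 20 people for three weeks.
original_people = 20
original_weeks = 3
effort_person_weeks = original_people * original_weeks

# The request: 4 people over a two week period.
requested_people = 4
requested_weeks = 2
available_person_weeks = requested_people * requested_weeks

# How long 4 people would need for the same effort (assuming linear scaling).
weeks_needed = effort_person_weeks / requested_people

print(f"effort needed: {effort_person_weeks} person-weeks")
print(f"effort available: {available_person_weeks} person-weeks")
print(f"weeks needed with {requested_people} people: {weeks_needed:.0f}")
```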

Another day passes, and another Test Manager announces that, after much debate – running a full regression isn’t feasible, so they will do a targeted regression.

No doubt there were one or more meetings to discuss this, when Marcus had already told them how unfeasible it was a couple of days earlier. It’s just that people with the authority to make the decisions, and the people that actually have the knowledge of how things work – are completely different people.

Gun ’em down

A tester had created loads of alerts and hadn’t disabled them. Jack sent a mass mail to rant about it, stating that any tests that can annoy others should always be cleaned up. He was doing some important testing and having alerts spam his screen was slowing him down and stressing him out.

Later, I sat next to Jack in a meeting and decided to wind him up about it.

timeinints: “Hey Jack, what are you gonna do about this testing situation? This happens a lot and we need to stop it.”

Jack: “What I’m gonna do, is march down to their office and gun them all down.”

The meeting host then points out that members of the other office were on the conference call.

Well, that’s awkward. 😀

Jack

Some teams make a presentation to show what they have done over the previous Sprint (usually a 2-week period). One team hadn’t achieved what they had planned, so included a table to illustrate that they were low on staff, mainly due to annual leave. So it looked something like this:

| Date | Absence |
| --- | --- |
| 25th November | Jack off |
| 26th November | Ben and Jack off |
| 27th November | Sally off |
| 28th November | Ben at a conference |

Jack off!

The team was jacking off for a few days so they weren’t very productive.