More Colin

In the early days, I wrote many blogs about Colin, a Senior Developer who was constantly writing bad code, cutting corners, and saying dumb stuff. After going through some old chat logs, I found a few stories that I don’t think I have covered on the blog before:

Two Days For Nothing

Me 13:06:
Colin asked me for help. He says he has spent 2 days on his Work Item
No changes were needed; next!

Dean 13:06:
haha what's the Work Item about?

Me 13:07:
he was baffled by everything. He was creating a template with a “Last Entry” field and he was like "why isn't it showing anything"
I said "your selected record is empty"

Colin just accidentally checked in a changeset – not just a file, but everything

Me, in dismay

Misleading Braces

Usually, you use braces to group blocks of code, such as the body of an if statement or a method. Colin put his logic on the same line as the “if”, but then opened a brace underneath it. So it looked like the brace belonged to the if statement, but it actually did not. We weren’t even aware you could just randomly put braces around code like that.

if ((!criterianValueSets.Any() || hasConcepts)) return hasConcepts;
{
    // other code here
}
Me 10:12:
what a sneaky return. I'm surprised you can have a { after a return like that

Dean 10:12:
so what does the { even mean here?
like why is that code block in braces if it's not related to the if statement?

Me 10:12:
just groups together a random bit of code
I guess technically it is the else

Dean 10:12:
so can you do that wherever you want?

Me 10:12:
I was wondering that
you would have thought code analysis would moan even if it is valid

Dean 10:14:
you can, weird, and it affects the scope too...
is that legacy code where you found it?

Me 10:15:
nope. Colin. always Colin
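
For anyone curious, a bare block really is legal C#. Here’s a minimal sketch (hypothetical names, nothing from our codebase) showing that a standalone block compiles and only introduces a new scope:

using System;

class BareBlockDemo
{
    static void Main()
    {
        bool hasConcepts = true;
        if (hasConcepts) Console.WriteLine("the 'if' body is just this one statement");
        {
            // This brace pair is NOT the body of the "if" above - that body was
            // the single statement on the same line. It's a standalone statement
            // block, which is legal C# and simply introduces a new scope.
            var message = "only visible inside these braces";
            Console.WriteLine(message);
        }
        // Console.WriteLine(message); // would not compile: out of scope here
    }
}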

Node Resolvers

Colin didn’t know about the AcceptButton and CancelButton properties on a Windows Form. His mind was blown.
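
For reference, a minimal Windows Forms sketch (hypothetical dialog and buttons, layout omitted) of what those two properties do:

using System.Windows.Forms;

public class ExampleDialog : Form
{
    public ExampleDialog()
    {
        var ok = new Button { Text = "OK", DialogResult = DialogResult.OK };
        var cancel = new Button { Text = "Cancel", DialogResult = DialogResult.Cancel };
        Controls.Add(ok);
        Controls.Add(cancel);

        AcceptButton = ok;      // pressing Enter "clicks" OK
        CancelButton = cancel;  // pressing Escape "clicks" Cancel
    }
}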

Colin said there were multiple Node Resolvers and he was stripping out Nodes from one of them… then a minute later, he said there was only one Node Resolver and he wasn’t stripping out Nodes. Now Steve was confused, because he was calling Colin’s code and expecting it to strip them out.

Open closed principle

Me 15:12:
Colin is making a bit of a fool out of himself, showing a lack of knowledge of programming concepts
I said to him
“do we really need 3 separate methods for this? what about 1 method with a switch?”
So he simply replied “Open closed principle”
So I explained he was actually violating that principle by writing it the way he did.
“If there was a new Node system, you would have to change this dialog. It doesn't matter if you do my suggestion or leave it as it is. To conform to the open closed principle, surely you would need to pass in an object which can return the string. That way if a new Node system is added, you would add a new Node system object, and this class wouldn't need to be touched.
Anyway, merging the 3 methods would be neater”

Dean 15:13:
Urrrgh

Me 15:13:
I also slagged off that class he wrote before
“I reckon that the guy who came up with Polymorphism would be in tears if he saw this class.”
Colin had replied “Is that a complement ? I do not see any problem with it .”

I then emailed him with how to write it, and he now realises I am right

Dean 15:14:
That's good
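
For what it’s worth, the design I was describing would look something like this (a sketch, all names hypothetical): a new Node system becomes a new class, and the dialog never needs touching.

public interface INodeSystem
{
    string GetDisplayText();
}

public class LegacyNodeSystem : INodeSystem
{
    public string GetDisplayText() => "Legacy";
}

public class ModernNodeSystem : INodeSystem
{
    public string GetDisplayText() => "Modern";
}

public class NodeDialog
{
    private readonly INodeSystem _nodeSystem;

    public NodeDialog(INodeSystem nodeSystem) => _nodeSystem = nodeSystem;

    // No switch over node systems here: adding a new system never forces
    // a change to this class, which is the open-closed principle in action.
    public string Title => _nodeSystem.GetDisplayText();
}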

Bugged Mindset

Colin always gets annoyed when I find a bug in his code.

But when the testers miss something, he loves it, even though it’s out in live and therefore looks bad for all of us.

He’s not happy that I have logged two bugs for his provenance story.

NodeSystemCompatibility

Colin wrote a few similarly named methods that did slightly different things, but it wasn’t clear when to call each one, and some returned a different value than you would expect.

Me 12:45:
CheckNodeSystemCompatibility , IsNodeSystemCompatible
what is the difference?

Dean 12:45:
Lol

Me 12:45:
but don't call IsNodeSystemCompatible for documents, you need CheckDocumentForCompatibility
and CheckNodeSystemCompatibility has a switch that calls different methods and then negates them
case FolderItemTypes.DocumentTemplate:
    return !IsNodeSystemCompatibileForDocumentTemplate(selectedItem);
so if it is compatible, CheckNodeSystemCompatibility returns false
I think we should just delete the branch and pretend it didn't happen

Dean 12:50:
Hahahahaha
Why are they overcomplicating things??

Me 12:53:
they want bugs. it's not even unit tested

Dean 12:57:
What?!

Health and Safety

Me 14:08:
Colin has a wrist rest on the window ledge and it has melted

Dean 14:08:
ha

Me 14:08:
the gooey stuff has dripped down towards the plug sockets
bit of a health and safety hazard
meanwhile, he also had a hot chocolate hidden behind his monitors that stunk of sick

Dean 14:10:
nice

Me 14:13:
Colin claims he got that drink this morning
a bit worrying if the machine is dishing out sick

Me 15:22:
Mel reported Colin's wrist rest incident
the handyman dude is here to save the day

Dean 15:23:
thank god

Multiple Attempts

Me
is this correct? ...we can copy folders if there is one item we can copy, regardless of whether there are loads which it can't copy?

Colin
my mistake. should be the otherway round. Will change this to "!documentItems.Any(di => di.CanCopy)"

Colin
correction. documentItems.Any(di => di.CanCopy.Equals(false)); lol

Colin
documentItems.All(di => di.CanCopy). sorry my brain isn't working.
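
To spell out why it took three attempts, here’s a quick sketch (hypothetical data) of what each expression actually asks:

using System;
using System.Linq;

var items = new[]
{
    new { Name = "a.doc", CanCopy = true },
    new { Name = "b.doc", CanCopy = false },
};

bool attempt1 = !items.Any(di => di.CanCopy);              // "no item is copyable" -> false
bool attempt2 = items.Any(di => di.CanCopy.Equals(false)); // "some item is NOT copyable" -> true
bool attempt3 = items.All(di => di.CanCopy);               // "every item is copyable" -> false

Console.WriteLine($"{attempt1} {attempt2} {attempt3}");

Only the third one expresses “we can copy the folder only when every item can be copied”.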

Call It Twice

Me 16:15:
bool canCopy = NodeSystemHelper.GetActionsWithProvenance(selectedItem: SelectedItem) != null
    && NodeSystemHelper.GetActionsWithProvenance(selectedItem: SelectedItem).CanCopy;

if the returned object isn't null, create it again and check a property
classic Colin

Dean 16:18:
Wtf; that's melting my head
i don't see how you can take any enjoyment out of development writing code like that
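
The straightforward rewrite is to call the helper once and keep the result. A sketch, with hypothetical stand-in types since the real helper lives in our codebase:

class ProvenanceActions { public bool CanCopy { get; init; } }

static class NodeSystemHelper
{
    // Stand-in for the real helper, just so the sketch compiles.
    public static ProvenanceActions GetActionsWithProvenance(object selectedItem) =>
        new ProvenanceActions { CanCopy = true };
}

class Fixed
{
    public static bool CanCopy(object selectedItem)
    {
        // Call the helper once, keep the result, then check the property.
        var actions = NodeSystemHelper.GetActionsWithProvenance(selectedItem: selectedItem);
        return actions != null && actions.CanCopy;
        // or simply: return actions?.CanCopy == true;
    }
}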

Christmas has come early

Me 16:18:
seems Colin has left us with 36 failing unit tests
Christmas has come early

Manager 16:19:
want me to get him to sort them out?....

Me 16:19:
are you going down there and punch him in the stomach
Manager 16:20:
gonna stove his head in!!....

No Unit Tests

Me 13:22:
Colin fixed a bug
“Fix bug where protocols created pre-version 1.5 shows N/A rather than the correct value”
On the code review:
“Me
unit tests?
Colin
Not failing”


oh that's ok then. Just fix a bug and don't cover the missing scenario
WHY DO I WORK HERE?

Dean 13:22:
Why would you not write one??

Me 13:22:
it may fail if you write one 😀

No Root Cause

In one part of our application, we always have problems where you select some data in a grid, then the menu bar refreshes multiple times. 

This time, there was a bug involving a menu option not becoming enabled until you hover the mouse over it – which was very strange.

Colin then decides to fix this new issue by adding another call to refresh the menu. Brilliant. It already flickered many times – let’s add another flicker!

The lead developer questioned this change and asked him what the root cause was. “This code is complicated, so I didn’t investigate”. Brilliant, totally lazy.

Luckily another developer stepped in and provided the proper fix.

If you don’t understand how the problem came about, you could end up adding “hacky” code to make it work. But this just pollutes the codebase with more bad code, causes more confusion, and makes future issues harder to diagnose. Good developers don’t cut corners like that.

Project Aurora & The Strangler pattern

Recently, we have had another tech guy join the company, reporting to the CTO. I find that people in these kinds of roles want to put their stamp on things by coming up with a new idea.

He presented his idea in our monthly Tech Meeting. He wants to attempt to address our performance problems by taking traffic away from our main on-premise databases. There have been some similar ideas recently, and although I’m not great when it comes to hardware, networks, and general software/hardware architecture, I am sceptical that these ideas can work.

His idea is that we can replicate the database in the cloud (“the cloud” solves all problems, you see), and the cloud copy can then be used for read access, whereas writes would still go to the main on-premise databases (and then be synced up to the cloud).
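
As I understand it (this is purely my sketch of the idea, not their actual design), the routing would work something like this:

public interface IDatabase
{
    string Read(string query);
    void Write(string command);
}

public class ReadWriteSplitter
{
    private readonly IDatabase _onPremPrimary; // source of truth, takes all writes
    private readonly IDatabase _cloudReplica;  // eventually-consistent copy for reads

    public ReadWriteSplitter(IDatabase onPremPrimary, IDatabase cloudReplica)
    {
        _onPremPrimary = onPremPrimary;
        _cloudReplica = cloudReplica;
    }

    // Read-only workload is served by the replica, which may lag behind.
    public string Read(string query) => _cloudReplica.Read(query);

    // All writes still hit the primary; replication to the cloud happens
    // asynchronously, which is where the reported "5 minute delay" comes from.
    public void Write(string command) => _onPremPrimary.Write(command);
}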

The Announcement

This programme of work is to move workload away from our primary systems to enable these systems to withstand expected load factors from upcoming initiatives as well as expected growth in usage on our APIs during Winter 2023.

The intent is to run focused cross functional teams in work-streams across the group to deliver this initiative. The approach taken here is to place multiple bets, across multiple teams. The expectation is that not all teams will deliver by September, but enough to bring in the headroom we need.

The programme is intending to free up at least 20% load across our core databases.

Upcoming aims:
• Strategic, move read-only workloads to Aurora.
• Redeploy APIs to AWS, Move to cloud technology, Containerise and Optimise Service
• Enable use of replica data when ready.
• Move Appointment Workload
• Mitigate 8am peak load.
• Use caching engine on AWS (Elasticache/Redis), mitigate 8.2% of PC DB Load 
• Reduce load on the DB during day time.
• Reduce Datacentre and DB load and improve performance
• Mitigate 6.2% of DB load by optimising how we summarise task counts
• Proof of concept is Complete, expected to cost £2m a year.

My Conversation With Architect Mark

I think the reason for the replication (as opposed to just moving it all to the Cloud) is that you can’t fully commit to ideas like this. You have to have a rollback plan. So if we find it doesn’t work, is too expensive etc., we can just return to the old way without much inconvenience. I asked one of our Software Architects what he thought of the plan because it doesn’t sound right to me:

Me
doesn't sending data out to another database just increase traffic, and they wanted to reduce it?
Mark
Yes, it will also be delayed, and often broken
Me
no pain, no gain
Mark
they're replicating data, and it's unlikely it'll be used
Me
I don't see how you migrate things. You have to keep them both running until you are confident it works, then bin off the old database. But then in reality you just end up keeping them both for longer than expected
Mark
you then also need cross-database transactions or to be very careful with queries
yeah, that's basically it. Have the same API at both ends, some sort of replicate and transform on the data to ensure it's in both. Persist to both simultaneously, then when all works, turn off the old
Me
The CTO said that “some people say there is a delay, but it is only 5 minutes”. Does that address any of your concerns at all?
Mark
no, this is only the second time I've heard about this, and the first I laughed
I agree with the principle of strangler pattern for migrating, but this isn't migrating
it's keeping multiple DBs 'in-sync'
Me
does that mean you can view an appointment book which is 5 mins out of date, and you try book an appointment, then it checks the real database and is like "no mate you cannot do that"

The conversation between architects

Mark then sent me a conversation he had with two other architects, Andrew and Jon. Mark already had concerns with the “appointment book” example.

Mark
so when this replication system goes down for a few hours, what happens then? I guess the system tries to book appointments for slots already booked, put in requests for items already issued etc.?
seems our business layer needs to be aware of how outdated the original information was, so it can compare something like a changelog number. Sounds like a big challenge to implement correctly

Andrew 11:10
Yes, any write operations will need logic to ensure that cannot happen Mark.
Jon and I have already called out that Appointments and Orders will have significant challenges with this replication model and have suggested that the initial focus should be on User Profiles, and any historic data, etc.

Mark 11:13
User Profiles and historic data seem just as dangerous to be honest.

Jon 11:15
The idea I suggested there is that you would check the change log on the primary system before even considering going to the replica. If the User had had a recent change (what counts as "recent" is TBC; I suggested 30 minutes) you wouldn't even consider going to the replica.

Mark 11:15
can we implement the strangler pattern properly? set up proper Appointments APIs to use in our datacentre, and AWS.
duplicate the data.
then dual file everything against the APIs? if one fails to file, the other gets rolled back.
we ensure consistency, we can transform the data, and we're using the pattern as intended
Jon, I agree your idea is the right way to do this sort of thing, but it will be adding logic and latency in a lot of places (as well as augmenting every one of our products to be aware of this), and not bringing us forward, but continuing to keep us in the primary data-store model

Jon 11:18
Honestly, if the use case is customers looking at their own data, then having it a touch out of date isn't as critical as our actual users seeing an out-of-date view. As a hypothetical Customer who knows nothing about IT, if I viewed my record straight after a consultation and it wasn't there, I would just assume there was a delay and it would appear later.
When it comes to actual Users viewing the record, it's absolutely critical that they see the up-to-date view. And when it comes to appointments, that's also critical, because appointment booking is fast moving; it'd be an awful experience for a User if every "free" slot they booked turned out to have been booked minutes earlier.

Mark 11:19
depends, if you've just requested a particular item and the page doesn't update to indicate that, can you continue requesting it?

Jon 11:20
Many of our users (mine included) turned off online appointment booking entirely at the beginning of the pandemic and use a triage system now.
You wouldn’t be able to successfully request duplicate items, because the write would take place conditionally, so if it had been requested already then it'd say no (if designed even vaguely competently).

Mark 11:22
the write wouldn't come through, but it'd be confusing for the User seeing the prescription still requestable, unless the application has its own datastore of state

Jon 11:22
Yes it would be far from ideal. But the CTO has some ideas about that (having a "recent changes" dataset in a cache that is updated live, and merged with the replica's data).

Mark
feels like there's loads of little bits of logic that need 'tacking on' to resolve potentially quite serious incidents. When the correct use of the strangler pattern gets us away from on-premise as primary DB, and moving in the direction we want to go

Jon
Yeah, this isn't easy and requires careful consideration.

Andrew 11:30
You are absolutely right Mark - there are a heck of a lot of potential gotchas and ultimately the plan has to be to use the strangler pattern, but at the moment we are looking at a rescue plan to put out some existing fires in the data centre and to handle predicted significant increase in load that will hit us in the Autumn. Everything that you have flagged is being considered.
The only fall-back plan that we currently have is to spend nearly £4m / year on additional SQL Server readable secondaries (on top of having to pay an additional 12% on our existing SQL Server licences thanks to MS hiking their prices) and nobody has the appetite for that.
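
If I’ve understood Jon’s suggestion correctly, the read path would look roughly like this (my sketch only, all names hypothetical):

using System;

public interface IChangeLog { DateTime GetLastChangeTimeUtc(int userId); }
public interface IRecordStore { string Read(int userId); }

public class RecencyAwareReader
{
    // Jon suggested 30 minutes; what counts as "recent" is TBC.
    private static readonly TimeSpan RecentThreshold = TimeSpan.FromMinutes(30);

    private readonly IChangeLog _primaryChangeLog;
    private readonly IRecordStore _primary;
    private readonly IRecordStore _replica;

    public RecencyAwareReader(IChangeLog primaryChangeLog, IRecordStore primary, IRecordStore replica)
    {
        _primaryChangeLog = primaryChangeLog;
        _primary = primary;
        _replica = replica;
    }

    public string ReadUserRecord(int userId)
    {
        // Check the primary's change log first. If the user changed recently,
        // the replica may not have caught up, so don't even consider it.
        DateTime lastChange = _primaryChangeLog.GetLastChangeTimeUtc(userId);
        bool changedRecently = DateTime.UtcNow - lastChange < RecentThreshold;

        return changedRecently ? _primary.Read(userId) : _replica.Read(userId);
    }
}

As Mark says, though, that is extra logic and latency on every read path, in every product.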

Closing Thoughts

I don’t know what the Strangler Pattern is, so I’ll add that to my reading list. However, it seems that even with my limited knowledge of architecture, our Software Architects have the same concerns as I do. There have been plenty of ideas that the CTO (or managers at a similar level) quickly backtracked on, because they hadn’t consulted people with the knowledge to judge whether the idea was actually logically sound. I’ll keep my eye on this idea to see how it develops.

Development Environments

This blog is basically stealing the “Healthy Software Developers” explanation of “Development Environments – Isolating Customers From Your Changes”.

Introduction

“Development environments” allow software engineers to view a particular version of the software and control who has access to it. 

If you go back about 20 years, there was much less maturity around how changes were rolled out to customers, and every company had a different process. In some companies, developers would just roll changes out immediately, straight into production. This might work for the most part if you have a very small, trusted team and no process or contractual requirements. These days, the standard is that most companies need a minimum of three separate environments when developing software.

Even where I work, people tell me it used to be much less professional, without as much legal scrutiny – developers could drop DLLs onto a server to quickly fix a bug for a single customer. Now there are all kinds of people who need to sign a change off, and 99% of changes are released as an official version to multiple customers.

Development Environment

The main development environment is a place where an individual engineer can actually work on the software in isolation from any other stakeholder such as a Tester or Customer. Instead of making changes directly to a live website/application, the changes can be run on a local copy on the developer’s machine. The developer can begin looking at the impact of the changes he’s making, and will use test data. Any problem introduced only affects the single developer.

It could be as simple as a copy of a database and a version of the website. A more sophisticated program may have more dependencies, needing multiple websites or services configured, so you could leverage modern technology like “cloud services” or “containers” to deploy this setup easily. How this looks depends on the complexity of the software architecture, the team size, and the process.
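
For example (just a sketch with hypothetical keys, not our actual setup), even something as simple as selecting a connection string per environment keeps a developer’s machine away from production data:

using System;

class EnvironmentConfig
{
    public static string GetConnectionString()
    {
        // Which environment we're in, defaulting to the developer's own machine.
        string env = Environment.GetEnvironmentVariable("APP_ENVIRONMENT") ?? "Development";

        return env switch
        {
            "Development" => "Server=localhost;Database=MyApp_Dev;Trusted_Connection=True;",
            "Staging"     => "Server=staging-db;Database=MyApp_Staging;Trusted_Connection=True;",
            "Production"  => "Server=prod-db;Database=MyApp;Trusted_Connection=True;",
            _             => throw new InvalidOperationException($"Unknown environment: {env}")
        };
    }
}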

Staging Environment

The “User acceptance” test environment, sometimes known as a “staging environment”, is another copy of the application, but developers aren’t making any changes to it, and it is also using test data. The purpose of this environment is typically to run exhaustive testing to find any issues before deploying the changes to the end users. 

When companies rely on manual testing and don’t have modern “continuous delivery” processes, an unreleased version may sit in the staging environment for an extended period of time. In a more automated environment, automated tests can be run and the version easily deployed to customers once they pass. This could take anywhere from fifteen minutes to an hour, depending on the tests, software complexity, and process involved.

Production environment 

The Production environment, or Live environment, is where your customers use your product; so the actual website or application. 

Demo environment

You can have other environments like a Demo environment. This is another copy, but not used by Developers or Testers. The version can be “work in progress”, or a version that is deemed to be ready; but the purpose is to show customers upcoming features and to gather feedback.

Capacity testing

You could create a special kind of test environment for capacity/load testing. This is meant to stress test the software and see how it performs under heavy load. Systems can have increased traffic at specific times of day, e.g. users all logging in at 9am to start the day, increases in website hits during lunch breaks, more shopping traffic during the Christmas period, etc. If users are paying to use your service and it goes down, it’s a huge problem. If you sell a product/service on your website and it goes down, it can cause lost sales. So this kind of testing can be essential.

Security audit/Penetration Test

Last year, my employer paid for a “security audit” for our software, and any “issue” on their report was then a high priority to fix.

I think the majority of the issues they raised were incredibly unlikely to happen, or there would be an easier means of acquiring the same information. One of them was that UDP messages – which are only sent on the user’s local network – had an unencrypted username. But if the attacker was already inside the user’s building, it would probably be much easier just to look at the user’s screen than to plug into their network and run a packet-sniffer.

Problem 1:

Shortly after we released some “security fixes”, we had a few bugs reported, one of which was:

“Users unable to reset their own passwords using the self-service password reset process”

So the security improvement created a major security incident. Brilliant.

Problem 2:

A colleague was explaining another problem, which I only had a vague understanding of. It was to do with the version of a protocol being sent in the header of a message. This is classed as a security flaw because an attacker can look up known security flaws for that version and try to exploit the system that way. They can still guess what the version is and try different “attack vectors”, but the less information you give them, the more hassle it is. As my colleague explained it, a change had been made to remove the version, and it was tested on a specific version of SQL Server, but we have multiple versions on our live infrastructure. When it came to demoing it with a client, they discovered a particular feature was broken on a different SQL Server version. A little awkward, but at least it didn’t go live.

Potential Problem 3:

When it comes to authentication failures, if you tell the user whether the login attempt failed because the username was wrong or because the password was wrong, you are making it easier for malicious users to gain access to someone’s account. If an attacker guessing usernames normally sees “Invalid username”, and eventually gets an “Invalid password” message, then the attacker knows the account exists and now just needs the password.

There was one API call that returned an error code along with the message “Invalid username”. So as advised by the security audit, the developer changed the message to “Invalid username or password”.

On the code review, I pointed out that the developer also needed to change the Error Code, and they replied saying the security audit stated that the message was the problem. They definitely hadn’t thought about the actual problem. If Error Code 40 is “Invalid username” and Error Code 41 is “Invalid password”, and you change both texts to say “Invalid username or password”, it’s not really any more secure, is it? We are still returning 40 when the username is wrong and 41 when the password is wrong. What we need to do is make 40 and 41 obsolete and add a new Error Code for “Invalid username or password”. However, can we actually do that? When third parties use your API and have written code which relies on certain return values, you break their software when you change yours. So we would need to inform everyone that the change is coming, then release it.
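
A sketch of the point I was making (using the hypothetical error codes from above):

using System;

public enum AuthError
{
    [Obsolete("Leaks whether the account exists; use InvalidCredentials.")]
    InvalidUsername = 40,

    [Obsolete("Leaks whether the account exists; use InvalidCredentials.")]
    InvalidPassword = 41,

    // One combined code for both failure modes, so the response no longer
    // tells an attacker which half of the credentials was wrong.
    InvalidCredentials = 42,
}

Both the message and the code have to stop distinguishing the two cases, otherwise the “fix” is purely cosmetic.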

UDP and ChatGPT

A senior developer questioned why we were encrypting elements of the UDP message. The developer struggled to explain, then resorted to ChatGPT – the sudden increase in technicality and vocabulary in his response made it obvious.

George
Out of interest, why are you encrypting this?

Karthick
According to the system audit query, MAC address and Sending user display name and the location name should be encrypted.

George
But Mac Address is sent in plaintext as part of UDP - I can see it in your screenshot above

Karthick
It is encrypted at Controller before sending.

George
Yeah I know you encrypt it in the payload, that was my question, why are you encrypting something that is in plaintext elsewhere on the network traffic?

Karthick
In most cases, the visibility of the sender's IP address is not a concern. IP addresses are public and essential for routing data across networks. While it's possible to use techniques like VPNs or proxies to obscure your real IP address from the recipient of the data, these techniques are not directly related to the encryption of the UDP packet payload.

George
Did you ask ChatGPT for that?
Also, I'm talking about MAC Addresses, not IP addresses.
I don't really care if you encrypt it or not, I was curious what decision led to you encrypting a MAC address.

Karthick
ChatGPT gave a good answer 😃
I cannot see MAC address of Wireshark UDP packet trace done in my laptop.

Govind
From the Pen test, it is reported that the
"Disclosing MAC Addresses and display names of users provides a wealth of information to a potential attacker"
via panic alerts.
That's the reason we are encrypting the Mac address.
And I believe the Mac address showing on the Wireshark, is the system Mac address(the interceptor machine) on which the Wireshark tool is running.

George
All you had to say was "Pen testers" and I would have been happy 😄

How To Confuse Women

How to confuse women

  • Step 1 centre the text in a cell
  • Step 2 watch the confusion

Alright, it’s a somewhat clickbaity title, but I did cause a lot of confusion for one person with one simple feature, and she still didn’t understand even after what I felt was a perfect explanation.

I had made a grid which shows a person’s name alongside a list of items they have ordered, in alphabetical order, as per the requirements.

Later on, we would add alternating row colours to make different people’s orders more distinctive. In the example I came up with, there were only 2 orders on a page, and I had left the row selection highlight on, so it actually looked like we did have alternating colours.

The examples from the UX team only had a few items per person, so it wasn’t clear how they had aligned the text. I left it centred, which looked a bit weird for large orders – which is exactly the example I came up with.

Person | Items
-------+---------
Lisa   | phone
-------+---------
       | CD
       | hat
       | lamp
James  | phone
       | shorts
       | t-shirt
       | towel

I don’t see how I can accurately illustrate this with WordPress. Imagine Lisa’s row is fully highlighted, and James’s name sitting halfway down his row because the Person name is vertically centred.

Just to show my progress on this new project, I posted a screenshot to generate a bit of hype 😉

Olivia was confused about why it wasn’t in alphabetical order – but it is in alphabetical order!

Olivia: why is “phone” at the top?
 
Me: Ulrika's designs or my screenshot? Mine is alphabetical. Phone is for a different Person. We still need to add the alternating colours, and maybe don't centre the name to make it clearer
 
Olivia: this one <sends screenshot of top row>
 
Me: Different Person
  
Olivia: I'm confused. Lisa's name is the same row as Phone
 
Me: There's 2 People there. Just that the second person has loads of items, and because we have centred the Person name, it's halfway down the row. We can use alternate row colours as per Ulrika’s design, and we can stop centring the name to make it clear.

I message Tim:

Me: I can't believe the confusion that has arisen even though I have explained what it is.
 
Tim: I got your back

<Back in the main chat>

Tim: Phone is for Lisa there
 
Tim: Everything else is for James
 
Tim: Which is when the alphabetical order kicks in.
 
Olivia: Thank you Tim, That was seriously messing with my mind. The name needs to be at the start of the list of items.

<Back to private message with Tim>

Me: I cannot understand why my explanation wasn't good enough. Is it because I am a developer nerd and cannot communicate with people?
 
Tim <jestingly>: the trick is to keep your answers on one line. Colours and shapes help too.
 

I’m so baffled. Why couldn’t they understand what I wrote? I wrote a perfect explanation, then Tim just gave the same explanation in short messages, and he was the one who got thanked for it. It’s like I don’t exist!

16 weeks work

Some teams post fortnightly updates on what they have done to our internal social media platform, Microsoft’s Yammer (now named Viva Engage). No matter what they post, managers seem to respond with positive comments.

So here is an example. One team stated they had completed:

3 user stories. 1 dev investigation. Improved code coverage.

Then attached was their code analysis report, which stated:

  • 0 Bugs
  • 0 Code smells
  • 100% coverage on 1 new lines to cover
  • 0% duplications on 8 new lines

I never know how to make sense of these stats. The duplications mention 8 new lines, yet the code coverage mentions 1 new line. And it can’t mean there is still a line to cover when it states they have 100% coverage.

Their video demo stated it was “Sprint 8”, which means 16 weeks of work. They logged into their application, and there were 2 buttons (not functional).

My Work

I’ve been working on a game in Unity and have also reached the 16-week mark, though I have only been working on it sporadically. Early on, I might have worked a few hours on most days of the week, sometimes several hours at a weekend. Other weeks, I have only worked on it for 3 hours or so in a single afternoon.

I have written loads of code for the AI state machine. Your units can automatically move in a formation and break away in certain conditions. The defending group of units has its own behaviours and works together in its own way. Some behaviour is configurable, so I have the logic and UI for the player to change it. There are controls for the cameras, game speed, pausing, saving and reloading; a few options for volume and graphical effects; and some miscellaneous aesthetics such as animations and particle effects.

I am a single developer working fewer hours, and I have to play-test it myself. Compare that to a team with a few developers, a few testers, and different managers assigned – all working a 9-5 day, 5 days a week – and all they have is a menu and a title banner.

Hackathon Progress

Another comparison is the progress developers make during a “hackathon” or “game jam”. Developers can often put together a prototype, or a fairly fun game, within a few days…

Developers during hackathon: We built an entire application in just 3 days. Developers after hackathon: Adding that icon is going to take 3 weeks.

Mark Dalgleish

Conclusion

Recently, I’ve been thinking about this a lot. If you start with no code, then any code you write has a much bigger impact. You also don’t have to work out how it fits alongside existing code.

So the more code there already is, the longer new code takes to write. That’s why hackathons are so productive – you can build something fast. It’s the same situation with my Unity game, and since I am working on it alone, I have no one slowing me down.

The other reason working on my own is great is that I can set the process. Due to the 90/10 rule, I think it is best to just get something 90% done, then move on. If you change your mind about how something works, it isn’t as big of a time waste. If it turns out to be good, you improve it later, alongside other features, when you are changing code in that area.

So going back to the original team who did 16 weeks’ work and got nowhere – even though it’s a new software product, where they should be able to code fast – I think they are slowed down by their insistence that everything has to be perfect and have all the extras like Unit Tests. They could get a feature 90% complete quickly, but then they are told to spend several days perfecting it for that last 10% of quality. Then they are messing around with non-functional stuff like Deployment Pipelines and miscellaneous processes. When you are months away from releasing, I don’t see the point in dedicating so much time to how you are going to deploy, so I have been 100% focussed on features and gameplay.

I think this illustrates how process can severely slow you down, and how striving for perfection can lead to inefficiencies. My idea of “90% quality is good enough” could be a winning approach – you can always perfect the features if you are ahead of schedule. If I keep working on my game, I will worry about how to release it when it is close to a releasable state (i.e. how to get the game listed on Steam).

Mini Musing #9: Char

I was watching a programming tutorial recently and I heard the teacher pronounce the datatype “char” like the word “car” and it made me think.

( ͡ಠ ʖ̯ ͡ಠ) 🤔

It is a shortened form of the word “character” – as in “a single letter” – which you do pronounce “car-acter” – so it probably should be pronounced “car”.

Yet I have been programming for years, and I have never heard anyone pronounce it like that.

Angry Unicorn

I was getting a 503 error when trying to use Git. After searching the internet for solutions, I came across this: “unicorns have taken over Error – 503“. I wondered what that was all about. Was it one of those stupid error pages some companies do as a joke?

After a bit more searching, I came across another one, titled “Intermittent angry unicorn”, and it became clearer what they were on about:

“I’ve been subject to the angry unicorn for over a week now.”

That angry unicorn sure does look fierce.

Commit Messages

As a developer, you need to “commit” your work into source control. When you do this, you provide a message that describes the change. A bug fix or enhancement can consist of a few commits. When you use a source control system like Git, you can even “rewrite history” using squash/rebase commands.

When I am writing code for my job, I often put a bit of thought into the wording to make it clear and professional. If it is for a personal side-project, I sometimes write meaningful messages, but sometimes I may group together multiple fixes, then come up with something generic like “Bug fixes”, or might commit more experimental work-in-progress under a name like “attempt” or “x feature part 1”.

It’s quite frustrating, though, to see colleagues write generic messages like “bug fix”, which don’t describe what is being fixed or how. Seeing messages littered with spelling mistakes is also annoying and unprofessional.

Examples include:

  • “EventNotificationBugFix After Resolving James’ Comment”
  • “bug resolved”
  • “Dev test changes from tester” (literally, what does that mean?)
  • “Updated the findCorrectCodeSp to findCurrectCode”
  • “Taken off completly the fix Amanda done for the fix for 25477 and fixed. Included the fix for 8017 as well”
  • “Fix for SQLCrop issues” (should be “SQL Cop”, our Code Analysis)
  • “Fioxed further typos” (ironically creating more typos when fixing typos)
  • “fixed the failed unit testes” (“testes” rather than “tests”. Brilliant)
  • “Post Christ’ comments re coding standards” (it’s the second coming of Christ to teach us coding standards! They meant to write Chris.)

There was a guy who worked in our short-lived Scotland office who sounded like an absolute nutcase; I have never seen someone take their job less seriously:

  • instructions unclear, got dick stuck in tfs 
  • what a waste of eyes
  • but fuck wasps
  • be nice to bees
  • what if there were landmines in dinosaur times

A colleague recently showed me the website https://whatthecommit.com/. I don’t know if the messages are based on real commits, but it shows you a new one every time you refresh. Most are pretty basic, along the lines of “does this work”, but there are some more outlandish ones if you persevere and refresh many times.

One of my team members recently submitted a change labelled “Crash when cancelling out of the dialog”. That describes the bug he fixed, rather than what the change does. Another team member provided the following good advice:

The change looks good, but your commit message isn’t describing the change. Could you just reword it so that it completes the sentence “When applied, this commit will…” please. Another way of looking at it is that every commit should read like an instruction in a to-do list. I’d use something like “Return empty collection if user cancels out of the dialog”.

Senior Developer

A stricter idea of commit messages is described here: https://www.conventionalcommits.org/en/v1.0.0/

One of our Principal Developers loves linking people to this guide https://cbea.ms/git-commit/. One interesting idea is to “Use the imperative mood in the subject line”, whereas most people would write in the past tense.

Having a clear history of file changes helps when it comes to finding which change caused an issue, and also gives you a better understanding of why the code changed.

Errors logging errors

Today, I saw some code that looked like the classic “I can’t recreate this bug, so I’ll write some code to log extra information, and I’ll hope this somehow helps me diagnose the issue”.

They had a list of data, and the ID in this list should be unique, but due to a bug it was in there twice. So it would look something like this (ID = 1 is in there twice):

private List<Code> _codes = new List<Code>()
{
    new Code { IsSelectable = true,  ID = 1 },
    new Code { IsSelectable = true,  ID = 1 },
    new Code { IsSelectable = false, ID = 2 },
    new Code { IsSelectable = true,  ID = 3 }
};

Then their method contained this code to validate the data:

private void EnsureValid(int idToFind)
{
    try
    {
        _codes.Single(s => s.ID == idToFind);
    }
    catch (InvalidOperationException)
    {
        Dictionary<string, object> customData = new Dictionary<string, object>
        {
            { "ID", _codes.ToDictionary(x => x.ID) }
        };

        LogToErrorDB(customData);
    }
}

The Single will throw an exception because there is more than one matching element (Single only succeeds when there is exactly one match). In the catch block, they then convert the code list to a dictionary, intending to log it in the error database. However, a dictionary requires its keys to be unique. Since there are duplicate IDs, ToDictionary throws an exception too, this time with:

“System.ArgumentException: ‘An item with the same key has already been added.'”

Error thrown in the catch block

So you get an error whilst trying to log your error. The extra information they intended to log is never logged.

I’ve seen this mistake a few times before, and usually you can test for it by artificially creating the data condition – then they would realise their error logging doesn’t work.
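
For completeness, here’s one way they could have logged it safely (a sketch keeping the shape of their method): group the IDs instead of converting them to dictionary keys, since duplicate IDs are exactly the condition being captured.

private void EnsureValid(int idToFind)
{
    try
    {
        _codes.Single(s => s.ID == idToFind);
    }
    catch (InvalidOperationException)
    {
        var customData = new Dictionary<string, object>
        {
            // GroupBy never throws on duplicate keys, and the counts
            // show exactly which IDs are duplicated.
            { "IDs", _codes.GroupBy(c => c.ID)
                           .ToDictionary(g => g.Key, g => g.Count()) }
        };

        LogToErrorDB(customData);
    }
}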