Software Naming

There was an internal meeting where a new product called “Recruit” was announced. The first question was: “it sounds like it could be confused with a method of recruiting staff to work for us, so was that discussed?”

The manager said “to be honest, I never considered that”.

He then added that around 20 people had been in the meetings, and no one had questioned it or raised any objections.

A few months prior, there was an announcement about a new team that was handling branding in Marketing. We were told we couldn’t create any names without going via them. The last product names they came up with were ASSistant, and ANALytics.

I thought that if the software isn’t well received, it could easily gain a negative nickname, and people could make statements like “the software is ass”.

A Product Owner recently stated that the Assistant branding will soon be phased out, and it will just be merged into our main software’s branding. The decision came about when another Product Owner was doing a demo and had created a test user with the name “ass”. A manager flagged it as unprofessional and was concerned that we could easily demo something like that to external clients.

“you probably want to change those Ass users”

Manager

So far, the marketing naming team hasn’t got a good track record.

Twitter -> X

Twitter was a really strong brand, so Elon’s insistence on changing it to X seemed like a baffling choice, and one seemingly based on his fascination with the letter X rather than any business reason.

As a…
– user
I want…
– a strategically-timed, comprehensive rebrand
so that…
– negative discourse regarding company practices can be briefly veiled with commentary regarding new name, logo, and brand assets, alongside excessively-shared parodies of the brand in meme format
here’s why everyone is WRONG about twitter changing its name to x being “a baffling choice” and “not a smart thing to do”: a thread

1/ x is an awesome letter, one of the coolest letters in fact! I just turned 8 years old and I think the letter x is the most coolest thing there is. anyway, my dad made me chicken nuggets so I can’t finish the thread right now

All jokes aside, I did find a serious thread that was actually useful in detailing the reason for X:

The X era has begun at Twitter Elon Musk says the bird logo will be replaced by an X. For Musk, it’s the continuation of a story that began 25 years ago. Here’s a brief history on that...

Let’s go back to 1999. After selling his first company… …Musk set out to disrupt banking. And the X identity was born. But its time as a standalone brand would be short-lived. Instead, X became part of another brand.

X became part of PayPal. And PayPal was acquired by eBay. Musk walked away with $165 million. He then started building again… …and the letter X would reappear.

Musk would launch Space Exploration Technologies Corporation. It would become known as SpaceX.

Worth noting… SpaceX’s logo has a hidden message. The X symbolizes a rocket’s trajectory.

Musk also used “X” at Tesla. It’s the name of Tesla’s third model. Musk envisioned a car lineup with models that spelled out “sexy.” There’s a Model S. An X. And a Y. Ford had the rights to the Model E. So Musk instead opted for 3, which is basically a backwards E.

X is even the name of Musk’s young son.

Meanwhile, you likely saw all of the headlines leading up to this development… …from rebranding Twitter as X Corp… …to the recent launch of xAI. In other words, wheels were already in motion for the new era.

Our Datadog use

Introduction: Summary of previous blogs

Datadog is a monitoring tool my employer purchased licences for, and it quickly became the cool thing to use to impress the senior managers with (see Datadog, and Datadog – The Smooth Out).

I discussed problems in both those blogs, but with any metrics there are some key concerns:

  • What do you want to measure?
  • Who is viewing the data? And when?
  • What does “good” and “bad” look like, and who acts when that state is shown?

In “Datadog Knee Jerk”, I explained how our CTO and Technical Director demanded that everyone create a Datadog dashboard to monitor all services, regardless of what they are.

If we don’t have a clear idea of what to measure, who needs to view it, and how they know whether it is good or bad, then aren’t we just throwing money away? (Even if the dashboard itself doesn’t cost anything, you still have to spend time creating it, and some dashboards would require additional logging to be effective.) An obvious problem with wanting to monitor everything is that it becomes quite costly once you look into Datadog’s pricing model.

Easy To Make Nonsense Dashboards

From my brief time making Datadog dashboards and analysing other teams’ dashboards, I realised that the data can often look wrong, and it’s really easy to misinterpret the metrics because of the jargon used and uncertainty about when/how the data is actually collected.

“I know nothing about Datadog, yet have been told to make a dashboard”

Principal Tester

Surely the worst case is to make dashboards that show nonsense data. You will waste time investigating problems that don’t exist, or not be alerted to actual problems that happen. So once we create a dashboard, who checks that it is valid?

Look at this one that I saw:

This is supposed to be plotting a line (purple) for failures in the time frame specified, then another (blue) for “week_before“.

It looks wrong at a single glance. If I have set the time-frame combo box to show the “previous month”, should week_before be last week, or the week before last month? It seemed to be neither. Also, notice that both lines have exactly the same shape and numbers; it just seems to plot the exact same data while pretending one copy is a week later.

Jargon

You would think you just need some understanding of statistics to draw some charts, but in the usual nerd fashion, they throw around jargon to be cool. So people end up saying stuff like this:

What is datadog? Fundamentally, a platform like Datadog provides us with a scalable solution for ingesting observability data from our services. Datadog is built upon the three pillars of observability:

  • Metrics provide numerical measurements that allow us to assess our system performance and behaviour
  • Traces allow us to understand the flow of a request or transaction through our systems
  • Logs allow us to capture the details of system events and errors

When you read the official documentation, it’s difficult to understand what it actually can do. It’s the combination of jargon plus hyping up features to be powerful:

Datadog vector
Vector is a high-performance observability data pipeline that puts organizations in control of their observability data. Collect, transform, and route all your logs, metrics, and traces to any vendors you want today and any other vendors you may want tomorrow.

Imagine sending your metrics to vendors that you want in the future. They are like “mate, stop spamming us with your info, you aren’t our customer“.

Then you are given the impression that this is the ultimate solution that can somehow solve some of the major problems with our system:

Having access to this data provides us with opportunities to understand the inner workings of our complex and distributed systems in a way that we haven’t been able to before.
However, the data alone is limited in its usefulness, and it is the insights from this data that offer the greater value. Datadog provides the tooling to surface these insights in a way that enables proactive support and improvement of our systems.

DevOps Engineer

The bad thing about overhyping a tool like this is that you have to manage expectations and make the scope clear, otherwise your interactions with managers are more difficult than they should be. One of the DevOps engineers made a vague statement like:

“Our dashboards monitor everything”

So they got a question from a manager: “Can you tell me who uses our API?”

“no, our dashboards can’t see that”

What we have enabled so far:

  • Configured service metadata to populate service ownership details
  • Enabled traces
  • Enabled RUM (Real User Monitoring) traces to provide full end to end tracing
  • Improved our service & environment tagging
  • Enabled version tracking so that we can observe version related anomalies
  • Defined a baseline set of monitors to cover availability, anomalous throughput, errors, latency and infrastructure performance
  • Defined strict availability & latency SLOs
  • Implemented 24 SLOs & 264 monitors
  • Configured PagerDuty automatic incident creation and resolution
  • Enabled logging
  • Driven several key Information Governance decisions
  • Established a Data asset inventory to provide more objectivity as to what data can be stored in Datadog

Performance Issues

One ongoing problem with our system is performance. Although we have blamed all kinds of things, performance issues remain. There have been claims that Datadog could help us diagnose where the performance issues are, but its agents have also increased network traffic and server resource usage, so they have caused performance issues of their own!

DD agent is using a lot of resources on our test systems and looks to be causing performance issues, I have stopped the agent multiple times when testing as the CPU and memory usage is maxed out. This has been raised before.

Tester
Architect: 
Datadog seems to be showing memory usage on all app servers is high, wonder why?

Me:
Does it only happen when Datadog is watching it?
We licence Datadog to prevent Major Incidents and performance issues…
Datadog causes Major Incidents and performance issues and tells us about it

Another aspect is that some of the things we wanted to measure required querying our SQL databases. To make such a query efficient, the columns you filter on need indexes, but indexes themselves take up space, and we are always moaning about the cost of storage.

We wanted to look at adding Datadog to monitor the usage of a particular feature that managers were making a lot of fuss about. So we asked the Database Administrators about the repercussions of adding an index to our tables. It soon adds up to be absurd.

I checked a random server and a new Index on RecordID (int 4 byte), Method (tinyint 1 byte) and AvailabilityTimeStamp (datetime 8bytes) would be around 2.5GB for a server. There are 60 servers so we need around 150GB for an extra index across Live. Testing the Datadog query before and after the creation of this index shows a 98.6% improvement in total execution time.

Deployment Engineer
Architect
I wondered if anyone else had noticed (and looking into?) poor performance spikes occurring every 2 hours, they seem to present on most servers I checked.

Me
no one actually looks at Datadog
can you create a Meta Dashboard, so it shows you the time since Dashboards were looked at?

Architect
I can only assume it's genuinely the case that no one actually looks the dashboards
I've raised 4 issues now, purely from observing the trends in the last 2 weeks
we've had wrong servers in the public and private pools
Windows Updates running in the day and killing servers
servers sat idle with no traffic hitting them
SQL Server spikes on F: drive access
these spikes every 2 hours
don't know what they're doing
I've had a look at the Monitoring DB for Server07 this afternoon, and I'm absolutely appalled at how horrendous it is, I can't see the wood for the trees. I can only assume that users aren't getting any work done

Me
Interesting that the spikes are exactly 2h apart, but at different base minutes between servers

Architect
it is interesting, but we're still no closer to anyone paying attention to the issue
Philip will probably sort it, he sorted the last DB-related issue

Datadog pricing

The following are our discounted rates; costs per month are as follows (Sept ‘22 – Sept ‘23):

  • Infrastructure $11.75
  • Network Performance Monitoring (NPM) $4.30
  • Application Performance Monitoring (APM) $29.00
  • Custom metrics $5 (per 100, per month)
  • High use of logs (>1m/month) $1.52 (per 1m, per 15 days)
  • Database Monitoring $77.28 (not discounted)

Standard prices are on here https://www.datadoghq.com/pricing/list/

“Related to this, the Azure Pipelines integration for CI Visibility starting September 1st, 2023 will have a cost of $8 per committer per month (on an annual plan, or $12 per committer per month on-demand). Additionally, 400,000 CI Pipeline Spans are included per Pipeline Visibility committer per month. Based on our June usage data, our monthly cost for Azure Pipelines integration for CI Visibility would have been $644.74. We’ve had this enabled for sometime now, is anybody actively using this?”

CTO
Product                                  Charges ($)
APM Hosts                                2,320.00
Audit Trail                              1,846.54
Database Monitoring                      463.68
Fargate Tasks (APM)                      128.06
Fargate Tasks (Continuous Profiler)      70.84
Fargate Tasks (Infra)                    145.73
Infra Host                               42,206.00
Log Events – 15 Days                     10,265.18
Log Ingestion                            28.20
NetFlow Monitoring – 30 Days             507.80
Network Devices                          993.14
Network Hosts                            1,242.70
Pipeline Visibility                      574.08
Profiled Hosts                           275.08
RUM Browser or Mobile Sessions           1,186.91
RUM Session Replay                       48.74
Sensitive Data Scanner                   2,414.65
Serverless APM                           48.77
Serverless Workloads (Functions)         1,385.52
Synthetics – API Tests                   1,825.25
Synthetics – Browser Tests               0.06
Workflow Automation                      13.03
Grand Total                              67,989.96

These were our monthly charges at one point (although the Log Events entry covers 15 days, so double it). If you estimate the yearly cost, it comes to about $936k, unless we cut down what we are logging, make efficiency changes, or avoid scaling up by adding more servers. So roughly $1m a year just for monitoring. How ridiculous is that!?

Obviously, once the CTO became fully aware of these costs, he called an All Hands meeting.

CTO: Tells everyone that Datadog should be used by everyone
Also CTO: Help! the costs are spiralling out of control

Jim
Lol - classic.
Jan: We are investing in this awesome new technology.
Apr: Please stop using this technology - it's too expensive.

Me
ha yeah, that was yesterday's meme for me

we've got this amazing feature
but there's not enough licences to use it properly
so we only partially use it
classic

part of my Dev Haiku collection

We even did that in relation to Datadog. It integrates with a product called PagerDuty to notify the team that something is bad, but there aren’t enough licences to alert everyone involved! What’s the point in paying for it at all if it is only half done? You can’t get the value. It is bonkers.

One of my colleagues, who works on a different project to do with an API, said it only costs $2,500 to run the API for a month, and it’s used by millions of people. Yet here we are spending around $78k a month on monitoring alone.

Service owners and leads have been granted additional permissions to access more detailed billing and account information. Please take some time this next week to:
- Confirm that all active services and features are necessary and being actively used
- Identify any areas where we can optimise our setup to avoid unnecessary costs
- Prioritise production system costs over other environments.

This is a critical opportunity for us to do some housekeeping and ensure our resources are being used efficiently. We spend a significant amount of money on Datadog (7 figures), and we need to ensure that we are getting as much bang for our buck! If anyone has any questions about the above, please ask in here or reach out directly!

The Costly INFO

“Please consider very carefully if storing success status logs is required for your applications.”

As predicted, encouraging people to log everything to Datadog, without thinking about whether it is useful or ever checking on it, soon led to an angry message from the CTO.

You know how it was ridiculous that we were spending $10k on logs for 15 days? Well, it quickly grew to $15k and counting. On investigation, it was caused by one particular feature that was constantly logging the status INFO or OK.

327.59m OKs were logged in 1 day.

I looked at the total logs across a 1 week period and it showed as 1.18G. There’s a Gagillion logs!

How did it get to this point without anyone noticing the excessive logging though? I suppose the Datadog company aren’t gonna alert you to it. They love money.

It proves I was right to be concerned about getting everyone to create dashboards and increase logging without an owner to actually check and respond to them.

Costly Switch On

This is a big mea culpa. In an effort to get early visibility in Datadog of AWS, I enabled the AWS Integration (many months ago). This means that lots of metrics, hosts, logs etc come from these accounts automatically (adding to our costs!).

I’d like to undo this mistake but want to understand if anyone is using these monitors (for example NLB monitors).

Any views?

The problem is we pay $11.75/host/month whether or not we use the metrics

CTO
James
Both. You pay per metric and for polling the API.
Neal
Didn't we find this out the hard way with granular replicator CW metric capture? (swiftly removed though)
John
yep, that gets expensive if your pulling from datadog which is why there is the kinesis stream option
James
Yes - we added very granular cloudwatch custom metrics, as we didn't have a direct Datadog connection. This pushed up our AWS spend significantly, so we turned that off. Custom metrics direct in Datadog is significantly cheaper, but still worth keeping an eye on. E.g. we wanted to track latency, error rate and a few others at a per org level - that quickly pushed up to 40K metrics. In DD you pay $5 per month per 200 custom metrics.  So we had to change our approach to metrics / observability to only surface details for error situations.
CTO
I've disabled the metric collection for all these accounts now. That saves at least $6,800 per year. Every little helps!

Who is viewing the data? And when?

Next challenge is this: I want the rest of the Senior Leadership Team (who are mostly not as technical as me) to be able to connect to DataDog and be able to understand how well our systems are working. I would suggest using the Golden Signals as a standard for all the systems. The dashboard that’s created needs to be easily consumed, consistent across products and reflective of customers’ experience. Can we make this happen?

CTO
Me:
Are the Directors actually gonna connect to DataDog? Trying to picture how this can be useful, and how they should get the info.
Architect
it's all laughable really
I was cringing when he put in the initial request to move it to single-sign on, could see where it was heading!
I don't think they need access to Datadog, surely they just want:
-Everything is ok
-Some things are broken (these things...)
-Everything is in ruins

everything else requires some level of investigation and interpretation
and we should probably have that information on our business status page, unless we're scared about how frequently we have problems

Me
That's true. People probably already tell the CEO if things aren't fine
and if they want to fiddle the figures, they can still do it in the dashboard that she sees

yep that dashboard was a first attempt to get us a first step to what you describe
problem with us at the deep technical level – is knowing what data SLT / GXT find useful. Count of Active Alerts ? Just a simple RAG status ?
average response time ?

DevOps Engineer

The conclusion was that the metrics we create should have the following properties:

  • Consistent – the same interpretation should be made across all products
  • Comparative – we can tell from the metric how close we are to having an issue (ie percentage of link utilisation is better than Mbps)
  • Trending – we can see the past and if there is an underlying trend that points to an issue in the future the metric would make that obvious
  • RAG status’d – you can tell if a metric is ok quickly by seeing if it’s red, amber or green.
  • Relatable – the metric is connected to the experience by a customer, partner or patient.

We Monitored Everything But Not That Important Bit

Hi all, just letting you know about a lesson learnt from last weekend’s Mobile disruption incident; An internal service Mobile relies on had a dependency on a 3rd party service endpoint that went into an error state. Unfortunately we weren’t monitoring that as part of the service monitoring and therefore we had a significant delay in detecting that down-stream failure. We also didn’t have the Mobile service mapped as depending on the other internal service within PagerDuty, so even if an alert had fired, we wouldn’t have seen that incident bubble up from Identify Link into Mobile as the cause of the Mobile disruption. 

CTO

Conclusion

I say in a lot of these blogs that you really need to understand the problem you are trying to solve. Otherwise you end up wasting time and money and causing more problems. It’s ridiculous that we have spent $1m a year on monitoring, yet we can’t predict or react to Major Incidents. There are gaps in the monitoring, incidents caused by the monitoring, and people not looking at the monitoring.

Also, in predictable fashion, we are now moving away from Datadog to Dynatrace, which is supposed to be much cheaper. However, all the dashboards will have to be remade, so lots of time will be wasted.

CTO overrule

I’ve written blogs about how our CTO tried to change our release process and announced it on a “Town Hall” call with the entire department; then loads of teams told him it couldn’t be done, so he had to back down.

Then, on another Town Hall, he tried to change the Change Control process, but this time he wouldn’t back down when we told him it wouldn’t work. He claimed it would be a sackable offence if we didn’t follow it. A month later, he found out someone couldn’t turn a server on because of the Change Control process. He called that “malicious compliance” and said that would be a sackable offence in future. Within a few months, nearly the entire process had been reverted.

Last week, he was talking about how we needed to move a Data Centre to a new location. He said he preferred to move to the Cloud since it is “inline with our strategic targets”. However, after several meetings, the experts involved in the Data Centre decided the best solution would be to move to another physical data centre. The CTO didn’t like this because it wasn’t in line with the strategy, and he thought the move would be too slow.

Therefore, he took the executive decision to overrule them and demand they move to the cloud.

“Do I know we can move all servers to the cloud in time? No. But I was prepared to take the risk. I would rather make decisions and be wrong, than sit by and not make any”

CTO

It seemed strange to me to claim that moving to a physical data centre would be too slow, while admitting that moving to the Cloud probably couldn’t be done in time either.

He then claimed that

“we have wasted enough time deciding what the plan should be; to move to the cloud or to move to a physical data centre”.

CTO

Isn’t this the worst scenario though? He could have made the decision before any meeting was arranged. But it sounds like they had debated the decision, came to a conclusion, then he told them he didn’t like their conclusion. Then he moaned that they wasted time debating.

So they had meetings with the experts, who concluded the data centre was the best option, but since the CTO loves the cloud, he overruled them. So what was the value of the meetings? And will the staff be motivated to do something they don’t believe in?

“I think I’m compassionate with my team. It’s what binds us together as a team. Otherwise we are a bunch of individuals.”

CTO

I don’t get how he can make these statements and not realise the hypocrisy. How can you be compassionate if you have shown no confidence in their opinions and decision making?

Time

When it comes to software, the concept of time can cause problems. There are some really interesting scenarios, but even in simple applications, some developers struggle with the basic concepts.

A standard problem is that client and server times can be out of sync. This can simply be because they are set incorrectly, or because they are using different timezones. As a developer, if you are comparing times in log files across the client and server, mismatched timestamps cause confusion. A common thing I have seen is that some servers don’t apply the Daylight Saving Time we have in the UK, but the clients do, so the server can appear an hour out.

Daylight Saving Time is interesting because time shifts forward or backward by an hour, so local time isn’t even linear.
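
Here is a minimal C# sketch of the usual way to sidestep this class of problem: record timestamps in UTC and only convert to local time for display. (This is an illustrative example, not our code; “GMT Standard Time” is the Windows ID for UK time, and on Linux you would use “Europe/London”.)

using System;

class TimestampLogging
{
	static void Main()
	{
		// Store and log in UTC so client and server timestamps always line up,
		// regardless of each machine's DST rules.
		DateTime utcNow = DateTime.UtcNow;

		// Convert only at the point of display, letting the framework decide
		// whether British Summer Time applies for that particular date.
		TimeZoneInfo ukZone = TimeZoneInfo.FindSystemTimeZoneById("GMT Standard Time");
		DateTime ukLocal = TimeZoneInfo.ConvertTimeFromUtc(utcNow, ukZone);

		Console.WriteLine($"UTC: {utcNow:O}");
		Console.WriteLine($"UK:  {ukLocal:O}");
	}
}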

I recall reading a blog about time by Jon Skeet which discussed how historical dates can also suddenly change; for example, when a country switches to a different calendar system entirely, moving forward a day can produce a surprising jump to align with the new system.
Computerphile have a discussion on this: The Problem with Time & Timezones – Computerphile.

Leap Years

We once had a leap year bug because someone created a new date using the current day and month and added a year. So when it was 29th February, it tried to create 29th February of the following year, which isn’t a valid date, and the feature crashed. Everyone was panicking trying to rush out a fix, but then we realised we could only get the fix to our customers the next day, by which point the bug couldn’t happen anyway. Not for another 4 years, at least. It was hilarious.
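
Here’s a minimal sketch of how that kind of bug tends to look (hypothetical code, not our actual fix):

var today = new DateTime(2024, 2, 29);

// The buggy approach: rebuild the date from its parts and bump the year.
// 29th February 2025 doesn't exist, so this constructor throws.
// var nextYear = new DateTime(today.Year + 1, today.Month, today.Day);

// AddYears handles the edge case by clamping to 28th February instead of throwing.
var nextYear = today.AddYears(1); // 28/02/2025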

-1

One weird mistake I saw recently: a developer defined a variable and set it to 5. The code was supposed to make sure that we never make an API call more than once every 5 minutes. However, they then subtracted 1, so it was actually checking every 4 minutes instead.

var minimumNumberOfSecondsRequiredBetweenEachAPICall = (NumberOfMinutesRequiredBetweenEachAPICall - 1) * 60;

Ages

You would think everyone would understand the concept of ages since everyone has an age and it increases by 1 every time you have a birthday. However, many developers seem to struggle with the concept. The last implementation I saw had the following:

int age = DateTime.Now.Year - dateOfBirth.Year;

So it can be a year out, because it effectively assumes everyone’s birthday is on 1st January.
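
A common corrected version (a sketch, not our code) subtracts a year when the birthday hasn’t come around yet:

public static int GetAge(DateTime dateOfBirth, DateTime today)
{
	int age = today.Year - dateOfBirth.Year;

	// If their birthday hasn't happened yet this year, they are still a year younger.
	if (dateOfBirth.Date > today.Date.AddYears(-age))
		age--;

	return age;
}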

It reminds me of an exchange on Twitter that I saw years ago. It was in the context of football.

PA: Why are Arsenal paying £25m for a 29 year old striker?
G: he’s 28 btw
PA: He’s a lot nearer to 29 than 28, that’s a fact
G: He’s 28, that’s a fact
PA: Why am I not surprised that fractions are beyond you. The day after his birthday, he is no longer 28.
G: He’s 28 until he becomes 29. That’s how it works
PA: Perhaps if you had paid more attention in Maths lessons? You might remember “round up or down to the nearest whole number”
G: He’s 28. That’s a fact.
PA: No, it is not. £1.75 is not one pound. You don’t even understand what a fact is now.
G: Until he is 29, he is 28.

When it is the day after your birth, are you 1 day old? Technically you could be just a minute old but still claim you are 1 day old.

My instinct when performing maths on dates would be to use an existing date library. Another developer made something themselves instead. It seemed a bit complex to me, but I think it actually worked, or at least seemed reasonable for how they wanted to use it.


public static double AgeInYearsAtDate(DateTime effectiveDate, DateTime dateOfBirth)
{
	double daysInYear = 365.25;
	int completeYears = Age.GetYears(dateOfBirth, effectiveDate);

	dateOfBirth = dateOfBirth.AddYears(completeYears);

	double proportion = effectiveDate == dateOfBirth ? 0 : Age.GetDays(dateOfBirth, effectiveDate) / daysInYear;

	return completeYears + proportion;
}

public static string ConvertCurrentAgeToYearsAndMonths(double age)
{
	int monthsInYear = 12;
	int years = (int)age;
	int months = (int)Math.Round((age - (int)age) * monthsInYear);

	return $"{years} year{(years == 1 ? String.Empty : "s")} and {months} month{(months == 1 ? String.Empty : "s")}";
}

Ages Part 2

Another developer was testing his age code and wrote this:

new object[]
            {
                new DateTime(2010, 05, 31),
                new DateTime(2009, 06, 01),
                AgeRange.UpToOneYear,
                "52 weeks and 0 days"
            },

If there are 52 weeks in a year, then is that really 52 weeks and 0 days? It looks 1 day short of a full year to me. Time is mental, isn’t it?

Hidden Scanning Portal

Many years ago, my colleague Andy came up with a great software hack to fix a bug. I didn’t understand the fix at the time, so don’t remember the details, but the bug manifested as a red box replacing a UI control whilst the user was scanning a paper document.

Andy implemented a solution dubbed the “Hidden Scanning Portal,” a dialog box that remained invisible until the scan was complete, after which it was disposed of.

After a few months, another developer, Joe, convinced he had discovered a more permanent solution, removed Andy’s Hidden Scanning Portal. This inadvertently introduced a new bug, so the Hidden Scanning Portal was swiftly restored, averting further complications.

Our Team Lead, Matt, revealed that failing to fix the original issue could have resulted in a fine of £16,000. This revelation cast Andy’s quick fix in a new light, attributing significant value to what might otherwise have been seen as a mere temporary workaround.

Andy’s reaction to the situation was a mix of pride and frustration. Despite his contribution to saving the company from a hefty fine, he lamented the lack of recognition in the form of a modest pay rise.

“and they didn’t even give me a measly 3% pay rise”

Andy

Quick fixes might not be ideal, and they increase “technical debt”, but they can provide immediate relief, avoid hefty fines, and give you great stories to reminisce about.

Crazy Code Design

Here is a selection of poor code design that I have come across over the years.

Opacity Slider

Sometimes, programmers write simple programs to help themselves or their team. When I first joined this company, they were using a basic source control system similar to SVN, and someone had written a program to help with Code Reviews. The tool had a slider that changed the opacity of the dialog, so you could make it see-through. It seemed like a case of the developer doing it simply because they could, not because anyone actually found the feature useful. I suppose you can be more creative when something is only used internally and not by a customer; but still, incredibly pointless.

Law of Demeter

The Law of Demeter, also known as the principle of least knowledge, advocates a design where objects are loosely coupled and only communicate with their immediate acquaintances. This means that a method in a class should only call methods on:

  • Its direct components.
  • The object itself.
  • Objects passed as parameters.
  • Any objects it creates.

In practice, this means you shouldn’t have a massive chain of methods/properties.

One example of breaking this law that I saw in our code looked like this:

responseMessage.acknowledgements.conveyingTransmission.controlActEvent.reason[0].detectedEvent.code.code

Awful.
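
For contrast, here’s a minimal, hypothetical sketch of the Demeter-friendly approach: the caller talks to its immediate acquaintance and never reaches through it.

public class Address
{
	public string Postcode { get; set; }
}

public class Customer
{
	public Address Address { get; set; }
}

public class Order
{
	private readonly Customer _customer;

	public Order(Customer customer) => _customer = customer;

	// Bad (train wreck): order.Customer.Address.Postcode
	// Better: the Order exposes the one thing callers actually need.
	public string DeliveryPostcode() => _customer.Address.Postcode;
}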

Partial Record

One of our modules can potentially show a large amount of data. The data can come from the local record, or if there is a sharing agreement, it can come from other companies that they share with. There are certain screens where you don’t need all the information, so to try and cut down loading times, we have this concept of a Partial record and a Full Record.

	public IRecordPartial Partial
	{
		get
		{
			return IsFullLoaded ? Local : EnsurePartial();
		}
	}

The logic and wording get confusing real fast. The property above is called Partial, but it checks IsFullLoaded, which implies that Local can be a full record and not just a partial one as the property name suggests. When you look further into the code, Local might not even be local, because it can contain shared data. Mindbending.

PercentNull

Coming up with descriptive names is always a challenge in programming, and there are occasions where naming is ambiguous. However, I have no idea what a method called PercentNull does here:

transactionOut.Surname = helperFacade.PercentNull(customerDetail.Surname);

If it is assigning the result to Surname, then it should be returning text. Nothing immediately obvious comes to mind when you pass a surname into a method called PercentNull and get a surname back. It’s not returning a percentage if it can, or null if it cannot; nor is it returning the percentage of the text that is whitespace. 🤷

How High

	public enum SeverityScale
	{
		None = 0,
		Low = 1,
		Medium = 2,
		High = 3,
		ExtraHigh = 5,
		VeryHigh = 6
	}

We have this enum to represent the Severity. It makes sense until you get further than High. Extra High makes sense to be more severe than High, but should Very High be higher than Extra High? Extra and Very sound like synonyms. You need something more extreme like Extremely High to make it immediately obvious what the ranking is.

Focussed or not?

	public void SetSearchBox()
	{
		SearchFocused = false;
		SearchFocused = true;
	}

When you see code like the above, it is most likely that the code is doing something weird, and because no one worked out how to fix the original issue, you then end up writing more weird code to work around that. These “hacks” just stack up and are hard to remove. If you try to do the right/honourable thing and remove a hack, then you will probably see some strange behaviour which then means you remove more code, and maybe more code when you find more problems. Repeat until you are left with the original issue which you then have to fix.

So how can setting a property to false and then immediately setting it to true actually do anything? Probably because the setter has loads of logic and side-effects. Probably best not to look. 🙈

Random Parameter

		public Organisation(bool randomParamToChangeSignatureForFasterConstruction)
		{
			// Had to create a new constructor that doesn't initialise internal state, param isn't used.
		}

The scary thing about this strange code is that it was written by one of our smartest developers. I have no idea why he had to create an extra constructor. Presumably the parameter is only there because a parameterless “default” constructor already existed, so the signature had to differ. Intentionally not initialising internal state sounds like a recipe for bugs though. Maybe it needs a redesign.

Slow Request

I remember my Dad telling me a story of a software team putting a delay into their code. Then each month, they would simply reduce the delay a bit and tell their managers they have been working hard on performance improvements.

if (Helper.SlowRequest)
	Thread.Sleep(15000);

I found the above code in our codebase; it relies on a configuration value being present in the config file. The value was present and set to true by default for developers, so requests would always be slow. Changing the value to false speeds things up, but you have to know it exists to change it.

<add key="SlowRequest" value="true"/>

Although it doesn’t affect our customers, there’s always the chance something will go wrong one day and it could affect them.

Boolean “Mode”

If you want to handle many options, you often use an “enum”. Using a “Boolean”, which can only represent 2 values (true/false), is a very weird choice…

 <param name="mode">If set to <c>true</c> then [mode] is a delete operation, else add, edit and none operations</param>

so

true = delete

false = add OR edit OR none.

If you pass the wrong value, there’s an if statement afterwards. So if you say it’s a delete and it isn’t, then things don’t get deleted.

		if (!mode)
		{
			if (@event.updateMode != vocUpdateMode.delete)
			{
			}
			else
			{
				if (@event.updateMode == vocUpdateMode.delete)
				{
					// ...
				}
			}
		}
Not Supported Property

	public virtual String UploadedBy
	{
		get { return _document.Observation.EnteredByUser.DisplayName; }
		set { throw new NotSupportedException("Setting UploadedBy is not supported"); }
	}

Isn’t that setter a complete dick move? If you call it, it will crash. If the setter wasn’t there at all, you would know at compile time that it can’t be set.

I guess it could be designed with the expectation that you would override the property. Even so, setting it wouldn’t really work, because the “get” returns a massive chain of properties. It’s not just a case of setting a backing field; it’s actually _document.Observation.EnteredByUser.DisplayName that would need to be set, and that chain breaks the Law of Demeter anyway.

Mutual Exclusive

This is gonna end in tears

private Boolean _isSingleClick;
private Boolean _isDoubleClicked;

Not only are there better ways of detecting clicks, but when you have multiple variables tracking overlapping concepts like this, you can easily end up in invalid states. If _isDoubleClicked is true, you would expect _isSingleClick to always be false, but it is easy to forget to set one of them somewhere, which then leads to a bug.
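
A minimal sketch of the alternative (hypothetical names): a single field that can only hold one state at a time, so “single-clicked and double-clicked at once” simply can’t be represented.

public enum ClickState
{
	None,
	SingleClick,
	DoubleClick
}

private ClickState _clickState = ClickState.None;

// Changing the state is now one assignment; there is no second flag to forget.
private void OnDoubleClick() => _clickState = ClickState.DoubleClick;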

Notifications

		public bool IsNotificationConfigEmailEnabled()
		{
			if (!_configSmsEnabled.HasValue)
				_configSmsEnabled = NotificationConfiguration.IsEmailEnabled;

			return _configSmsEnabled.Value;
		}

The fact that this code has been there for years suggests it must work somehow. But when a method called IsNotificationConfigEmailEnabled caches its answer in a field named _configSmsEnabled, who knows what is really going on.

Resizing

int parentSize = Parent != null
                ? Parent.Width
                : Screen.PrimaryScreen.Bounds.Width;
 
var availableSpace = parentSize - _clientLocation.Y - _yOffset;

What a nonsense calculation! We want to calculate the height of a pop-up box and make sure it fits within the window. So we look at the width of the parent control, or failing that the width of the primary monitor (which the user might not even be on if the program is on a secondary monitor), then subtract the Y coordinate of the window, which relates to height, not width.
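
Here’s a hedged sketch of what the calculation presumably meant to be (assuming WinForms, and that Parent is a Control and _clientLocation a Point as in the snippet above): use heights rather than widths, and the screen the window is actually on rather than always the primary one.

int parentHeight = Parent != null
	? Parent.Height
	: Screen.FromPoint(_clientLocation).Bounds.Height;

var availableSpace = parentHeight - _clientLocation.Y - _yOffset;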

Splitting

Sometimes programmers like jamming as much code as they can on what is technically a single line. It is an absolute nightmare to debug chained logic like this:

return
	RequiresSingleItemOrder(item, customerRequestsSplit)
	?
	null
	:
	batchOrders
	.FirstOrDefault(
		orderInstance =>
			(orderInstance.IssueMethod != IssueMethod.Electronic || orderInstance.Items.Count() < 4)
			&&
			!orderInstance.Items.Any(m => RequiresSingleItemOrder(m, customerRequestsSplit))
			&&
			orderInstance.IssueMethod == issueMethod
			&&
			orderInstance.OrderType == OrderType
			&&
			orderInstance.IsUrgent == isUrgent
			&&
			(!ShouldSeparateControlledItems
			||
			orderInstance.AnyControlledItems == item.IsControlledItem)
			&&
			orderInstance.Warehouse == Warehouse
			&&
			!Authorisation.Separator.ShouldBeSeparated(authoriser: authoriser, orderAuthoriser: orderInstance.Authoriser)
			&&
			(!ShouldSeparatePrivateItems
			||
			orderInstance.IsPrivate == isPrivate)
			&&
			MatchesForRepeatOrdering(item, orderInstance)
			&&
			NullAndNonsequentialEqual(separationTypeIds, orderInstance.SeparationTypeIds));

Funny Youtube Comment

I saw this comment on a programming video on YouTube. It is remarking on how you can write really confusing code as a way to increase your job security because you can be in charge of code that only you can read:

And remember, kids – if you nest multiple null coalescing operators into a single line of poly-nested ternary operators, that’s called “job security” – cause ain’t no one wanna maintain that code when you’re gone.

return a> b ? a < c ? a != d ? e ?? f ?? 0 : f ?? g ?? : 0 e ?? g ?? 0;

Typescript Example

Years ago, we started rewriting our C# program using web technologies. Since everyone was new to JavaScript and TypeScript, everyone wrote awful code, but at the time you couldn’t tell whether it was good or bad. One of the first bits of code I saw when I joined the team was this:

export const Actions: { [key: string]: <T extends {}> (value: T) => IAction<T> } = {
 setModuleName: <T extends {}>(value: T): IAction<T> => CreateAction<T>(ActionTypes.setModuleName, value),
};

Don’t ask me what it says, because I have no idea.

Contradiction

InsertUpdateResourceImmutableProperties

It’s so frustrating to read, it makes you want to punch someone. Immutability means it shouldn’t change after it has been created. So how can you Update something Immutable? Luckily it has a comment explaining it:

   -- If we are about to insert the most upto date version of the resource, then we should update the
    -- resources immutable properties, because they might have changed in the source system. (Even though they shouldn't).

Well, I am still none-the-wiser.

Boolean Extensions

If you have a boolean variable and you want to check it is false, you can just write:

			bool example = false;

			if (example == false)
			{
			}

Now with this handy extension method:

public static class BooleanExtensions
{
	public static bool IsFalse(this bool boolValue)
	{
		return !boolValue;
	}
}

You can now write:

bool example = false;

if (example.IsFalse())
{
}

What’s the point in that? There’s no real advantage, is there?

Not Set Up

		get
		{
			if (_setup)
			{
				_setup = false;
				return true;
			}

			return _noDocumentsInError;
		}

In the getter, we check the value of _setup. If it is true, we set it to false and return true. However, if we immediately call the same property again, it just returns the value of _noDocumentsInError instead. It’s bad design to have side-effects in a getter: getters are supposed to return a value, not change state. We also seem to have different fields tracking different concepts, which looks prone to errors.

Reload

	public void ReloadTriggers()
		{
			// To be implemented
			return;
		}

This code doesn’t even need a return statement. It is just there for the bantz. When this code is in a module that is very error prone, and you have a method that is not implemented and does nothing at all, it doesn’t give a good impression, does it?

Dialog Events

	_msgBox.ShowDialog();
	return _continueWithSelection ? DialogResult.OK : DialogResult.Cancel;
}

private void Cancel_Selection(object sender, EventArgs e)
{
	_continueWithSelection = false;
	_msgBox.Dispose();
}

You can normally get the result directly when the dialog closes. However, this developer decided to wire up their own event handlers for the button clicks, then use a flag to decide whether to return DialogResult.OK or Cancel.
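
For reference, a hedged WinForms sketch of the built-in way: assign a DialogResult to each button and let ShowDialog() return it, with no extra flags or event handlers.

using System.Windows.Forms;

public class SelectionDialog : Form
{
	public SelectionDialog()
	{
		var okButton = new Button { Text = "OK", DialogResult = DialogResult.OK };
		var cancelButton = new Button { Text = "Cancel", DialogResult = DialogResult.Cancel };

		AcceptButton = okButton;
		CancelButton = cancelButton;
		Controls.Add(okButton);
		Controls.Add(cancelButton);
	}
}

// Clicking either button closes the form, and ShowDialog() returns that button's DialogResult.
// var result = new SelectionDialog().ShowDialog();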

NoEncryptionEncryptor

new NoEncryptionEncryptor() 

Is this an Encryptor or not?🤔

SillyException

You throw an Exception when something bad has happened. However, in our code there is a concept of a SillyException which you are supposed to ignore because, apparently, it isn’t a real error. It’s only used in one specific part of our codebase, so it goes undiscovered for ages at a time. A colleague, Michael, found it again and we reminisced about a bug I once found.

Michael 11:33: 
if (mapped is SillyException) return 0;         // Avoid throwing the SillyException

Me 11:33:
I love sillyexception

Michael 11:33:
/// <summary>Represents a data provider exception that doesn't indicate an actual error,
/// and should be ignored.</summary>
[Serializable]
public class SillyException

Me 11:34:
I remember David going on about SillyExceptions and it took me ages to realise he was referring to an actual thing

Michael 11:35:
"I keep getting this SillyException"
what kind of exception is it David?
IT'S A SILLYEXCEPTION
YES, BUT WAT KIND

Me 11:35:
yeah, it did go something like that

Michael 11:35:
haha

Me 11:37:
I think it was around the time he spent weeks investigating that bug I found that no one else could recreate
and I thought he had worked it out when he said he was getting a SillyException

Lame Async Call

var result = client.BeginUploadFile((Stream)payload, endpoint);
bool uploadCompleted = false;
while (!uploadCompleted)
{
	uploadCompleted = true;
	if (!result.IsCompleted)
	{
		uploadCompleted = false;
	}
}
if (!uploadCompleted)
{
	throw new Exception($"Failure occurred while uploading the data");
}
client.EndUploadFile(result);

It’s weird that they set it to true and then set it back to false. The exception can never be thrown either, because the check is outside a loop that only exits once the upload is complete; until then it just spins, burning CPU while polling IsCompleted.

Is this just a reimplementation of a lame async call?
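
For what it’s worth, here’s a hedged sketch of two less lame options, assuming client exposes the classic Begin/End (APM) pair used above:

// Option 1: block on the wait handle instead of spinning the CPU in a loop.
var result = client.BeginUploadFile((Stream)payload, endpoint);
result.AsyncWaitHandle.WaitOne();
client.EndUploadFile(result);

// Option 2 (from inside an async method): wrap the Begin/End pair in a Task and await it.
// await Task.Factory.FromAsync(client.BeginUploadFile((Stream)payload, endpoint),
//                              client.EndUploadFile);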

Email Regex

When it comes to data entry, you can perform validation using Regular Expressions, which define a pattern/structure. Certain identifiers have rules; for example, an English postcode (I think) is 2 letters, 1 or 2 numbers, usually formatted with a space, then a number followed by 2 letters, e.g. HX1 1QG.

Software developers often come up with convoluted rules to try and define what an email address looks like, but I think even email providers are inconsistent about which special characters they allow.

In our software, a developer had decided to change the regular expression in our email address validation. It was complex before, and it looked more complex after the new developer’s changes.

This was what it was:

return (Regex.IsMatch(emailAddress, @"^[A-Za-z0-9_\.\-&\+~]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}$"));

And he changed it to this:

return (Regex.IsMatch(emailAddress, @"^([\w-]+(?:\.[\w-]+)*(?:\'[\w-]+)*)@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$"));

Now, glancing at that, I have no idea what the advantage is. Some developers just assumed it was a good change and approved it. What I noticed is that the end part was changed from [a-zA-Z] to [a-z], which means the last part of the email, e.g. “.com”, can only be written in lowercase, whereas previously we accepted “.COM”. As far as I am aware, that part of an email address isn’t case sensitive, so although it would be nice if users only entered lowercase, it seems an unnecessary restriction. Was it intentional? What else in this validation is different?

If he had unit tests to show the valid emails, and prove his change wasn’t flagging valid emails as invalid; then I would be more comfortable approving it. But there were no unit tests, and I didn’t understand the change.

Another developer pointed out that the usual advice for emails is just to go for the simple *@*.* validation, which translates to: “it must have some characters, followed by an @ sign, more characters, then a period followed by some characters”.

Email validation is surprisingly complicated, so it’s best having a simple restriction just to filter out data-entry errors. Having complex restrictions can then exclude people’s valid email addresses and you definitely don’t want to do that!
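
A minimal sketch of that simple approach (an illustration, not our actual validation):

public static bool LooksLikeEmail(string emailAddress)
{
	// Something, an @, something, a dot, something - just enough to catch obvious typos.
	return Regex.IsMatch(emailAddress, @"^[^@\s]+@[^@\s]+\.[^@\s]+$");
}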

The Feature Flag Rant

When adding new features to software, you can add a Feature Flag: if it is set to true, the new feature is used; if false, it isn’t. This allows a quick roll-back by tweaking a value rather than releasing a new software update. However, it makes the code more complicated due to the branching paths.

Once all users are on the new feature, when do you remove the old code? Obviously it should be removed once everyone is switched over and happy with the new functionality, but the work needs to be planned in, and what is the urgency? Project Managers want new projects that add value, not the deletion of redundant code.

One of our most experienced developers posted a rant about feature flags. He pointed out there was no guidance on when to use feature flags. Do all new features get feature flags? What if it depends on a feature that already has a feature flag? Do Software Testers test each combination to make sure all code paths are supported? Is it clear which configurations are deployed on live since this should have priority when it comes to testing? By default, our Test Environments should match the config of a typical Live Environment. However, we often find that the default is some configuration that is invalid/not used.

It’s not always possible to “roll back” by switching the feature flag off. To implement the change, you may have needed to refactor the code or add new database columns. Changing the feature flag back to “off/false” just stops some new code being called, but not all of the new changes (the refactored parts). So if the bug is in the parts that run even with the flag off, it is still a problem.

It was also discussed that some people used our Configuration Tool for actual configuration and others were using them as Feature flags, and maybe we should have separate tools for Configuration and Features.

Feature flags cause maintenance problems. A flag needs to be tested both on and off when implemented, and if you later remove it, that removal needs testing too. If you leave it in, it will always be questioned whether code in that area is used or needs testing. How do you prioritise removing the code? Does it belong to the team that initially created the feature? What if that team has moved on, or split?

Another developer brought up an example of how a bug existed in two places but the developer that fixed the issue was only aware of one path, and didn’t know about the other which required a feature flag to enable.

He also questioned whether it is more of a problem with our process. Other companies have quicker releases and can roll back more flexibly using ideas like Canary Deployments, whereas our process is slow and relies on “fix-forward” rather than rollback.

Things to consider:

  • What actually gets feature flagged?
  • When the conditional code gets removed from the codebase
  • The effect of the “Cartesian explosion” of flag combinations on unit tests and test environments

SonarCloud, the Static Analysis Tool

SonarCloud is a static analysis tool. It runs over your codebase, points out where code doesn’t follow “best practices”, flags possible bugs, and reports code coverage.

When my employer bought a SonarCloud licence, there was a big emphasis on fixing existing errors and having no errors in new code. I knew this emphasis would be a bad idea because of the classic “you get what you measure”: if you tell people the goal is to have no errors, they will do whatever they can to have no errors. The idea of a tool like SonarCloud is to improve code quality, but because the metric is a number, the number becomes the goal rather than the quality. The real aim should be to improve the code itself, optimising for readability, maintainability, scalability, and code coverage.

As more people began working with SonarCloud, we saw more and more changes with titles along the lines of “Fix Sonarcloud issues“.

Human Analysis

SonarCloud is just a static analysis tool: it can only spot problems and suggest pre-defined improvements. These improvements are not always in the best interest of the codebase. You have to ask yourself objectively, “Does the change I’m making make the code better or worse?” Before you can do that, you need to understand why Sonar thinks there is a problem with the code. Then you can decide whether Sonar is right or not. Some of the rules are not applicable everywhere.

I have seen a number of changes where the code is actually made worse by the changes to satisfy Sonar.

Example 1: 7 Argument Limit

A constructor or method that takes loads of parameters can be a sign of bad design, and maybe some of the parameters can be grouped together inside an object. Once you reach 8 arguments, Sonar will flag it. A simple “fix” is just to create a class and throw a couple of the parameters in there. That satisfies the rule, but it doesn’t make logical sense unless the parameters are in fact related. Adding a new single-use class can just make the codebase more cluttered and seemingly more complicated. I would rather see a constructor with 8 values than some complicated signature with extra types in the way “just because of Sonar”. Mark Seemann has a good blog post about Curbing code rot with thresholds.
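
To illustrate (with hypothetical names), grouping only helps when the parameters genuinely belong together, such as an address, rather than being a grab-bag invented to please Sonar:

public class PostalAddress
{
	public string Line1 { get; set; }
	public string Town { get; set; }
	public string Postcode { get; set; }
}

// Was: Register(string forename, string surname, string line1, string town,
//               string postcode, string email, string phone, bool marketingOptIn)
public void Register(string forename, string surname, PostalAddress address,
	string email, string phone, bool marketingOptIn)
{
	// ...
}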

Example 2: TryParse

Another example, I once wrote some code with the following:

var ok = int.TryParse(a, out var x) & int.TryParse(b, out var y);
//something with x and y

Sonar complained that the use of the bitwise & was confusing and suggested I use && instead. However, if I did that then “y” wouldn’t always be assigned, because of the short-circuiting, and the code wouldn’t compile. I was about to reject the Sonar issue as “suggests code that doesn’t compile” and just keep my version.

Then I thought, “if Sonar cannot understand my code to make a decent suggestion, maybe other developers can’t either“. I was trying to be too clever with my code.

Instead I changed the code to:

var ok1 = int.TryParse(a, out var x);
var ok2 = int.TryParse(b, out var y);
var ok = ok1 && ok2;
//something with x and y

It wasn’t as terse as my original version, but it was certainly easier to read and understand, and Sonar didn’t have a problem with it any more either.

Example 3: Cyclomatic Complexity

When a method has loads of if statements, it creates loads of execution paths, which means that if you are aiming for true 100% test coverage, you have loads of tests to write. It can easily make the code hard to read and understand too. At a certain point, Sonar suggests breaking the method into smaller methods. I have seen people take this extremely literally, and you end up with a design that looks like this:

  SetPersonalDetailDifferences1(returnList);
  SetPersonalDetailDifferences2(region, returnList);


So there is no logical grouping for what goes in part 1 and part 2; it’s just that enough code gets placed in the first method to satisfy the threshold, and everything else goes in the second. Now the original method is split in half with no logical reason other than to satisfy the Sonar rule.

Me (rhetorically)
Are these fields logically grouped or is this just to satisfy sonar?

Brijesh
It is just to satisfy sonar

Example 4: Making The Code Worse

Nullable DateTime

A DateTime always has to have a value. However, you can declare a nullable DateTime in C# by appending a question mark, like DateTime?.

The existing code was checking a standard DateTime for null, which can never happen.

if (startDate == null)
   throw new ArgumentNullException("startDate");

The Code Analysis report was correctly flagging this code as completely unnecessary. Instead of removing the code, the developer then changed it to

if ((DateTime?)startDate == null)
    throw new ArgumentNullException("startDate");

The method still accepted startDate as a non-nullable DateTime, so it could never be null. But now it was being cast to a nullable DateTime, so the check against null became technically valid, even though it is still always false.

Me:
Why are you casting to nullable datetime?

Chandeshwar
dateTime is value type , it can never be null value.
if check dateTime == null, it's always return false .

Me:
Yes, that is correct Chandeshwar. That’s why you can delete the code completely. Your code is always false too.

Sometimes, this type of scenario leads to multiple attempts. Either

  1. because they make some changes and they still get the same Sonar error,
  2. they introduce a new problem,
  3. or maybe the reviewer dislikes the changes and wants them to change it back.

So there’s plenty of files where our Change History looks like this

Fix Code Smell Issue <- doing what they think Sonar wanted them to do
Additional Static Analysis Fixes <-they messed up, so tried again
Addressing code review comments <- I pointed out it still wasn’t correct

Example: Magic Numbers

Magic numbers should not be used

A magic number is a number that comes out of nowhere, and is directly used in a statement. Magic numbers are often used, for instance to limit the number of iterations of a loop, to test the value of a property, etc.

Using magic numbers may seem obvious and straightforward when you’re writing a piece of code, but they are much less obvious and straightforward at debugging time

That is why magic numbers must be demystified by first being assigned to clearly named variables before being used.

-1, 0 and 1 are not considered magic numbers.

Vignesh changed code like

dosagesCount == 1

to

dosagesCount == Constants.Single

Me:
Constants are for a different purpose! They are not for replacing all the numbers in the codebase. It's a good idea to give numbers clear names, and if the same value is used multiple times, it means if you need to change the number, it will update in all places.

Vignesh
This is the rule we follow (quotes the magic number description shown above)... and I got comments from my senior level.

WHAT DOES IT SAY, VIGNESH?
“-1, 0 and 1 are not considered magic numbers”

And you have replaced 1 with the word “SINGLE”

Example: Lack of pragmatism

It frustrates me so much that so many developers don’t even agree with Sonar, or don’t understand the suggestion, but still attempt the change anyway. Their sole goal becomes removing the error, and they lose sight of the true aim of writing great code that works.

Dave
OK, but why did it suggest it, and why does it make the code better? Did you understand the suggestion or just blindly do what it said?

Pavel
I think I understood it. The comment was: "Add the default parameter value defined in the overridden method". Default arguments are determined by the static type of the object. If a default argument is different for a parameter in an overriding method, the value used in the call will be different when calls are made via the base or derived object, which may be contrary to developer expectations.
That's why it suggested it. But in my opinion this change is redundant and doesn't make the code better.

There was some code with loads of if statements. It was checking the types of “nodes” in a tree structure, so it attempted to cast each node to a type. If the cast succeeded, it would go into that code block; otherwise, it would attempt to cast to a different type.

Even though the developer didn’t need to change this code, he changed it to try to resolve Sonar issues with the casting. However, he only updated some lines and not others, which made it inconsistent, so the codebase is more confusing and still contains Sonar errors. He also pulled the casts out of the if statements and performed them all at the top of the method. That is actually less efficient, because once you have found a match, you don’t need to keep attempting to cast.

{
	var creterionTag = row.Tag as LinkedCriterion;
	var relationshipTag = row.Tag as Relationship;
	var attributeTag = row.Tag as Attribute;
	var resultSetRuleTag = row.Tag as ResultSetRule;
	var conceptOrderingTag = row.Tag as ConceptOrdering;

Me
why change to "as casts" for most, but then not for Table, Concept and SharingDisplayValue?
I'm not even sure if there is an advantage to doing it this way. We now have several variables where only 1 will be set, and the rest are null.
Might be better spending more time refactoring it out to get rid of the branching.
Pattern matching is probably the neatest way for now. https://docs.microsoft.com/en-us/dotnet/csharp/pattern-matching

Kalya
I just simply resolved some existing SonarQube issue which not raised because of our changes. It is a kind of help to resolving existing issues, It is very difficult to resolving all the issues as of now

So he tried to help, found it too difficult, gave up, but still decided to submit the changes for review, despite making the code worse.
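
For reference, a hedged sketch of the pattern-matching approach I suggested, using the same (assumed) tag types: each cast happens at most once, and only in the branch that actually needs it.

switch (row.Tag)
{
	case LinkedCriterion criterion:
		// handle the criterion row
		break;
	case Relationship relationship:
		// handle the relationship row
		break;
	case Attribute attribute:
		// handle the attribute row
		break;
	default:
		// unknown tag type
		break;
}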

Example: Just Suppress it!

Of course, you can still get rid of the error without actually fixing anything, and merely hide it from the managers who only look at the figures:

#pragma warning disable S1075 // URIs should not be hardcoded
public const string InfoUrl = " http://med.info";
 #pragma warning restore S1075 // URIs should not be hardcoded

Conclusion

The goal isn’t to make Sonar happy; the goal is to write good, clean code. Sonar is a guide to help you do that, but it doesn’t guarantee success.

Me