The Feature Flag Rant

When adding new features to software, you can add a Feature Flag: if it is set to true, the new feature is used; if false, it isn’t. This allows a quick roll-back by tweaking the value rather than releasing a new software update. However, it makes the code more complicated due to the branching paths.
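
At its simplest, a feature flag is just a conditional around the new code path. A minimal sketch (the flag name, the dictionary-backed FeatureFlags helper, and the discount rules are all made up for illustration):

using System;
using System.Collections.Generic;

// Stand-in for our real Configuration Tool: the flag value lives in config,
// so rolling back means flipping it to false rather than shipping a new build.
public static class FeatureFlags
{
    private static readonly Dictionary<string, bool> Flags =
        new Dictionary<string, bool> { { "NewDiscountRules", false } };

    public static bool IsEnabled(string name) =>
        Flags.TryGetValue(name, out var on) && on;
}

public static class DiscountCalculator
{
    public static decimal Calculate(decimal orderTotal)
    {
        if (FeatureFlags.IsEnabled("NewDiscountRules"))
        {
            // New behaviour, only reachable while the flag is on.
            return orderTotal >= 100m ? orderTotal * 0.10m : orderTotal * 0.05m;
        }

        // Old behaviour stays in the codebase until someone removes it.
        return orderTotal * 0.05m;
    }
}

One branch like this is harmless on its own; the complications come from how many of these conditionals accumulate and how long they linger.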

When all users are using the new feature, when do you remove the old code? Obviously it should be removed once everyone has switched over and is happy with the new functionality, but the work needs to be planned in, and what is the urgency? Project Managers want new projects that add value, not deleting redundant code.

One of our most experienced developers posted a rant about feature flags. He pointed out there was no guidance on when to use them. Do all new features get feature flags? What if a new feature depends on one that already has a feature flag? Do Software Testers test each combination to make sure all code paths are supported? Is it clear which configurations are deployed on live, since those should have priority when it comes to testing? By default, our Test Environments should match the config of a typical Live Environment; however, we often find that the default is some configuration that is invalid or not used.

It’s not always possible to “roll back” by switching the feature flag off. To implement the change, you may have needed to refactor existing code or add new database columns. Changing the feature flag back to “off/false” only stops some of the new code being called, not all of the new changes (the refactored parts). So if the bug is in those changes, it is still a problem even with the flag off.

It was also discussed that some people used our Configuration Tool for actual configuration while others were using it for feature flags, and that maybe we should have separate tools for Configuration and Features.

Feature flags cause maintenance problems. The flag needs to be tested on and off when implemented, and if you later want to remove it, that removal needs testing too. If you leave it in, it will always be questioned whether code in that area is used or needs testing. How do you prioritise removing the code; does it belong with the team that initially created the feature? What if that team has moved on, or split?

Another developer brought up an example of how a bug existed in two places but the developer that fixed the issue was only aware of one path, and didn’t know about the other which required a feature flag to enable.

He also questioned whether it is more of a problem with our process. Other companies may have quicker releases and more flexibility to roll back, using ideas like Canary Deployments. Our process is slow and relies on “fix-forward” rather than rollback.

Things to consider:

  • What actually gets feature flagged?
  • When is the conditional code removed from the codebase?
  • The effect of the “Cartesian Explosion” of flag combinations on unit tests and test environments (see the sketch below)
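
To make that last point concrete, here is a hedged, NUnit-style sketch with made-up flag names. Three independent flags already mean eight combinations for a single behaviour, and every extra flag doubles it:

using NUnit.Framework;

public class FlagCombinationTests
{
    // NUnit's default combinatorial strategy generates the cross-product of the
    // [Values] parameters: 2 x 2 x 2 = 8 test cases for this one test method.
    [Test]
    public void Feature_works_for_every_flag_combination(
        [Values(true, false)] bool newPricing,
        [Values(true, false)] bool newLayout,
        [Values(true, false)] bool betaExport)
    {
        // In reality this would exercise the code paths guarded by each flag;
        // here it only demonstrates how quickly the case count grows.
        var enabledCount = (newPricing ? 1 : 0) + (newLayout ? 1 : 0) + (betaExport ? 1 : 0);
        Assert.That(enabledCount, Is.InRange(0, 3));
    }
}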

Code Comments

When writing code, some developers like to write notes to themselves/others in the actual code using “comments”. The intent is documentation: to explain what the code is doing. It is often argued that code should be “self-describing”, which is an idea I agree with. However, code can still end up complex, because the problem is technical, the developer struggled to come up with a simpler version, or the domain logic is just that way.

I’ve collated various examples, with various degrees of humour.

Confusing

Comments are supposed to add clarity to the code. When you see code that looks simple, but the comment seems to describe something else, is ambiguous, or misleading, then the comment has decreased clarity.

// Display save dialog
RefreshFilters();
...
// Save over existing filter
RefreshFilters(); 

You would think RefreshFilters would simply reload the filters, yet the comments suggest it displays a save dialog and even overwrites the existing filter.

Self-deprecating

//Well I've messed this up!!!
//Ultimately a filter ends up adding something to the where clause,
//  or if it's a relationships filter, then the AND clause of the join.
//The way I'm adding it to these is not clean and just sucks.  (two constructors with lots
//  of private variables that aren't needed half the time...yuk.
 
//Reading the above comment ages later, I'm not sure why relationships have to be on the
//join clause?  If it's a left one then yes but you can have a left join to a logical table.
//TODO: I've messed up
/// Ok.  I've messed up.  I've messed up good and proper.
/// Aggregate reports wants to format a date as a year, say.  The formatting stuff
/// is buried in compare functions (because ranges was the only place that I needed to do this)
/// There is the column.AddOutputFormatInformation that might be useful when this is fixed????
// This looks like a hack, and, frankly, it is. I'm sorry, I can't work out how to make this not stupid.
// If you're reading this and have some brilliant insight into how to make this work in a way that doesn't
// make people sad, please go nuts and fix it.

Criticising Others

(cg) => { return cg.DisplayName; })); // Wow this is awful
matchingSlotEntity.JobCategoryName = string.Empty; // This doesn't make sense. JobCategory belongs to a person, not a slot.
/// <summary>This is awful.  Get Ryan to support Guids for Mail Merge</summary>
//Please. Remove word interop. Not even Microsoft want us to use it.
TaskCount TaskCount { get; set; } // TODO: Again, this should be readonly. Something mucks with it, though.
MessageBox.Show("Sorry this feature hasn't been implemented yet... :(", "Sad Info..!!");

Funny

// Reverse the order as they are stored arse about face
this.ribbonBars.Reverse();
// Show yourself!
/// <summary>
/// Builds trees for a living.
/// </summary>
internal static class MailMergeTreeBuilder

this.AllowDrop = false;  // No dropping on this tree thankyou.

Even Microsoft throw in the occasional gem.

// At this point, there isn't much we can do.  There's a
// small chance the following line will allow the rest of
// the program to run, but don't get your hopes up.

Useful Warnings

This one warns you that the logic is confusing.

/**********************************************************************************
Description
-----------
You'd think that a stored procedure called 'GetNextBusinessDay' would return the
next business day for an organisation. That would be too easy.
 
What this stored procedure does, is return a day that is not explicitly marked
as 'closed'.
 
It also doesn't return the 'next' business day in the default case where the parameter @BusinessDayOccurrence is set to zero - in that case it returns the current day, or the next non-closed day if the current day has a closure defined on it.
 
@BusinessDayOccurrence doesn't find the first non-closed day after X days from today, incidentally. It returns the Xth non-closed day, which is an important difference. If you have closed bank holidays, and want to know the 2nd non-closed day after 24th December, it's not 27th December but 28th December Confusing!
 
**********************************************************************************/

Logging too many Errors

When things go wrong with your software, it’s obviously good practice for the developer to log relevant information into an error log. You can then know when users are affected, but also how many are affected – to understand how widespread the issue is. With that information, it becomes easier to triage. There can be loads of other projects to work on, and bugs to fix, so being able to prioritise them is key.

Choosing what to log and how often can require some thought. You can come up with categories for your logging, such as Information, Warning, and Error. Error is for when things have definitely gone wrong; Warning is for when you suspect something is wrong, like missing config; and Information is useful for debugging, for example logging “user connected” and “user disconnected” for an optional service the user connects to.
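
As a rough sketch of that split (the ChatService scenario and the console sink are stand-ins, not our real logger):

using System;

public enum LogLevel { Information, Warning, Error }

public static class Log
{
    // Stand-in for whatever the real sink is (file, database, monitoring service).
    public static void Write(LogLevel level, string message) =>
        Console.WriteLine($"{DateTime.UtcNow:u} [{level}] {message}");
}

public class ChatService
{
    public void Connect(string userId) =>
        Log.Write(LogLevel.Information, $"user {userId} connected"); // debugging aid, not a fault

    public void ApplyConfig(string serverUrl)
    {
        if (string.IsNullOrEmpty(serverUrl))
            Log.Write(LogLevel.Warning, "chat server URL missing, using default"); // suspicious but recoverable
    }

    public void Send(Action send)
    {
        try { send(); }
        catch (Exception ex)
        {
            Log.Write(LogLevel.Error, $"failed to send chat message: {ex.Message}"); // genuinely went wrong
        }
    }
}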

We have chat functionality which uses a PubSub (publish/subscribe) model, and we were logging status changes and connection statuses. If you just blindly log scenarios like this, it can be counterproductive. If the statuses change frequently and there are thousands of users, you end up spamming the error log, which makes it harder to see the real problems. If you see the same entries logged again and again, you start to think “we expect that” and just ignore them.

There can be extra costs associated with logging too. Data takes storage, and adding thousands of rows to a database per day can quickly increase its size. All the extra network calls add up as well.

We have had a few projects recently with the aim of cutting down the number of errors logged.

Where there is a genuine problem, fixing the root cause is obviously the best strategy. If the logs aren’t useful, it’s best to stop logging them altogether.

If the logs are useful, sometimes it’s better to cut them down rather than stop completely. If you log “failed to connect” and the software retries a few seconds later, do you really want to log another “failed to connect”? Maybe the functionality should try 5 times and then give up until the user manually attempts to reconnect. Maybe the logs could remain on the user’s computer and be submitted once with the number of failed attempts: instead of 5 separate entries, just 1 saying it tried 5 times and then gave up.
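
A hedged sketch of that idea (the attempt count, delay, and delegates are made up): retry a handful of times, then write one summary entry instead of one entry per failure.

using System;
using System.Threading;

public static class ConnectionRetry
{
    public static bool TryConnect(Func<bool> connect, Action<string> logError)
    {
        const int maxAttempts = 5;

        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (connect())
                return true;

            if (attempt < maxAttempts)
                Thread.Sleep(TimeSpan.FromSeconds(2)); // back off before the next try
        }

        // One aggregated entry rather than five separate "failed to connect" rows.
        logError($"failed to connect after {maxAttempts} attempts; giving up until the user reconnects manually");
        return false;
    }
}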

On a large-scale system like ours, the number of entries in the databases is crazy. Read this statement from a concerned Support team member (I think these were the stats one month after a recent release):

Based on the daily volume of errors logged over the past few days I’m expecting the number of errors logged in Monitoring to increase by 82% over the course of a month.

Current stats from Server11:

  • Total rows in ErrorLog: 11,519,198
  • PubSub errors: 3,283,396
  • Errors excluding PubSub errors: 8,235,802
  • Average PubSub errors per day: 218,893
  • Estimated PubSub errors over 31 days: 6,785,685
  • Data size per error: 3,015 bytes
  • Estimated data size of PubSub errors: 21,072 MB

 

Null checking

Introduction – The Basics

A NullReferenceException is an incredibly common mistake and probably the first problem new developers encounter. If you have a variable that should reference an object, but it is actually null, you cannot call instance methods on it without throwing an exception.

So in C#, you could have 

Dog myDog = null;
myDog.Bark();

This will throw a NullReferenceException because myDog is null; it doesn’t reference an actual Dog object.

Dog myDog = new Dog();
myDog.Bark();

This is fine because a dog has been initialised and holds a reference to an object.

If you allow the possibility of nulls, then whenever you want to call a method, you end up checking for null.

if (myDog != null)
   myDog.Bark();

A more concise syntax uses the null-conditional operator (a question mark), which only makes the call if the reference is not null:

myDog?.Bark()

So the question mark acts as the if statement, just expressed more concisely.

New “Nullable Reference Types” & Improved Design

A cleaner and safer design, if you can manage it, is to never allow nulls; then you never have to check for them. In the latest versions of C#, you can make the compiler assume things shouldn’t be null unless you explicitly specify that they can be; see Introducing Nullable Reference Types in C# – .NET Blog (microsoft.com).
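
A quick sketch of what that looks like from C# 8 onwards, reusing the Dog example (the Kennel type is just for illustration):

#nullable enable

public class Dog
{
    public void Bark() { /* woof */ }
}

public class Kennel
{
    // Non-nullable: the compiler warns if this is ever assigned null,
    // so callers don't need to keep checking it.
    public Dog Resident { get; set; } = new Dog();

    // Explicitly nullable: the compiler now expects callers to check
    // (or use ?.) before dereferencing it.
    public Dog? Visitor { get; set; }
}

public static class Example
{
    public static void Run(Kennel kennel)
    {
        kennel.Resident.Bark();  // fine: cannot be null by design
        kennel.Visitor?.Bark();  // required: Visitor may legitimately be null
    }
}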

Our Problem

I work on some software which is over 10 years old, using an older version of .NET. So things can be null, and are often designed to be null. We noticed that newer developers (this seems to be common among our Indian developers for some reason) put null checks everywhere, regardless of whether the reference can actually be null. This makes the code much harder to read, and possibly to debug, because you are writing misleading code and adding extra code branches that can never execute.

In a recent code review, the developer added an if statement to check for null:

if (user == null)
   return;

So if execution gets past this code, user cannot be null, yet for every single statement afterwards he added the question mark to check for null! For example:

var preferences = user?.Preferences;

Although it’s also bad design to chain loads of properties together, it is sometimes necessary. When every link in the chain has a question mark, it can be hard to understand what is actually executed. Combined with LINQ’s FirstOrDefault, you get even less clarity. FirstOrDefault returns the first item in a list, and if there are no matches it returns null. So when you see all those conditionals plus a method like FirstOrDefault, a glance at the statement suggests it is very likely to return null.

var result = value?.Details?.FirstOrDefault() as Observation;

So “result” is null if: value is null; Details is null; Details contains no items; or the first item isn’t an Observation.

“everything is null these days, or not sure what type it is”

Me

A longer example is the following:

if (recordData != null && (_templateDataSource?.EventLinks?.ContainsKey(recordData) ?? false))

Because of the null checks, they have also added the null-coalescing operator (??) to fall back to false when the chain evaluates to null. However, none of this data should ever be null in their code, so it should simply be:

if (_templateDataSource.EventLinks.ContainsKey(recordData))

The Lead Developer made a good point: if this data were null, it would be a major bug, but with the null checks in place the logic silently gets skipped and the bug is hidden. It would be preferable to actually crash so you know something is very wrong.

Lead Developer

Should this ever be null? Or the EventLinks on it?
Remember, all these null safety checks could allow the software to continue on invalid data conditions and produce wrong answers instead of crash.
Please re-check ALL the ones you have added in these changes, and if something should never be null remove it.
Unless you are planning to test the behaviour of this when null is passed in and that it is correct.

When so many objects can actually be null, it is easy to miss a check. There was one code review where I questioned the original design, which seemed to have excessive null checks (implemented by the same author as this new change). His new change was to add a null check that had been missed. After making some dramatic changes based on my feedback, he ironically removed his new null check, and thus reintroduced the issue he was attempting to fix!

Developer: "have added a null check to prevent the issue."
Me: "then it will crash because you haven't added null checks"

Another good example of having excessive null checks, but somehow still missing them:

context.NotGiven = resource.NotGiven;
context.Date = Helper.GetDateTimeFromString(resource.Date);
context.IdentifierSystem = resource.Identifier?.FirstOrDefault().System;
context.IdentifierValue = resource.Identifier?.FirstOrDefault().Value;
context.Status = resource.Status;
Coding productCode = resource.productCode?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.productCode = new FilerCommon.Coding(productCode?.System, productCode?.Code, productCode?.Display);
Coding productProcedureCode = (resource.GetExtension(Helper.ProductProcedureUrl)?.Value as CodeableConcept)?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.productProcedureCode = new FilerCommon.Coding(productProcedureCode?.System, productProcedureCode?.Code, productProcedureCode?.Display);
Coding site = resource.Site?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.Site = new FilerCommon.Coding(site?.System, site?.Code, site?.Display);
Coding route = resource.Route?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.Route = new FilerCommon.Coding(route?.System, route?.Code, route?.Display);
Coding explanation = (context.NotGiven.HasValue && context.NotGiven.Value ? resource.Explanation?.ReasonNotGiven : resource.Explanation?.Reason)?.FirstOrDefault()?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.Explanation = new FilerCommon.Coding(explanation?.System, explanation?.Code, explanation?.Display);
context.Quantity = new FilerCommon.Quantity(resource.Quantity?.Code, resource.Quantity?.Value, resource.Quantity?.Unit);            
context.LotNumber = resource.LotNumber;
context.Manufacturer = GetManufacturerName(context, resource);
context.OrganisationDetail = GetOrganizationDetails(context, resource);

Some of the FirstOrDefault() calls then access a property without a null check:

FirstOrDefault().System

Also, because of the null checks when setting “site”, it looks like it can be null. So when they set “context.Site”, they use null-conditional access on the arguments to the Coding constructor. However, that means they may be passing nulls into the constructor, which probably produces an invalid object.

Null flavours

I find the idea of a “null flavor” very interesting (NullFlavor – FHIR v4.0.1 (hl7.org)). Null means “nothing”, but there are different meanings within that: the reason why the value is missing. Some examples are:

  • missing, omitted, incomplete, improper
  • invalid
  • not provided by the sender due to security, privacy or other reasons
  • unknown
  • not applicable (e.g. last menstrual period for a male)
  • not available at this time, but expected to be available later
  • not available at this time (with no expectation regarding whether it will or will not be available in the future)
  • the content is greater than zero, but too small to be quantified

Closing Words

Null checks are one of the simplest concepts in object-oriented programming, so it’s bizarre to see so many modern programmers struggling to understand when to use them. Even when they find a bug in their own code, they fail to learn the lesson and continue to write inappropriate null checks. A better design is to avoid the possibility of nulls altogether by always dealing with a real object reference (even if the object is one that simply represents “null”).
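
That last idea is essentially the Null Object pattern. A rough sketch, again reusing the Dog example (the NullDog and DogFinder types are hypothetical):

public interface IDog
{
    void Bark();
}

public class Dog : IDog
{
    public void Bark() { /* woof */ }
}

// Represents "no dog" but is still a perfectly valid object,
// so callers never need a null check.
public class NullDog : IDog
{
    public static readonly NullDog Instance = new NullDog();
    public void Bark() { /* deliberately does nothing */ }
}

public static class DogFinder
{
    // Instead of returning null when nothing matches, return the null object.
    public static IDog FindByName(string name) =>
        name == "Rex" ? (IDog)new Dog() : NullDog.Instance;
}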

Performance Tales: SQL performance

Batches

Performance of SQL database queries is an interesting topic that I never quite understand. When dealing with a large amount of data, we are often told it is beneficial to process the results in batches. So if you want to update or delete rows based on some calculation, for example, we may be told to iterate through groups of 1,000 rows.

You can’t just put your query in a batch then assume it is efficient. There was one attempted data fix where the reviewing Database Expert reckoned it would take 2h 46 mins to execute.

“It doesn’t matter if you select 1 row or 1m rows – it still takes 10 seconds. So 1000 batches x 10 sec = 2h 46min processing time just working out what to delete. We need to change the proc so that it gets all 1m rows in one hit at the start into a temp table (should take 10 seconds) and then use that list to work from when batch deleting the rows”

Database Expert

Performance Tales: LINQ to SQL
  

Although we mainly use SQL stored procedures to retrieve data from the database, some of our code uses LINQ to SQL, which generates the SQL dynamically from code. Sometimes it seems that handcrafting your own database query (a stored procedure) gives better and more consistent results, since the SQL is the same every time. Sometimes the performance issue isn’t with the actual query itself, but with the columns lacking an appropriate index to allow fast lookups.

There was a problem where our default 30-second timeout was being reached for some users. So if they attempted to load a large record, it would take 30 seconds and then error out. One developer suggested increasing the timeout from 30 seconds to 5 minutes.

“i’d like the customers to still be able to see the user’s record.”

Developer

The record would (presumably) eventually load, if the user actually waited for 5 minutes and didn’t close the application thinking it had crashed (often the user interface in this example would just show “Not Responding”).

This idea doesn’t fix the root problem though, and the software still seems unusable from the user’s point of view.

An Architect stated:

“there is something very wrong with the query if it’s taking longer than 30 seconds, particularly when the use case of using LINQ2SQL is commonly for small batches of data.”

Software Architect

The default is usually a good indicator that something is wrong, but if we increase the timeout across the board we will miss out on such problems.

The reason the timeout is being breached needs to be investigated:

  • Is it general: applying to any query seemingly at random?
  • Is it during the connection to SQL Server or during the execution of a query?
  • Could a particular problem query be optimised, be made asynchronous, or the timeout be altered for that individual query?

The Database Expert stated:

Nothing by default should take over 30 seconds – it’s a good default. If we have particular problems with SQL taking over 30s it should be investigated and addressed. It is possible that some features should be allowed to take over 30s (e.g. not user facing, known to take a long time). Allowing >30s means more chance for blocking and wider impacts.

Having queries run longer than 30s increases the amount of time they are running in SQL – this could lead to more blocking, more CPU/memory demand which could make the entire server unreliable, so we go from 1 user complaining to many.

LINQtoSQL can be optimised, we’ve done it many times over the years. The simplest is to replace the queries with Stored procedures – but it is possible to improve bad LINQtoSQL too. It depends what is wrong. None of daily operations should be anywhere near that. 

SQL can do near 0ms operations on many more rows of data than we have. It isn’t a problem because we have more data, it will be something with that query. Poor plan choice or Blocking.

Database Expert
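
On the earlier point about altering the timeout for an individual query: in LINQ to SQL that can be done on the DataContext used for that one call, rather than raising the default everywhere. A rough sketch, where the context, table, and 120-second value are all hypothetical:

using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

[Table(Name = "LargeRecord")]
public class LargeRecord
{
    [Column(IsPrimaryKey = true)] public int Id;
    [Column] public string Name;
}

public class LargeRecordDataContext : DataContext
{
    public LargeRecordDataContext(string connectionString) : base(connectionString) { }
    public Table<LargeRecord> LargeRecords => GetTable<LargeRecord>();
}

public static class RecordLoader
{
    public static LargeRecord Load(string connectionString, int id)
    {
        using (var context = new LargeRecordDataContext(connectionString))
        {
            // Raise the timeout (in seconds) for this known-slow query only,
            // instead of changing the 30-second default across the board.
            context.CommandTimeout = 120;

            return context.LargeRecords.SingleOrDefault(r => r.Id == id);
        }
    }
}

Whether that is ever the right call is exactly the debate above; the point is only that the change can be scoped to a single query rather than hiding problems everywhere.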

Performance Tales: Limit User Input

A developer looked into an issue related to our Searches module, which allows the user to query their data and generate tables and charts to create their own reports. The developer claimed their new changes gave performance “up to 10 times better”.

The Database Expert looked at his changes and stated:

“I think that this might not be the right way to approach this issue in isolation. The issue with this procedure is not that it is slow in general, it is that our software allows you to call it in a huge variety of ways, some of which can only produce terrible execution plans as it stands. We should be analysing what kinds of searches that users do, and see if we can restrict some of the sillier searches that they can do. For example, users can right now do an audit search filtered on “anything before today’s date” with no further filters.

For example, see this snippet of a live trace done right now: <shows screenshot>

Clearly, some calls are very well optimised and others are terrible.

We should be looking at restricting what users can do, for example requiring people to search on a more focussed timeframe.

The way I would approach this issue would be to look at trace data and look for two main things:

-The styles of queries that are run most often

-The styles of queries that run the worst

Both need focus. The ones that run most often, even if they are relatively quick, can yield a big improvement simply from scale alone. The ones that run the worst improve the overall user experience as well as the impact on the system.

Of course improving a query might not involve any SQL changes at all, instead they might involve app changes to prevent silly queries from being run.”

Database Expert

Keep lowering the number with each MI

There was a Major Incident regarding a feature. The stored procedure used already had a limit on the amount of data selected in one go, and a developer changed that number from 50 to 10. Since 50 is already a very small number, I couldn’t understand why lowering it further would make a difference. There was a code comment next to the limit saying it was there to “improve performance”. I looked at the file history to see if I could find any more information about why the line was added in the first place, and saw a very interesting trend: that line was the only line changed in the previous 2 revisions:

original: select top 250

24 Sept 2015: select top 100

2 Oct 2015: select top 50 

25 May 2022: select top 10 

The developer did explain that the data is passed to another process that takes 90 seconds to process 10 instances. It’s essentially a queue that is polled every 10 minutes and would only have a small number of tasks each time.

The concerning thing is that the number keeps being lowered and the performance is still not deemed good enough. Maybe the overall implementation needs revising.

“Temporary” Check-ins

In Software Development, sometimes you have to be pragmatic and choose a sub-optimal solution in order to get some functionality/bug fix out the door urgently. In this case, you sometimes log a work item with a better idea to implement. However, because there’s a lack of “business value”, other work gets prioritised and this work never gets done.

XKCD

XKCD joked about how this seems to be a common phenomenon: you think the feature is temporary or low use, and it becomes a thing the product relies on. Sometimes the opposite happens: you expect your code to be reused/enhanced/adapted, so you come up with a great design using appropriate “design patterns” and make sure it is well tested and documented. However, it then never gets used the way you thought it would.

https://xkcd.com/2730/

Dave Plummer

Dave Plummer, a Microsoft developer, wrote this story about how his Disk Format dialog was meant to just be temporary. After all these years, it still remains in use.

I wrote this Format dialog back on a rainy Thursday morning at Microsoft in late 1994, I think it was.

We were porting the bajillion lines of code from the Windows95 user interface over to NT, and Format was just one of those areas where WindowsNT was different enough from Windows95 that we had to come up with some custom UI.

I got out a piece of paper and wrote down all the options and choices you could make with respect to formatting a disk, like filesystem, label, cluster size, compression, encryption, and so on.

Then I busted out VC++2.0 and used the Resource Editor to lay out a simple vertical stack of all the choices you had to make, in the approximate order you had to make. It wasn’t elegant, but it would do until the elegant UI arrived.

That was some 30 years ago, and the dialog is still my temporary one from that Thursday morning, so be careful about checking in “temporary” solutions!

I also had to decide how much “cluster slack” would be too much, and that wound up constraining the format size of a FAT volume to 32GB. That limit was also an arbitrary choice that morning, and one that has stuck with us as a permanent side effect.

So remember… there are no “temporary” checkins 🙂

Bing

I gave the following prompt to Bing to write a blog post:

“ write about how in Software Development, you may come up with a temporary solution, but it ends up being used for years.”

The Permanence of "Temporary" Solutions in Software Development

In the fast-paced world of software development, the term "quick fix" is often synonymous with a temporary solution intended to address an immediate issue. These solutions are typically devised under the pressure of looming deadlines or critical bugs that need swift attention. However, despite their intended short-term application, these "temporary" solutions often become permanent fixtures in the codebase, lingering for years beyond their expected lifespan.

The phenomenon is so common it has its own adage: "There is nothing more permanent than a temporary solution." But why does this happen?

Firstly, there's the factor of immediate relief. A temporary fix that resolves a pressing problem can reduce the urgency to replace it with a more robust solution. Once the immediate pain is alleviated, the incentive to revisit the issue diminishes, especially when other, more pressing issues arise.

Secondly, there's the issue of risk versus reward. Modifying a working system carries inherent risk. The fear of introducing new bugs or the potential downtime that might come with refactoring code can deter developers from replacing a temporary fix that works, albeit imperfectly.

Moreover, as time passes, the context in which the temporary solution was implemented may be lost. Documentation may be lacking, or team members may move on, taking their knowledge with them. This loss of context can make the thought of unraveling a temporary solution even more daunting.

Another contributing factor is the nature of business priorities. Often, the push for new features and visible progress trumps the less glamorous task of maintaining and improving existing code. This can lead to a culture where temporary fixes are not just commonplace, but expected and accepted as the norm.

However, the reliance on temporary solutions can lead to technical debt, a metaphorical term that reflects the implied cost of additional rework caused by choosing an easy solution now instead of using a better approach that would take longer. Like financial debt, technical debt accumulates interest, manifested in the form of compound complexities that make future changes more costly and time-consuming.

To mitigate the permanence of temporary solutions, development teams can adopt several strategies:

- Prioritize technical debt: Treat it as an important factor in the backlog, not just an afterthought.

- Improve documentation: Ensure that the reasons for temporary solutions and the intended long-term strategy are well-documented.

- Foster a culture of quality: Encourage practices that balance the need for progress with the importance of code quality and maintainability.

- Regular refactoring: Schedule time for refactoring and improving the codebase to prevent temporary solutions from becoming entrenched.

In conclusion, while temporary solutions are sometimes necessary in software development, awareness of their tendency to become permanent can help teams take proactive steps to manage them effectively. By doing so, they can maintain a healthier codebase, reduce technical debt, and ultimately deliver better software.

SonarCloud, the Static Analysis Tool

SonarCloud is a static analysis tool. It runs over your codebase and points out where code doesn’t follow “best practices”, highlights possible bugs, and reports code coverage.

When my employer bought a SonarCloud licence, there was a large emphasis on wanting existing errors fixed and no errors in new code. I knew this would be a bad idea to emphasise because of the classic “you get what you measure”. If you tell people the goal is to not have errors, then they will do whatever they can to not have errors. The idea of a tool like SonarCloud is to improve code quality, but since the metric is a number, the number becomes the goal rather than the quality. The aim should be to improve the code, optimising for readability, maintainability, scalability, and code coverage.

As more people began working with SonarCloud, we saw more and more changes with titles along the lines of “Fix SonarCloud issues”.

Human Analysis

SonarCloud is just a static analysis tool; it can only spot problems and suggest pre-defined improvements. These improvements are not always in the best interest of the codebase. You have to objectively ask yourself, “Does the change I’m making make the code better or worse?” Before you can answer that, you need to understand why Sonar thinks there is a problem with the code. Then you can decide whether Sonar is right or not. Some of the rules are not applicable everywhere.

I have seen a number of changes where the code is actually made worse by the changes to satisfy Sonar.

Example 1: 7 Argument Limit

A constructor or method that takes loads of parameters can be a sign of bad design, and maybe some of the parameters can be grouped together inside an object. Once you reach 8 arguments, Sonar will flag it. A simple fix is to create a class and throw a couple of parameters into it. That satisfies the rule, but it doesn’t make logical sense unless the parameters are in fact related. Adding a new single-use class can just make the codebase more cluttered and seemingly more complicated. I would rather see a method with 8 parameters than some convoluted design with extra types in the way “just because of Sonar”. Mark Seemann has a good blog post about Curbing code rot with thresholds.
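
A hypothetical illustration of the difference: grouping parameters that genuinely belong together, versus inventing a bucket class purely to get under the threshold.

using System;

// Genuinely related values: this grouping would make sense even if Sonar had never complained.
public class PostalAddress
{
    public string Line1 { get; set; }
    public string Line2 { get; set; }
    public string Town { get; set; }
    public string Postcode { get; set; }
}

public class PersonService
{
    // Five arguments instead of eight, and each one still means something on its own.
    public void Register(string firstName, string lastName, DateTime dateOfBirth,
                         PostalAddress address, bool sendWelcomeEmail)
    {
        /* ... */
    }

    // The anti-pattern: a "RegisterArgs2" class that exists only because of the rule,
    // with nothing linking its members except the argument count.
    // public void Register(RegisterArgs1 args1, RegisterArgs2 args2) { ... }
}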

Example 2: TryParse

As another example, I once wrote some code like the following:

var ok = int.TryParse(a, out var x) & int.TryParse(b, out var y);
//something with x and y

Sonar complained that the use of the bitwise & was confusing and suggested I use && instead. However, if I did that then “y” wouldn’t always be definitely assigned because of the short-circuiting, and the code wouldn’t compile. I was about to reject the Sonar issue as “suggests code that doesn’t compile” and just keep my version.

Then I thought, “if Sonar cannot understand my code to make a decent suggestion, maybe other developers can’t either“. I was trying to be too clever with my code.

Instead I changed the code to:

var ok1 = int.TryParse(a, out var x);
var ok2 = int.TryParse(b, out var y);
var ok = ok1 && ok2;
//something with x and y

It wasn’t as terse as my original version, but it was certainly easier to read and understand, and Sonar didn’t have a problem with it any more either.

Example 3: Cyclomatic Complexity

When a method has loads of if statements, it creates loads of paths that can be executed, which means that if you are aiming for true 100% test coverage, you have loads of tests to write. It can easily make the code hard to read and understand too. At a certain point, Sonar suggests breaking the method into smaller methods. I have seen people take this extremely literally, so you end up with a design that looks like:

  SetPersonalDetailDifferences1(returnList);
  SetPersonalDetailDifferences2(region, returnList);


So there is no logical grouping for what goes in part 1 and part 2; enough code gets placed in the first method and everything else goes in the second. Now the original single method is half the size, for no logical reason other than to satisfy the Sonar rule.

Me (rhetorically)
Are these fields logically grouped or is this just to satisfy sonar?

Brijesh
It is just to satisfy sonar
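
For contrast, a split that would actually help groups the comparisons by what they are, not by how many lines fit in each half. A hypothetical sketch:

using System.Collections.Generic;

public class Person
{
    public string Surname;
    public string Postcode;
    public string Phone;
}

public class Difference
{
    public Difference(string field, string oldValue, string newValue) { /* ... */ }
}

public class PersonalDetailComparer
{
    public void SetDifferences(Person oldP, Person newP, List<Difference> differences)
    {
        // Each helper owns one logical group of fields, so its name explains what lives inside it.
        AddNameDifferences(oldP, newP, differences);
        AddAddressDifferences(oldP, newP, differences);
        AddContactDifferences(oldP, newP, differences);
    }

    private void AddNameDifferences(Person oldP, Person newP, List<Difference> differences)
    {
        if (oldP.Surname != newP.Surname)
            differences.Add(new Difference("Surname", oldP.Surname, newP.Surname));
    }

    private void AddAddressDifferences(Person oldP, Person newP, List<Difference> differences)
    {
        if (oldP.Postcode != newP.Postcode)
            differences.Add(new Difference("Postcode", oldP.Postcode, newP.Postcode));
    }

    private void AddContactDifferences(Person oldP, Person newP, List<Difference> differences)
    {
        if (oldP.Phone != newP.Phone)
            differences.Add(new Difference("Phone", oldP.Phone, newP.Phone));
    }
}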

Example 4: Making The Code Worse

Nullable DateTime

A DateTime always has to have a value. However, you can declare a nullable DateTime in C# by appending a question mark: DateTime?

The existing code was checking a standard DateTime for null, which can never happen.

if (startDate == null)
   throw new ArgumentNullException("startDate");

The Code Analysis report was correctly flagging this code as completely unnecessary. Instead of removing the code, the developer then changed it to

if ((DateTime?)startDate == null)
    throw new ArgumentNullException("startDate");

The method still accepted startDate as a non-nullable DateTime, so it could never be null. But once it is cast to a nullable DateTime, the check against null becomes technically valid (and always false).

Me:
Why are you casting to nullable datetime?

Chandeshwar
dateTime is value type , it can never be null value.
if check dateTime == null, it's always return false .

Me:
Yes, that is correct Chandeshwar. That’s why you can delete the code completely. Your code is always false too.

Sometimes, this type of scenario leads to multiple attempts, either because:

  1. they make some changes and still get the same Sonar error,
  2. they introduce a new problem,
  3. or the reviewer dislikes the changes and wants them changed back.

So there are plenty of files where our change history looks like this:

Fix Code Smell Issue <- doing what they think Sonar wanted them to do
Additional Static Analysis Fixes <-they messed up, so tried again
Addressing code review comments <- I pointed out it still wasn’t correct

Example: Magic Numbers

Magic numbers should not be used

A magic number is a number that comes out of nowhere, and is directly used in a statement. Magic numbers are often used, for instance to limit the number of iterations of a loop, to test the value of a property, etc.

Using magic numbers may seem obvious and straightforward when you’re writing a piece of code, but they are much less obvious and straightforward at debugging time

That is why magic numbers must be demystified by first being assigned to clearly named variables before being used.

-1, 0 and 1 are not considered magic numbers.

Vignesh changed code like

dosagesCount == 1

to

dosagesCount == Constants.Single

Me:
Constants are for a different purpose! They are not for replacing all the numbers in the codebase. It's a good idea to give numbers clear names, and if the same value is used multiple times, it means if you need to change the number, it will update in all places.

Vignesh
This is the rule we follow (quotes the magic number description shown above)... and I got comments from my senior level.

WHAT DOES IT SAY, VIGNESH?
“-1, 0 and 1 are not considered magic numbers”

And you have replaced 1 with the word “SINGLE”
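
For contrast, a hypothetical example of a constant that earns its name: the number encodes a decision, it is used in more than one place, and changing the policy means editing one line.

public static class SearchDefaults
{
    // The name explains why the number exists; change it here and every usage follows.
    public const int PageSize = 50;
}

public class SearchPager
{
    public int PageCount(int totalRows) =>
        (totalRows + SearchDefaults.PageSize - 1) / SearchDefaults.PageSize;

    public int RowsToSkip(int pageNumber) =>
        (pageNumber - 1) * SearchDefaults.PageSize;
}

// Compare: "dosagesCount == Constants.Single" tells the reader nothing that
// "dosagesCount == 1" didn't already say, and 1 isn't a magic number anyway.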

Example: Lack of pragmatism

It frustrates me so much that so many developers don’t even agree with Sonar, or don’t understand the suggestion, but still attempt the change anyway. Their sole goal becomes removing the error, and they lose sight of the true aim: writing great code that works.

Dave
OK, but why did it suggest it, and why does it make the code better? Did you understand the suggestion or just blindly do what it said?

Pavel
I think I understood it. The comment was: "Add the default parameter value defined in the overridden method". Default arguments are determined by the static type of the object. If a default argument is different for a parameter in an overriding method, the value used in the call will be different when calls are made via the base or derived object, which may be contrary to developer expectations.
That's why it suggested it. But in my opinion this change is redundant and doesn't make the code better.

There was some code with loads of if statements. It was checking the types of “nodes” in a tree structure by attempting to cast each one: if the cast succeeded, it went into that code block, otherwise it attempted to cast to a different type.

Even though the developer didn’t need to change this code, he did so in an attempt to resolve Sonar issues with the casting. However, he only updated some lines and not others, which made it inconsistent, so the codebase is more confusing and still contains Sonar errors. He also moved the casts out of the if statements and performed them all at the top of the method. So now it is actually less efficient, because once you have found a match you do not need to keep attempting to cast.

  {
            var creterionTag = row.Tag as LinkedCriterion;
            var relationshipTag = row.Tag as Relationship;
            var attributeTag = row.Tag as Attribute;
            var resultSetRuleTag = row.Tag as ResultSetRule;
            var conceptOrderingTag = row.Tag as ConceptOrdering;
Me
why change to "as casts" for most, but then not for Table, Concept and SharingDisplayValue?
I'm not even sure if there is an advantage to doing it this way. We now have several variables where only 1 will be set, and the rest are null.
Might be better spending more time refactoring it out to get rid of the branching.
Pattern matching is probably the neatest way for now. https://docs.microsoft.com/en-us/dotnet/csharp/pattern-matching

Kalya
I just simply resolved some existing SonarQube issue which not raised because of our changes. It is a kind of help to resolving existing issues, It is very difficult to resolving all the issues as of now

So he tried to help, found it too difficult, gave up, but still decided to submit the changes for review, despite making the code worse.
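
For reference, the pattern matching I linked to replaces the row of “as” casts with a test-and-cast in one step, so only the matching branch’s variable exists. A trimmed-down, hypothetical sketch:

public class LinkedCriterion { public string Name; }
public class Relationship { public string Name; }

public static class RowTagDescriber
{
    public static string Describe(object tag)
    {
        // C# 7+ pattern matching: no block of mostly-null locals at the top of the method.
        switch (tag)
        {
            case LinkedCriterion criterion:
                return $"Criterion: {criterion.Name}";
            case Relationship relationship:
                return $"Relationship: {relationship.Name}";
            default:
                return "Unknown";
        }
    }
}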

Example: Just Suppress it!

Of course, you can still get rid of the error without actually fixing anything; you merely hide it from the managers who are only looking at the figures:

#pragma warning disable S1075 // URIs should not be hardcoded
public const string InfoUrl = " http://med.info";
 #pragma warning restore S1075 // URIs should not be hardcoded

Conclusion

The goal isn’t to make Sonar happy; the goal is to write good clean code. Sonar is a guide to help you do that, but it doesn’t guarantee success.

Me

To Dos

When writing code, some developers like to write notes to themselves in the actual code using “comments”. A special type of comment is the “to do”, which simply starts with the prefix “TODO”. Once the feature is complete, you should have no TODOs left in the code.

In a project, you may sometimes split up the requirements in a way that means leaving TODOs in for now, to indicate that something will be covered in the upcoming weeks; maybe another team member will write the code there, removing the TODO when complete.

When a feature is released, there should definitely be no TODOs left. For some reason, this wasn’t the case a few decades ago: there are still loads of TODOs in our legacy codebase. It seems to be a common occurrence when you read about other codebases too.

/// <summary> 
/// TODO: Whoever wrote this please document 
/// </summary> 

What use is a “TODO: Whoever wrote this” style comment? 

Surely it’s likely that the original developer won’t update it, and surely it isn’t hard to find out who wrote it if you just check source control. Now the TODO will stay around until someone is bothered enough to delete it, and they probably won’t “document”.

“I’ll do it later” is one of these lies you tell yourself. Have you ever seen a kanban card saying “review TODOs”? Nope. It never gets fixed.

Vincent Deniel

Cory House

Cory House’s advice is the following:

// TODO: Thing we’ll never get around to doing.

If it’s worth doing, do it now or open a ticket. Why a ticket?

  1. TODO comments are often forgotten.
  2. A ticket makes the issue more visible.
  3. Tickets can be tagged, categorised, sized, and prioritised.
  4. TODO comments create a second system for tracking work. Now we have no easy way to see our backlog in one spot.

Sonarcloud 

The static analysis tool Sonarcloud flags up TODO comments because it obviously means something is missing and needs to be addressed. However, some developers have the mindset to remove Sonar errors by doing the minimum to make the error go away. So to fix a TODO, it’s easier to remove the comment rather than fixing the actual issue (if there even is an issue to fix). 

However, one developer just removed the word “TODO” so Sonar would no longer flag it, but the comment still remains and nothing is actually fixed.

Me
why delete part of the comment? delete it all if it doesn't add value

Kalya
Yeah, SonarQube identify only TODO text to remove, so that did like. Now I just removed entire comment

Me
Sonar doesn't tell you to remove the TODO text:
"TODO tags are commonly used to mark places where some more code is required, but which the developer wants to implement later.
Sometimes the developer will not have the time or will simply forget to get back to that tag.
This rule is meant to track those tags and to ensure that they do not go unnoticed."

Kalya
Yeah you are correct, But we should not move any incomplete code to master with TODO identification right. Anyway it is there in master, so in this case what shall we do whether can we remove or keep as it is. Kindly suggest. This is also opt for all the incomplete methods as well.

Me
Yeah, new TODOs mean you haven't finished your project.
It's not a big deal to leave the existing TODOs as they are.
If you did want to address them, for each existing TODO, you would really need to think if it is a problem or not. They have probably been in the codebase for years, so most likely are safe to remove the TODOs entirely. If the TODO is in a blank method, then you can just remove the method if you wish. Sometimes the methods may be part of an interface though so would need to remain. But then Sonar will complain if you don't have some kind of comment in it, so maybe just leave those with the TODO comment in, unless you can think of a good code comment to replace it with.
It is possible that some of the TODOs could point out possible bugs or limitations so could be helpful as a standard comment. In that case, you may need to reword them a bit so they make sense .
In general, for any kind of refactoring, you need to consider if it is worth changing. The more changes you make, the more risk: increase testing scope, and more code to review (I think most of my comments have been for Sonar changes). Deleting the TODO comments has no impact on the behaviour.
I tend to make changes just to the methods I am modifying due to my bug fix or enhancement. If the testers are planning on doing a full regression test, then I'll consider making more dramatic changes.

This is very concerning to hear

On a code review, a Senior Developer, Lee, questioned why there were no database changes when the developer, Neil, had removed all the related C# server code. Neil replied that he “wasn’t sure how the patching process worked” (despite being here for years, in a team with experienced developers), and wasn’t sure if there were any backwards-compatibility issues to consider.

So what was his plan? Just hope it got past the code review stage unchallenged? Then we would have obsolete stored procedures and unused data lingering in the database for years.

I initially thought his claim about backwards compatibility was nonsensical, but from an architectural standpoint it makes sense given how our system works. The server code doesn’t call the other server; it goes direct to the database. That means if an old version calls into a newer version’s database, it would expect the stored procedures and data to still exist. However, for this particular feature there were no cross-database calls at all.

I suppose being cautious and not deleting the data makes sense from a rollback point of view. It’s hard to restore the data if it is lost, but easy to restore the C# code. I have never seen us use this approach though.

The Senior Developer said :

This is very concerning to hear, can you please work with your team lead to understand how our versions are deployed, and if they are unable to answer all the questions, please reach out to someone. We do not support any version changes by default, though there are cases where we do have cross version server/database calls, but these are for specific cross organisation activities.
You can safely remove these columns, update these stored procedures.
There is no value in leaving something half in the system, if it is no longer needed, remove all references, database rows/columns/tables, class Properties, etc.

In my previous blog, I discussed Project vs Domain Teams. This is kinda linked in the sense that specialising in a certain area of the system means you gain knowledge of the functionality and architecture of that area. There would be less chance of this scenario happening where the developer is questioning if there could be backwards compatibility issues. However, he could have also found this information out by raising questions.

This example does cover many topics I have discussed on this blog:

  • Poor communication
  • Bad decisions
  • Funny quote from a senior developer ”This is very concerning to hear”

Domain Teams, Project Teams & Cross-Cutting

In the world of Software Development, there are often differing views on how to arrange teams. Regardless of the approach, people will leave and join over time, so team members need to be replaced and teams need to adapt.

There was a time when we were arranged into teams that were assigned to a Project, then moved onto a completely different one once complete. Any bugs introduced by the projects then get assigned to a “Service Improvement” team who only deal with bugs (and possibly ad-hoc user requests).

Then, after a few years, and maybe under a new Development Manager, they would restructure into Domain teams, where you take ownership of a group of features and only projects related to those are assigned to your team. Any bugs introduced by the projects stay with the team, which gives you a greater incentive to fix them as early as possible. People build up knowledge of their areas and become experts.

Then a few years later, we will switch back to Project teams.

There are pros and cons to each structure, and there are always edge cases which pose a management problem. Even with Domain Teams, there will be certain features that don’t neatly fit into the groups you defined, or ones that apply to many modules, e.g. Printing.

Sometimes we have called the team that handles the miscellaneous features “Cross-Cutting”. Managers would sell it as being for features like Printing that really are used by many areas of the system, but we all know it becomes a team that gets miscellaneous and unrelated projects. They end up like the “Service Improvement” team, dealing with random bugs and work no one else wants to do.

Cross-Cutting

There was a meeting where managers were announcing the new Domain Teams, and I got assigned to Cross-Cutting. One developer voiced his concerns about having a Cross-Cutting team at all. He wanted to point out that Domain Teams are supposed to have specialist knowledge of their domains, but most people assigned to the teams had little to no knowledge. For some reason, he chose my name to make his point.

“What does TimeInInts know about Cross-Cutting?”

Which received a room full of laughter. I’m sure some were laughing at his point, some laughed at his emphasis and delivery, and others probably saw it as an attack on my knowledge. I was probably one of the best people for it really, given my experience in the previous Service Improvement teams.

The whole idea of keeping Domain knowledge in the team only works if there is a true commitment to keep the teams stable over years. However, people will leave the business, some will want to move to a different project to broaden their skills, or people could just fall out with their team members.

Another concern this developer had was with his own team. He was assigned to a Domain team he was the expert on, but was used to working with a couple of developers in the UK, whereas this new team had two Indian developers. They had recently acknowledged that the distributed teams weren’t really working, so these new Domain teams were supposed to be co-located. But this setup seemed to signal that he was there merely to train these developers up and then essentially offshore the domain. Since he was the expert, and proud of it, he still wanted to work in that area. But he can’t work on the same software forever.

In the Cross-Cutting team, we had an open slot labelled “new starter” so we were going to get a new hire in. You have to start somewhere, but again, this doesn’t help the teams specialise if they don’t already start with the knowledge.

Colleagues’ Opinions:

Developer 1:

Me 13:39: what does a new starter know about Cross-Cutting? 
Mark 13:39: sounds more like Cost Cutting! 

Developer 2:

It’s infinitely harder to build something if you don’t understand the thing you’re building. Hard to catch issues and make sense of designs if you had no opportunity to learn the domain.

Developer 3:

isn’t one of our major issues is we’ve lost domain expertise for core/bread and butter modules.  For any “module”, there’s a combination of what the requirements are/how it should work, and what the code is actually doing. Without “domain teams”/ownership – we’ve lost a large part of the puzzle (how module should work).

Developer 4:

our teams are completely ineffective, expertise has been spread too thin. We probably need to reorganise the department again with who is remaining.

Build stronger teams first that only have one junior-ish person, then have weaker teams helping out where possible. It will be very hard for the weaker teams, but unless we do this, we’ll lose the stronger people.

The weaker teams should be given appropriate projects with longer timescales, and given as much help as possible while ultimately having to struggle their own way, stronger people who put in the effort will begin to emerge from those teams.