Performance Tales:  Out of Memory Fixes

We have an area of our system that is a major pain for memory usage. We allow users to create what we can generically refer to as “Resources” and new users will then download the entire set of them, which are then cached locally.

The initial download is very large, then they are loaded into memory the next time the application loads. Most of the time, it is on-demand, but it can be slow and be very memory consuming.

Another problem is that due to various bugs, sometimes these resources can be missing/corrupted and have to be downloaded again.

Over time, the area of code has been cobbled together by developers that don’t really understand the system and so has perpetuated the inefficiency and bugs, which then becomes an endless cycle of making the system worse.

There was a big push to improve this area of the system, but no one has learned their lesson, so many juniors got assigned to fix the problem.

When it comes to code reviews, code can be surprising, and the surprise either comes from the fact that the developer is significantly smarter than me, or maybe significantly dumber. So sometimes I find myself looking at it and wonder if it really is bonkers, or some genius understanding of code that I need to learn. So sometimes it’s best to ask your colleagues to check your understanding.

I don’t remember seeing a cast in a property before:

public IEnumerable<MissingResource> MissingResources
{
    get { return _missingResources; }
    private set { _missingResources = (List<MissingResource>)value; }
}

So it’s either incredibly smart, or incredibly dumb.

“That cast is mental!

You could set it to anything that implements IEnumerable<MissingResource> – but it better be a List<>

Dave (translation of what the code is saying)

Is the following just a lack of trust that .Net won’t clear the old objects? To me, this code makes it seem like there is a bug which they are working around, or are just going wild nulling everything out to save memory.

public void ClearData()
{
	NewResources = null;
	ExistingResources = null;
	MissingResources = null;
	SkippedResources = null;
	NewResources = new List<IResource>();
	ExistingResources = new List<IResource>();
	MissingResources = new List<MissingResource>();
	SkippedResources = new List<IResource>();
	IndexedResources = new List<Guid>();
}

trust understanding

Dave

Does that finally block do anything? it’s a local variable so should be marked for garbage collector at that point anyway

finally
{
	bulkResources = BulkResource.Empty();
}

Yes it does something.

That something is worse than doing nothing!!!!

the finally allocates another instance and loses scope of the current one, meaning there are 2 things to GC now 

Dave

I do wonder if sometimes they don’t really know what you are asking but just change stuff anyway. So after I point out their use of null didn’t do anything good, we now create some empty lists and clear them if they are not null. (which they aren’t null, and are definitely empty because we just created them).

public virtual bool PrepareItemsForImport(ImportProcessParameters parameters)
{
	DialogService.SetProgressFormText("Preparing to import...");
	_newResources = new List<IResource>();
	_existingResources = new List<IResource>();
	_missingResources = new List<MissingResource>();
	_skippedResources = new List<IResource>();
	_indexedResources = new List<Guid>();
	ClearData();
	_importStartDateTime = DateTime.Now;
	_mappingInformation = RemappingService.MappingIdentifiersForOrganisation;
	return true;
}

public void ClearData()
{
	NewResources?.Clear();
	ExistingResources?.Clear();
	MissingResources?.Clear();
	SkippedResources?.Clear();
	IndexedResources?.Clear();
}

Does “ClearDataInViewModel” do anything? You call it right before the view model goes out of scope and is eligible for garbage collection anyway?

Me
using (var dialogService = new DialogService())
{
    var viewModel = new ImportDetailsDialogViewModel(dialogService);
    viewModel.InitializeFromImportProvider(importProvider);
    var dialog = new ImportDetailsDialog();
    dialog.DataContext = viewModel;
    Application.ShowModal(dialog);
    viewModel.ClearDataInViewModel();
}

Remember what the point of this work was. It was to reduce memory leaks, and also improve performance in other ways (fixing bugs in the caching, reduce server calls, remove redundant code). What they have done so far is to add more redundant code and show a complete lack of understanding how/when the garbage collector in C# works and runs. The garbage collector is the way that memory (RAM) is freed up.

public IEnumerable<TemplateHeader> GetMobileTemplateHeaders()
{
	List<TemplateHeader> headers = Retrieval.GetMobileTemplateHeaders().ToList();

	return headers;
}

The above code was changed to this:

public IEnumerable<TemplateHeader> GetMobileTemplateHeaders()
{
	IEnumerable<UserTemplateDefinition> mobileUserTemplateDefinitions =
		Retrieval.GetMobileTemplateHeaders();

	IEnumerable<TemplateHeader> mobileTemplateHeaders =
		mobileUserTemplateDefinitions
		.Select(
			template =>
			new TemplateHeader(
				id: template.Identifier,
				title: template.Name));

	return mobileTemplateHeaders;
}
Me
Retrieval.GetMobileTemplateHeaders doesn't seem to return TemplateHeaders anymore

Jaz
Fixed this

Me
You are still taking the output from a method called GetMobileTemplateHeaders and converting them to TemplateHeaders. Seems like the method should be renamed, or the return type changed

Jaz
It is returning template headers enabled for mobile. So it was named as GetMobileTemplateHeaders.

Me
This was the code before. It's of type TemplateHeaders
List<TemplateHeader> headers = Retrieval.GetMobileTemplateHeaders().ToList();

This is the code now
IEnumerable<UserTemplateDefinition> mobileUserTemplateDefinitions = Retrieval.GetMobileTemplateHeaders();
It isn't of type TemplateHeaders
but you want TemplateHeaders. So you then take the output of Retrieval.GetMobileTemplateHeaders and convert it to TemplateHeaders, storing it in a variable called mobileTemplateHeaders.

The code looks strange to have a call to GetMobileTemplateHeaders then the line straight after it creates a variable called mobileTemplateHeaders.

Surely we expect the code to be more like IEnumerable<TemplateHeader> mobileTemplateHeaders = Retrieval.GetMobileTemplateHeaders();?

Jaz
Change done.

Another developer pointed out they had introduced another inefficiency by grabbing ALL resources and not just the ones they were interested in. So they aimed to cut down memory usage but actually managed to increase it!


Gary
Are you sure you want to do a get bulk resources to only just get the templates out?

You are getting all types of resources ~20k+ items etc to only throw the majority of that data away?

Jaz
Checked with the team and changed the approach to get templates only

Conclusion

It is very easy to understand why this particular area of the system is a massive problem area. If you tell the developers to look into improving performance, they just end up changing random bits of code and hope it somehow works. Then when it is a half-decent change, they won’t put much thought into the naming, so then it’s hard and confusing to read.

What we need to do is actually assign some smarter developers to the project; ones that understand how memory leaks can occur, look at the number of resources being loaded at certain points, and analyse the SQL queries to do the initial retrieval.

Errors logging errors

Today, I saw some code that looked like the classic “I can’t recreate this bug, so I’ll write some code to log extra information, and I’ll hope this somehow helps me diagnose the issue”.

They had a list of data, and the ID in this list should be unique, but it was in there twice due to a bug. So it would look like something like this (ID =1 is in there twice):

private List<Code> _codes = new List<Code>()
			{
				new Code
				{
					IsSelectable = true,
					ID = 1
				},
				new Code
				{
					IsSelectable = true,
					ID = 1
				},
				new Code
				{
					IsSelectable = false,
					ID = 2
				},
				new Code
				{
					IsSelectable = true,
					ID = 3
				}
			}; 

Then their method contained this code to validate the data:

		private void EnsureValid(int idToFind)
		{
			try
			{
				_codes.Single(s => s.ID == idToFind);
			}
			catch (InvalidOperationException)
			{
				Dictionary<string, object> customData = new Dictionary<string, object>
					{
						{ "ID", _codes.ToDictionary(x=>x.ID) }
					};

                LogToErrorDB(customData);
			}
		}

The Single will throw an exception due to there being more than one matching element (Single will succeed when there is a Single match). In the catch block, they then convert the code list to a dictionary and will log this in the error database. However, a dictionary requires the keys in the dictionary to be unique. Since there’s duplicate IDs, this will throw an exception again, this time with:

“System.ArgumentException: ‘An item with the same key has already been added.'”

Error thrown in the catch block

So you get an error whilst trying to log your error. The extra information they intended to log is never logged.

I’ve seen this mistake happen a few times before, and usually you can test it by artificially creating the data condition. Then they would realise their error logging doesn’t work.

Computers cannot handle floating point maths

I’ve been writing a game using Unity, and I’ve just realised how rare it is to use floating point (decimal) numbers in my day job as a software developer. I remember my Computing teacher back in College stating how computers are rubbish at calculating decimal numbers, but I never really understood. Calculators work just fine. They may only have several decimal places but that is usually more decimal places than you need.

In game development, decimal numbers are frequently used for sizes and positions in a 3D space and therefore calculations with these objects involve lots of decimal maths.

After writing some logic to position my characters on the map, I tried writing a unit test. Unit testing this method could be beneficial when I add extra features later.

I wanted to compare a Vector which is composed of 3 floating point numbers. Since I already had written the logic and was happy with the character’s positioning, I just grabbed the output of the method and specified it as the expected output, but my test failed!

Expected: (-34.28, 0.00, 20.63)
But was:  (-34.28, 0.00, 20.63)

so then I inspected the Vectors more closely

"(-34.28, 0.00, 20.63)"
	magnitude: 40.00208
	normalized: "(-0.86, 0.00, 0.52)"
	sqrMagnitude: 1600.166
	x: -34.275
	y: 0
	z: 20.625
 
new Vector3(-34.28f, 0.00f, 20.63f)
"(-34.28, 0.00, 20.63)"
	magnitude: 40.00894
	normalized: "(-0.86, 0.00, 0.52)"
	sqrMagnitude: 1600.715
	x: -34.28
	y: 0
	z: 20.63

I think the numbers it shows you is a text representation which involves rounding the numbers to 2 decimal places. But if you look at the actual X, one is -34.275 and the other is -34.28. So there’s a 0.005 difference. A similar problem with Z.

What also threw me, is when I looked at the documentation, https://docs.unity3d.com/ScriptReference/Vector3.Equals.html  I saw this:

“Due to floating point inaccuracies, this might return false for vectors which are essentially (but not exactly) equal. Use the == operator to test two vectors for approximate equality.”

I was originally using Assert.Equals(expected, actual) so I switched to Assert.True(expected == actual) but it still wasn’t working.

I looked at the Unity source code and they are providing their own == operator for vectors that does some “square magnitude” calculation then compares if it is less than a certain number – a margin of error. 

this is Unity’s version

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool operator ==(Vector3 lhs, Vector3 rhs)
{
   float num = lhs.x - rhs.x;
   float num2 = lhs.y - rhs.y;
   float num3 = lhs.z - rhs.z;
   float num4 = num * num + num2 * num2 + num3 * num3;
   return num4 < 9.99999944E-11f;
}

So I did something similar and increased the number. I came up with this, using Unity’s provided square magnitude function, but wasn’t sure if it made mathematical sense: It worked for my scenarios though

private static void AssertAreApproximatelyEqualDueToFloatingPointImprecision(Vector3 expected, Vector3 actual)
{
   var difference = Vector3.SqrMagnitude(actual - expected);
   Assert.IsTrue(difference <= 5.0E-04);

/*
eg
0.0004996415
<=
0.00050
*/
}

Looking back at this, I probably over complicated it. I suppose I should just change the expected values to have more decimal places because it’s the rounding to 2 decimal places that will be the actual problem in this case. In other cases with more decimal points, you will encounter issues due to the way computers do handle large decimal numbers which the Unity docs are alluding to.

In other situations, you also need some leeway in your calculations. If you want to move a character to a certain position, you may find that they could be 0.1 away from their destination, then on the next frame, they move their usual distance based on their movement speed – and then overshoot it. So you end up needing to write code like “if distanceRemaining < 0.2f then stop

So floating point numbers, especially when it comes to game development, really does add extra complications to think about.

Bug: Identification number mismatch

Recently, a developer had fixed a bug with title similar to “Tax Number is not populating in Save Registration dialog”

The original code looked like this:

public CarIdentification CarIdentification1
{
   get { return _carIdentification1; }
}
 
public CarIdentification CarIdentification2
{
   get { return _carIdentification2; }
}

if (!string.IsNullOrEmpty(transaction.TaxNumber))
{
   _carIdentification1.Value = transaction.TaxNumber;
}
 
if (!string.IsNullOrEmpty(transaction.RegistrationNumber))
{
   _carIdentification1.Value = transaction.RegistrationNumber;
}
 
if (!string.IsNullOrEmpty(transaction.NewRegistrationNumber))
{
   _carIdentification1.Value = transaction.NewRegistrationNumber;
}
 
if (!string.IsNullOrEmpty(transaction.NewTaxNumber))
{
   _carIdentification2.Value = transaction.NewTaxNumber;
}

So a quick glance at the code and you see that 3 of the 4 statements are using “_carIdentification1” then the last one sets _carIdentification2. So without putting much thought into this, I would expect one of the first 3 should actually be using _carIdentification2. But what does 1 or 2 actually mean? Are they representing TaxNumber and RegistrationNumber, or is it representing Old and New?

If it is the former, I think _carIdentification2 would represent TaxNumber so the first one was wrong.

If the latter, then I think _carIdentification2 would represent New so the third one is wrong.

I do know that the concept of TaxNumber was added into the system later on, so most likely it would be: “_carIdentification2 would represent TaxNumber”

However, the developer, Govind had changed it to the following:

public CarIdentification CarIdentification1
{
get { return _taxNumber; }
}
 
public CarIdentification CarIdentification2
{
get { return _registrationNumber; }
}
if (!string.IsNullOrEmpty(transaction.TaxNumber))
{
   _taxNumber.Value = transaction.TaxNumber;
}
 
if (!string.IsNullOrEmpty(transaction.RegistrationNumber))
{
   _registrationNumber.Value = transaction.RegistrationNumber;
}
 
if (!string.IsNullOrEmpty(transaction.NewRegistrationNumber))
{
   _registrationNumber.Value = transaction.NewRegistrationNumber;
}
 
if (!string.IsNullOrEmpty(transaction.NewTaxNumber))
{
   _taxNumber.Value = transaction.NewTaxNumber;
}

So they have named the fields _taxNumber and _registrationNumber which is much clearer. Notice though that they have deemed carIdentification1 to be _taxNumber which means they are saying the second, third, and fourth statements were wrong!

An additional thought: if a “transaction” comes into the system, what fields do you expect populated? if RegistrationNumber and NewRegistrationNumber are populated, then _registrationNumber will initially be set to RegistrationNumber, then instantly changed to NewRegistrationNumber! So I’d say this logic needs to be if/else so it only sets _registrationNumber once and _taxNumber once.

I point this out to the developer, but I find a large number of Indian developers just seem to respond with a sequence of meaningless words and you have no idea what they are thinking.

Me:

if NewTaxNumber and TaxNumber are specified, won't it just use NewTaxNumber?


Govind:

taxNumber identifier has to updated with new taxNumber value for transactions


Me:

if it is intentionally written this way, it should be if/else

and if it wasn't intended that way, then there is a bug here, similar to the one you are fixing

They ignored me and checked it in. I don’t know if just the original code was mad, or if this new code is mad too. Let’s hope it doesn’t cause more problems when it goes live.

4 Year Special: Derek

It’s been 4 years since I started writing this blog! how time flies.

The original inspiration for the blog was to tell stories about my incompetent colleague Derek. However, he left shortly after I started writing – so I didn’t end up writing many blogs about him.

After going through some old chat logs I found, I have dug up some extra stories, code snippets, and quotes.

“There’s a lot of duplication here, surprisingly. Functionally it is probably quite a big bag of spanners”

Derek

Hypocrite

When I first started working with Derek, I was telling my manager about the things he was doing and saying. My manager asked me how he is getting on to see if I had more stories to tell.

Matt:
How is Derek getting on?

Me 10:36:
on holiday, but he has spent a week or so sorting out some buttons. He has assigned two items to himself. 

So he complained that the User Stories were too big, so we have split them up a bit more. Yet he then assigns 2.

He hasn't sorted out all my comments I made on his last fix.

After Derek was making excuses that he wasn’t productive due to the size of our planned work, we split it into smaller chunks for his benefit. He then picks up multiple chunks in addition to also needing to rework his previous work. It seems we didn’t need to split the work up at all.

Redefining a boolean

bool hasSelectedReports = SelectedReports.Count != 1; 
bool noReportSelected = SelectedReports.Count == 0;

A boolean or “bool” for short represents true or false. Since there are two states, if you created a variable called “hasSelectedReports” you can invert it using the ! (not) operator, so saying !hasSelectedReports reads as “has no selected reports”. Derek decided to create a variable for the inverse, but then he got the logic wrong: If Count is 0, both hasSelectedReports and noReportSelected are true!

“I’m working from home today. Sorry I haven’t emailed sooner, I had some problems initially remoting in as my computer at work had a dicky fit.”

Derek

Pointless Conversion and check

return !String.IsNullOrEmpty(ConceptId.ToString()) ? ConceptId.ToString() : null;

ConceptId is defined as a long ,which is a number. It can represent any whole number between -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. It cannot be null/empty. By default it will be 0. Here, Derek converts the number to text (string) so he can take advantage of the IsNullOrEmpty method. If it is somehow null, we will return null… apart from if ConceptId was somehow null, calling ToString on it would crash.

If he really did want the text presentation of the number, this code should simply be

return ConceptId.ToString();

Pointless Conversion and check: part 2

if (!String.IsNullOrEmpty(conceptId.ToString()))
{
   	_conceptIdInt64 = Convert.ToInt64(conceptId);
}

Again, Derek decides to convert the number to a string so he can check it isn’t null. It is always true, so then goes into the code block. He then converts the long to an Int64. A long IS an Int64. It’s just an alias for it. So the code is completely pointless, other than assigning a simple variable.

“I may have taken too much artistic licence”

Derek

If at first you don’t succeed, try again

public Entity GetEntityFromConceptId(long conceptId)
{
  	return 
  	  	(EntityFinder.TryGetEntityFromId(conceptId)) != null 
  	  	? 
  	  	EntityFinder.TryGetEntityFromId(conceptId) 
  	  	: 
  	  	null;
}

When methods are named “Try”, it means that it might not be successful and can return “null”. In this case, TryGetEntityFromId returns null if you pass in a conceptId that doesn’t exist. So you should check for null, and do something different, maybe display an error message to the user. Although Derek decides to simply try again. Why would it work the second time?

(frustrated tone) I’ve spent 5 days on the same work item

Derek after 13 days trying to implement a simple feature. For the previous 3 days, during the daily stand-up meetings, he was saying he is “nearly finished and it’s pretty much ready to check in”

Conclusion

As you can tell from these stories, Derek made some bizarre choices and generally seemed deluded. There were times where I thought he had to be trolling, because there is no way you would write so much redundant code. I understand you can have bad days and have the occasional moments of madness, but it was a daily occurrence with Derek. For more stories about Derek, click the Derek tag.

YAGNI / This Code Could be Useful In The Future

I remember the days of being a junior software developer where you write code because “it might be useful at some point in the future”. The thing is, this means you end up writing way more code than is necessary, and could lead to overcomplicated designs.

There is a concept called YAGNI: “You Ain’t Gonna Need It”

Swizec sums this up well here with a strong hint of sarcasm…

And that brings us to the other YAGNI: Building stuff you think you’ll need before you need it.

You think of a wonderful abstraction. A wonderful little feature that’s gonna help oh so much in the future.

You build it.

Months pass. You tweak the code here and there. Nobody’s using it, but you have to maintain otherwise it’s gonna break. Whenever you implement a feature, you have to think about how it impacts this code you’ve got.

It’s slowing you down. Making you think. And you aren’t even using it.

But one day! One glorious day, your PM gives you a task. A task from the gods. You are about to use this code you predicted 6 months ago!

You are a god of prediction! You knew it all along!

https://swizec.com/blog/dry-is-a-footgun-remember-to-yagni/

A simple example I came across recently was this:

public static bool IsModalDialogOpen()
{
  Boolean isModalDialogOpen = Application.AnyModalDialogOpen;
  if (isModalDialogOpen)
  {
     throw new DialogException("A dialog is already open");
  }
  return isModalDialogOpen;
}

So they have written a method that checks if a dialog is already open. If there is, then an exception is thrown (depending on their error handling, it would either crash, or display some kind of error to the user), otherwise it returns false. Since this is new code, the only places that called this method was in his new code. However, he never checked the return value of the method. Since it is defined as a Boolean/bool then it could only return true or false, but his logic only ever returned false… and all the calling code never used the value. Our exchange went like this:

Me: Why do we have a return type for this method when the return value is not used above?
Developer: If any one in future want to know the value it will be useful because of that I have added.
Me: https://rules.sonarsource.com/csharp/RSPEC-3241
       Also: it will never return true
Developer: OK, removed the return type.
Me: It needs an appropriate method name now

So at first he says it will be useful in future – which is definitely false – YANGI definitely applied in this situation. Then when he agreed that the boolean value wasn’t useful, he left the method name as IsModalDialogOpen which implies it should return a boolean. In the end, we were left with this:

public static void ThrowExceptionIfModalDialogOpen()
{
   if (Application.AnyModalDialogOpen)
   {
     throw new DialogException("A dialog is already open");
   }
}

In another recent change, there was this humorous exchange between two Senior Developers, George and Paul:

George
This looks like dead code...

Paul
Yes on the off chance it was used, we now redirect to the new stored procedure

George
How did you test it?

Paul
We spoke to the Data guys and they said it wasnt used, so no testing required. I am sure.

George
So you leave it in case it is used, but don't test it because it isn't?

Paul
Yes sort of. In case this was ever ressurrected the change was there for them to use

George
It won't be

So Paul had been migrating a database “into the Cloud”, and so went through all the features where that old database was used. He found some old code which he believed to be obsolete, but made the change there anyway. George then flags it up, and Paul says he changed it just in case it was actually used, even though other developers also agreed it wasn’t used. He didn’t test it because it wasn’t used, but the code is there if someone decides it should be used, but it might not actually work because he never tested it.

You could make a variation on that Swizec quote at the start of the blog. It’s extra code that you have to maintain and slows you down, but the day that code is finally used; Paul has saved people time! (Apart from all the developers who will end up looking at the obsolete code in the meantime).

Coding Tales #1

I haven’t written many blogs talking about code for a while, but my notes are stacking up; so time to post some blogs discussing code! I’ll try to explain them the best I can for the non-programmers, or those that find C# hard to understand.

There was one bug fix where the developer took several attempts to fix it. Taking several attempts to fix a bug illustrates that you either – don’t have a good understanding of:

  • the requirements,
  • or the codebase that you are changing.

If you make a change and it introduces a bug, then you make another change and it introduces another – instead of playing “whack-a-mole”, it’s worth considering if you are actually writing the correct fix. From my experience, you can often just take a step back, ask for help, and come up with a much simpler, and less risky fix.

In one of their changes, I saw evidence that it was possible that they didn’t understand what they were changing, or taking care in their work.

case ParentRecordRelationship.Grouped:
//when a record is grouped, the associated text is NOT altered - so this method should not be called
   throw new ArgumentException(string.Concat("Invalid parent record relationship supplied: ", parentRecordRelationship));

So we have a comment, explaining that the existing requirement is that the “associated text” is not modified, so it is completely invalid to call this particular method which would change the associated text (other code not shown). Therefore, we throw an exception which would cause the system to show an error to the user.

The developer changed the code to this:

case ParentRecordRelationship.Grouped:
    newAssociatedTextValue = CreateAssociatedTextValue(string.Format(_groupedWithFormatText, parentrecordOriginalTerm));
     break;
//when a record is grouped, the associated text is NOT altered - so this method should not be called
//throw new ArgumentException(string.Concat("Invalid parent record relationship supplied: ", parentRecordRelationship));  

So now he has allowed the text to be updated in this situation. Unless that really is a change in requirements, then this is the wrong thing to do. Also, instead of deleting the code that throws an exception, he has just “commented” it out (which is why it now shows in green), and also left in the actual comment saying that this scenario is invalid; which now adds to the confusion when the next developer reads it.

To me, that shows that the developer is uncertain it is the right thing to do, otherwise you would remove the comment and the code. Just “commenting” the old implementation out basically acts as a reminder that you might want to restore it. (You could easily just look at the history of the file to see the old code anyway, it isn’t like it is lost if you actually delete it in the first place.)

He also added to the confusion with this change

//take the 1st 12 characters of the format text
itemValueStartsWith = _combinedWithFormatText.Substring(0, 12);

it is now

//take the 1st 12 characters of the format text                      
itemValueStartsWith = _combinedWithFormatText.Substring(0, 9);

This is a good example of why writing comments for simple code is bad. The comment explains a requirement that it should take 12 characters, but you can see that it takes 9.

To me, this shows a lack of care, and lack of attention to detail – which are basic traits that I think are important for good software developers.

Code Analysis: You Cannot Have More Than 6 Parameters

Recently, a software developer refactored some method calls in a strange way, so I questioned him on it. He stated “you cannot have more than 6 parameters”.

This is nonsense, but he means there is a Code Analysis rule that flags this up. The theory is that: if there’s many parameters, some of them are probably related, and so they can be grouped into a common object. 

public void AddUser(
   string forename,
   string surname,
   Date dateOfBirth,
   string addressLine1,
   string addressLine2,
   string town,
   string county)

In the example above, you can group the last 4 fields into an Address object.

When the Code Analysis rule flags your code, I think it’s more of a question of “can you group these parameters together?” and not a “you must never have more than 6 parameters”.

If you think the code is fine, you can simply suppress the rule. If you think it is correct, then you can create a new class to group these variables together (or use an existing class if there is something relevant).

I’ve written quite a few blogs on Developers writing code in strange ways because Code Analysis prompted them to. I think Code Analysis aims to prompt you to write code in a standard way, and avoid common errors/inefficiencies. It seems to have the opposite effect because many developers don’t think for themselves.

Out of interest, I asked a nerdy developer if there really was a limit to how many parameters you can have. So like a total nerd, he tested it out. Apparently, in C#, the max you can have is 65536 parameters in a constructor. Then he gave me additional nerd stats. His file was 4MB of source code which compiled to 1.9 MB exe file.

No idea why 65536 is the limit, and I didn’t spend any time investigating it, or quizzing him further. I just accepted it’s such a large number, you would never reach it. It’s more than 100, so you may as well say there isn’t a limit.

String comparisons

We recently had a controversial bug where our software was showing female-related information for males. Another team was responsible for the code that generates the warnings, but first I had to check we were passing in the correct information.

For males, we were passing in “Male”, “Female” for female, and “Unknown” if the information wasn’t stored. This looked correct, although using text in this way easily leads to bugs as I will soon discuss.

Another point of contention here is that if this is representing Gender, we aren’t providing enough options for today’s culture. More controversially – throughout the codebase, we are sometimes labelling this Sex and other places it is Gender. The user interface labells it Gender, but then the database column says Sex. 😬

This is obviously going to lead to problems if the developer uses this data in a way that wasn’t intended.

I was surprised users hadn’t flagged this as an issue because surely someone would be offended by the limited options and the way we display it. Also, some features of our software could easy display misleading information (just like the bug I will discuss).

So back to the main point of this blog. We were passing in what looked like the correct data, but then I looked at the other team’s code. It was actually worse than I anticipated.

if (sex == "M")
	criteria.Add("Sex = Male");
else
	criteria.Add("Sex = Female");

Whoopsies. This is wrong on so many levels.

First of all, the method call asked for a C# String which is: text of one or more characters. There is a special type called “char” to represent a single character. So if they only wanted 1 character, then they should use the “char” type. However, using char still has similar flaws like I explain in the next paragraphs.

Performing String comparisons is bad because I could pass in “Male”, “male”, “MALE”, “m”, “m” etc, with similar permutations for “Female” and “Unknown”, and I could have also passed in some complete nonsense. What happens if I pass in “Thing”? surely it should throw an error because that is invalid? Instead they treat “Thing” as a “Female”.

This has caused the behaviour we were seeing. We pass in “Male” which seems like the correct thing to pass in, but because “Male” doesn’t equal “M”, the method treats “Male” as “Female”. It also treats “Unknown” as “Female”. Even if they had used char types, we would still see problems when comparing “M” to “m”, and “U” would still be treated as “Female”.

A more robust solution would be to define an Enum type of all the states, then check for certain types explicitly, but throw an exception for anything that doesn’t match the conditions you have defined. If someone adds more cases to the Enum in the future but doesn’t modify your code, your code will cause a crash. At least that highlights a problem rather than causing unexpected behaviour, or causing offence to users or their customers.

It can look something like this. (You can use a “switch” statement if you prefer).

if (sex == Sex.Male)
{
   criteria.Add("Sex = Male");
}
else if (sex == Sex.Female)
{
   criteria.Add("Sex = Female");
}
else if (sex == Sex.Unknown)
{
   criteria.Add("Sex = Unknown");
}
else
{
   throw new ArgumentInvalidException(sex);
}

I was going to suggest that if they had written unit tests, then some of these problems would have been spotted earlier. However, if they were barely thinking about the code they were writing, then no doubt they would come up with unit tests to check that “M” returns “Male” and “F” returns “Female”, and wouldn’t have bothered with other scenarios. You would have thought someone that code reviewed it would have spotted it, but they must have lazily done that too.

More Bad/Weird Code

This is a follow on from my previous blog. I picked out a few examples that were bad changes from their Code Review, but maybe you didn’t think it was that bad in the grand scheme of things. So here is a collection of other weird stuff from the review.

Example 1: TrimEnd

TrimEnd method can be used to remove trailing whitespace, so if you have some text “Example text “, you can trim it down to “Example text”. You can also pass in an array of characters you want to remove from the end as well. So what happens if you create a completely empty array of characters?

return value.TrimEnd(new char[0]);

I had to test it out to see what it actually did. I was thinking there was a chance it could prevent the method from trimming anything at all. It actually does the same as TrimEnd() so removes whitespace only. So the char array is just redundant and looks confusing.

Example 2: The Weird “Any” check

Give a C# programmer a list called “textToProcess” and ask them to check if the list is Empty, and they will write this:

List<String> textToProcess = GetData();
return !textToProcess.Any();

or alternatively:

List<String> textToProcess = GetData();
return textToProcess.Count == 0;

But when you see this, then questions need to be asked.

List<String> textToProcess = GetData();
var empty = new List<string>();
return empty.SequenceEqual(textToProcess);

If you are a Junior and you are reading a book, working through tutorials, or even Google how to check a list is empty; what is the chance you will come across SequenceEqual? I’m sure I only came across that after a year of C# development. Weird choice.

Example 3: Unnecessary Converting

If we have a variable that stores a number. And I say that I want it to be a number. What are you gonna do to it? If you thought “nothing, it is already a number”, then you are correct.

If you thought something else, then maybe you did what this developer did. You could convert the number into text, then convert it back to a number!

This “number” variable is already a “long” aka Int64. They really are converting it to text, then instantly converting it back to the type that it already was. I really can’t emphasise how pointless this is.

Convert.ToInt64(number.ToString())

What’s weird is that I have seen this a few times over the last couple of months. Why are people doing this?

Example 4: Reinventing a private setter

There was a simple, standard property that you see all the time in C#.

public Validator Validator { get; set; }

And he changed it to

public Validator Validator => _validator ;
private Validator _validator  { get; set; }

So now the property is using this newer property syntax that looks weird to me. But really it is a property with a “get”, but no set. Then the second line is named as if it is a field (preceded with an underscore and first letter is lowercase), but has a property definition (get/set). Properties should be public but this is defined as private. This is incredibly weird.

Here’s how it should actually look (just has the word “private” added):

public Validator Validator { get; private set; }

So I comment on the second line:

this is named like a backing field, but you have defined it as a property. Do you need a backing field? couldn’t you just have changed the set to be a “private set”?

Me

Yes we need this backing field and it won’t allowed to set as private set

Developer

I had no idea what he was trying to say here. It definitely can be a private set. Plus there is no backing field, (just that he named it as such), and I had wondered if that’s what he was trying to achieve.

It isn’t a backing field, it is a Property that you have named as if it was a backing field, then the Validator just calls your new property. I think this change means Validator essentially only has a “get”. So surely you can achieve the same thing with public Validator Validator { get; private set; } then remove “private Validator _validator { get; set; }”

Me telling him exactly what to do.

Yes you are correct, but here we never wanted to do any changes in existing flow since it will give huge impact, so that used the different one without touch any of the current flow.

Developer

Just like the example in the previous blog, I tell him he has changed far too much and tell him to revert it/fix it properly, then he starts telling me how he doesn’t want to make more changes than he needs to. Most confusing.

I don’t understand what “existing flow” means here. Your changes have already removed the public set from Validator so no other class can set it. You’ve just done it in a way that people don’t expect by also changing it to call another property (which is named like a field). This is just confusing.

Me

He finally changed it, but it took some effort.

Example 5: Unit Testing

There was a method that returned an object that you could get multiple properties from. However they had created 2 separate methods that you call instead which was inefficient when you want all data.

public string GetForename()
{
   return GetUser().Forename;
}

public string GetSurname()
{
   return GetUser().Surname;
}

When GetUser is a server call that reads from a database, why can’t we just grab the user once, then take the info from it as desired. We shouldn’t duplicate the server call.

User user = GetUser();
string forename = user.Forename;
string surname = user.Surname;

Why not just call the server once and use both properties?

Me

We made 2 method calls to get unit test coverage

developer

What is he on about? I could only see 1 test, it was for a completely different method and it didn’t really test anything.

        [TestMethod]
        public void CreateOutputFromDataTableTest()
        {
            DataTable dataTable = new DataTable();
            dataTable.Columns.Add("ID", typeof(string));
            dataTable.Columns.Add("Code",typeof(string));
            dataTable.Columns.Add("Term", typeof(string));
            
            dataTable.Rows.Add("9908", typeof(string));
            dataTable.Rows.Add("44D6", typeof(string));
            dataTable.Rows.Add("test", typeof(string));            

            var output = Transformer.GenerateOutput(dataTable);
 
            Assert.IsNotNull(output);  
        }

They are creating a table with 3 columns. They are then creating 3 rows but only populating two columns. You can tell by the data that they seem to want to create 1 row but with 3 columns set.

Then after they retrieve the output from their method, instead of actually checking if it has specific data they expect, they are checking if it isn’t null. So it could be any type of object: could have no data in any of the properties, it could have the wrong data in the properties. As long as there is an object it returns, then the test passes. If you are going to write such nonsense, why not make it brief like this:

        [TestMethod]
        public void CreateOutputFromDataTableTest()
        {
            var output = Transformer.GenerateOutput(new DataTable()); 
            Assert.IsNotNull(output);  
        }

Example 5: Incorrect Renaming

public bool IsQueryRunnable(Search search, out string message)

to

public bool IsQueryRunnable(Search searchGuid, out string message)

why was search renamed to searchGuid?

Me

The parameter name in the interface and in the implemented class should be same.

Developer

I think Code Analysis does flag that as a rule. But if the “interface” had the declaration with “Search searchGuid”, then surely the logical change would be to change it to “Search search”, because as I pointed out

but it isn’t a Guid

Me

Conclusion

This person is probably a junior. When I was a junior, I would ask for help when I was unsure – I would never claim incomplete code was production ready. I would also make sure my team members reviewed my code and gave me good feedback before I would want other teams to see my work. This work just seems careless to me.