To Dos

When writing code, some developers like to write notes to themselves in the actual code using “comments”. A special type of comment is the “to do”, which simply starts with the prefix “TODO”. Once the feature is complete, you should have no TODOs left in the code.

In a project, you may sometimes split up the requirements in a way that means leaving TODOs in for now, to indicate that the work will be covered in the upcoming weeks; maybe another team member will write the code there, removing the TODO when complete.

When a feature is released, there should definitely be no TODOs left. For some reason, this wasn’t the case a few decades ago: there are still loads of TODOs in our legacy codebase. Judging by what people write about other codebases, it seems a common occurrence.

/// <summary> 
/// TODO: Whoever wrote this please document 
/// </summary> 

What use is a “TODO: Whoever wrote this” style comment? 

Surely it’s likely that the original developer won’t update it, and surely it isn’t hard to find out who wrote it if you just check source control. Now the TODO will stay around until someone is bothered enough to delete it, and they probably won’t “document”.

“I’ll do it later” is one of those lies you tell yourself. Have you ever seen a kanban card saying “review TODOs”? Nope. It never gets fixed.

Vincent Deniel


Cory House’s advice is the following:

// TODO: Thing we’ll never get around to doing.

If it’s worth doing, do it now or open a ticket. Why a ticket?

  1. TODO comments are often forgotten.
  2. A ticket makes the issue more visible.
  3. Tickets can be tagged, categorised, sized, and prioritised.
  4. TODO comments create a second system for tracking work. Now we have no easy way to see our backlog in one spot.

SonarCloud

The static analysis tool SonarCloud flags up TODO comments because they usually mean something is missing and needs to be addressed. However, some developers have a mindset of removing Sonar errors by doing the minimum to make the error go away. So to “fix” a TODO, it’s easier to remove the comment than to fix the actual issue (if there even is an issue to fix).

One developer went further and removed just the word “TODO” so Sonar would no longer flag it, but the rest of the comment remained and nothing was actually fixed.

Me
why delete part of the comment? delete it all if it doesn't add value

Kalya
Yeah, SonarQube identify only TODO text to remove, so that did like. Now I just removed entire comment

Me
Sonar doesn't tell you to remove the TODO text:
"TODO tags are commonly used to mark places where some more code is required, but which the developer wants to implement later.
Sometimes the developer will not have the time or will simply forget to get back to that tag.
This rule is meant to track those tags and to ensure that they do not go unnoticed."

Kalya
Yeah you are correct, But we should not move any incomplete code to master with TODO identification right. Anyway it is there in master, so in this case what shall we do whether can we remove or keep as it is. Kindly suggest. This is also opt for all the incomplete methods as well.

Me
Yeah, new TODOs mean you haven't finished your project.
It's not a big deal to leave the existing TODOs as they are.
If you did want to address them, for each existing TODO, you would really need to think if it is a problem or not. They have probably been in the codebase for years, so most likely are safe to remove the TODOs entirely. If the TODO is in a blank method, then you can just remove the method if you wish. Sometimes the methods may be part of an interface though so would need to remain. But then Sonar will complain if you don't have some kind of comment in it, so maybe just leave those with the TODO comment in, unless you can think of a good code comment to replace it with.
It is possible that some of the TODOs could point out possible bugs or limitations, so could be helpful as a standard comment. In that case, you may need to reword them a bit so they make sense.
In general, for any kind of refactoring, you need to consider if it is worth changing. The more changes you make, the more risk: increased testing scope, and more code to review (I think most of my comments have been for Sonar changes). Deleting the TODO comments has no impact on the behaviour.
I tend to make changes just to the methods I am modifying due to my bug fix or enhancement. If the testers are planning on doing a full regression test, then I'll consider making more dramatic changes.

This is very concerning to hear

On a code review, a Senior Developer, Lee, questioned why there were no database changes when the developer, Neil, had removed all the related C# server code. Neil replied that he “wasn’t sure how the patching process worked” (despite having been here for years, in a team with experienced developers), and wasn’t sure whether there were any backwards compatibility issues to consider.

So what was his plan? Just hope it got past the code review stage unchallenged? Then we would have obsolete stored procedures and unused data lingering in the database for years.

I initially thought his claim about backwards compatibility issues was nonsensical, but from an architectural standpoint it can make sense, due to how our system works. One version’s server code doesn’t call the other version’s server; it goes direct to the database. That means if the old version calls the new version’s database, it would expect the stored procedures and data to exist. However, for this particular feature there were no cross-database calls at all.

I suppose being cautious and not deleting the data makes sense from a rollback point of view. It’s hard to restore the data if it is lost, but easy to restore the C# code. I have never seen us use this approach though.

The Senior Developer said:

This is very concerning to hear, can you please work with your team lead to understand how our versions are deployed, and if they are unable to answer all the questions, please reach out to someone. We do not support any version changes by default, though there are cases where we do have cross version server/database calls, but these are for specific cross organisation activities.
You can safely remove these columns, update these stored procedures.
There is no value in leaving something half in the system, if it is no longer needed, remove all references, database rows/columns/tables, class Properties, etc.

In my previous blog, I discussed Project vs Domain Teams. This is kind of linked, in the sense that specialising in a certain area of the system means you gain knowledge of the functionality and architecture of that area. There would be less chance of this scenario, where the developer is unsure whether there could be backwards compatibility issues. That said, he could also have found this information out by asking questions.

This example does cover many topics I have discussed on this blog:

  • Poor communication
  • Bad decisions
  • Funny quote from a senior developer: “This is very concerning to hear”

Domain Teams, Project Teams & Cross-Cutting

In the world of Software Development, there are often differing views on how to arrange teams. Regardless of the approach, people will leave and join over time, so team members need to be replaced and teams need to adapt.

There was a time when we were arranged into teams that were assigned to a Project, then moved onto a completely different one once complete. Any bugs introduced by the projects then get assigned to a “Service Improvement” team who only deal with bugs (and possibly ad-hoc user requests).

Then after a few years, maybe under a new Development Manager, they would restructure into Domain Teams, where you take ownership of a group of features and only projects related to those are assigned to your team. Any bugs introduced by the projects stay with the team, which gives you a greater incentive to fix them as early as possible. People build up knowledge of their areas and become experts.

Then a few years later, we would switch back to Project Teams.

There are pros and cons to each structure, and there are always edge cases that pose a management problem. Even in a Domain Team, there will be certain features that don’t neatly fit into the groups you defined, or ones that apply to many modules, e.g. Printing.

Sometimes we have called a team that handles the miscellaneous features “Cross-Cutting”. Managers would sell it on being for features like Printing that really are used by many areas of the system, but we all know it becomes a team that gets miscellaneous and unrelated projects. They end up being like the “Service Improvement” team that deals with random bugs, and work no one else wants to do.

Cross-Cutting

There was a meeting where managers were announcing the new Domain Teams and I got assigned to Cross-Cutting. One developer was voicing his concerns about having a Cross-Cutting team. He wanted to point out that Domain Teams are supposed to have specialist knowledge on the Domains but most people that were assigned to their teams had little-to-no knowledge. For some reason he chose my name to make a point.

“What does TimeInInts know about Cross-Cutting?”

Which received a room full of laughter. I’m sure some were laughing at his point, some laughed at his emphasis and delivery, and others probably saw it as an attack on my knowledge. I was probably one of the best people for it really, given my experience in the previous Service Improvement teams.

The whole idea of keeping Domain knowledge in the team only works if there is a true commitment to keep the teams stable over years. However, people will leave the business, some will want to move to a different project to broaden their skills, or people could just fall out with their team members.

Another concern this developer had was with his own team. He was assigned to the Domain Team he was the expert on, but was used to working with a couple of developers in the UK. This new team had two Indian developers. Management had recently acknowledged that the distributed teams weren’t really working, so these new Domain Teams were supposed to be co-located. But this setup seemed to signal that he was there merely to train these developers up and then essentially offshore the Domain. Since he was the expert, and proud of it, he still wanted to work in that area. But he can’t work on the same software forever.

In the Cross-Cutting team, we had an open slot labelled “new starter” so we were going to get a new hire in. You have to start somewhere, but again, this doesn’t help the teams specialise if they don’t already start with the knowledge.

Colleagues’ Opinions:

Developer 1:

Me 13:39: what does a new starter know about Cross-Cutting? 
Mark 13:39: sounds more like Cost Cutting! 

Developer 2:

It’s infinitely harder to build something if you don’t understand the thing you’re building. Hard to catch issues and make sense of designs if you had no opportunity to learn the domain.

Developer 3:

isn’t one of our major issues is we’ve lost domain expertise for core/bread and butter modules.  For any “module”, there’s a combination of what the requirements are/how it should work, and what the code is actually doing. Without “domain teams”/ownership – we’ve lost a large part of the puzzle (how module should work).

Developer 4:

our teams are completely ineffective, expertise has been spread too thin. We probably need to reorganise the department again with who is remaining.

Build stronger teams first that only have one junior-ish person, then have weaker teams helping out where possible. It will be very hard for the weaker teams, but unless we do this, we’ll lose the stronger people.

The weaker teams should be given appropriate projects with longer timescales, and given as much help as possible while ultimately having to struggle their own way, stronger people who put in the effort will begin to emerge from those teams.

Extension methods

Even as an experienced software developer, it is amazing when you discover some really trivial thing, or some interesting quirk of a programming language.

I was looking at a Code Review the other week and saw some code that looked really pointless. It was testing that some code throws an ArgumentNullException.

[Fact]
public void LogWarningDetails_WithNullLogger_ThrowsArgumentNullException()
{
	ILogger logger = null;
	Assert.Throws<ArgumentNullException>(() => logger.LogWarning("Test Error Message"));
}

A NullReferenceException is an incredibly common mistake and probably the first problem new developers encounter. If you have a reference to an object, but the object is null, you cannot call instance methods on it.

Therefore if logger is null, then you cannot call LogWarning without an error being thrown.
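For contrast, here is a minimal reproduction of the usual failure (a sketch, not taken from the codebase):

```csharp
using System;

class NullRefDemo
{
    static void Main()
    {
        string s = null;
        try
        {
            // Accessing an instance member forces a dereference of s,
            // so the runtime throws NullReferenceException.
            int n = s.Length;
            Console.WriteLine(n);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException caught");
        }
    }
}
```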

So on first glance, this test looks like it is testing the basic fundamentals of the C# Programming language. However, this is testing for ArgumentNullException rather than NullReferenceException.

LogWarning was actually defined as an extension method, and extension methods do allow you to call methods on null references. I had never realised this, or even thought about it. It works because an extension method passes the reference in as a parameter.

So if you have an extension method (as indicated with the this keyword):

	public static bool IsNull(this object x) 
	{
		return x == null; 
	}

This can be called like this:

	static void Main() 
	{
		object y = null;
		Console.WriteLine(y.IsNull()); 
		y = new object(); 
		Console.WriteLine(y.IsNull());
	} 

Which would output True, then False. This illustrates that the extension method does not crash when the reference y is null, and the logic correctly returns true when y is null.

Conclusion:

Understanding NullReferenceExceptions is basically day 1 of learning to code in an Object Oriented language like C#, but I had never even considered that there is an exception to the rule. A method call on a null reference won’t cause a NullReferenceException if the method is an extension method!
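This also explains why the unit test expects ArgumentNullException: presumably the LogWarning extension validates its this parameter. Here is a minimal sketch of that idea; the ILogger stub and the guard clause are my assumptions, not the real logging library’s code:

```csharp
using System;

public interface ILogger { }  // stand-in for the real logging interface

public static class LoggerExtensions
{
    // Hypothetical guard: because the null reference arrives as an
    // ordinary parameter, the method body runs and can throw
    // ArgumentNullException itself, rather than the runtime throwing
    // NullReferenceException before the call.
    public static void LogWarning(this ILogger logger, string message)
    {
        if (logger == null)
            throw new ArgumentNullException(nameof(logger));
        Console.WriteLine($"WARN: {message}");
    }
}

class GuardDemo
{
    static void Main()
    {
        ILogger logger = null;
        try
        {
            logger.LogWarning("Test Error Message");
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("ArgumentNullException, not NullReferenceException");
        }
    }
}
```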

Atalasoft DPI

We use a software library from Atalasoft in our product to allow users to add annotations to PDFs.

One of our Indian developers posted on Slack to ask a question about a bug he was assigned to fix. It was quite hard to understand what he wanted, but it sounded like the quality of the user’s PDFs was lowered to the point that they were blurry and unusable.

Hi Everyone, Here is my doubt was more of a generic one. In Document Attachment Module, I’m trying to attach a PDF. The attached PDF gets depreciated in the doc viewer.. After analysis came to a conclusion that, the Atalasoft Viewer we are using Document Attachment viewer should pass only with 96dpi(dots per inch).

However in the Atalasoft Documentation itself was given that inorder to increase the quality of the document inside the Document Viewer of Atalasoft we need to pass on the default or hardcoded resolution as attached.

With respect to this have attempting a bug in which need to fix this depreciation not in a hardcoded format.

Is there any way to calculate a PDF file’s DPI through its file size. (Note: Since PDF file was vector based and doesn’t posses any information related to dpi).Can anyone please guide me on this ? Apart from hardcoding and passing on a resolution value.

After struggling with it, another developer started working on it, but then went on annual leave so yet another developer took over. None of them had put much thought into what they were doing because when I asked them to explain the code, they couldn’t seem to. I then googled the code and found it on the Atalasoft website. https://www.atalasoft.com/kb2/KB/50067/HOWTO-Safely-Change-Set-Resolution-of-PdfDecoder

using (var annotateViewer = new AnnotateViewer())
{
    annotateViewer.DataImporters.Add(new Atalasoft.Annotate.Importers.PdfAnnotationDataImporter { SkipUnknownAnnotationTypes = false });                
    using (var pdfDec = new PdfDecoder())
    {
        pdfDec.RenderSettings = new RenderSettings { AnnotationSettings = AnnotationRenderSettings.RenderAll };
        Atalasoft.Imaging.Codec.RegisteredDecoders.Decoders.Add(pdfDec);
        SetPdfDecoderResolution();
    }     
    
    annotateViewer.Open(filePath);                
    var printer = new Printer(annotateViewer);
    printer.Print(printerSettings, documentName, printContext);
} 
 
 
static readonly object pdfLock = new object();

private static void SetPdfDecoderResolution()
{
    int standardResolution = 300;
    lock (pdfLock)
    {
        foreach (Atalasoft.Imaging.Codec.ImageDecoder rawDecoder in Atalasoft.Imaging.Codec.RegisteredDecoders.Decoders)
        {
            if (rawDecoder is PdfDecoder)
            {
                //By default PdfDecoder sets to lower resolution of 96 dpi
                //Reason for PDF depreciation
                ((PdfDecoder)rawDecoder).Resolution = standardResolution;
                return;
            }
        }
        Atalasoft.Imaging.Codec.RegisteredDecoders.Decoders.Add(new PdfDecoder() { Resolution = standardResolution });
    }
}

The code instantly stood out for being convoluted because we are creating a PdfDecoder called pdfDec, then instead of just setting properties on it, we add it to the RegisteredDecoders, then call our SetPdfDecoderResolution which loops through the decoders to find the one we added. If it can’t find it (which surely is impossible) it will add one.

I was talking to a Lead Developer about a completely different bug fix, and he said

“People just don’t think about what they write”

Lead Developer

So I decided to bring up this Atalasoft problem…

He said 

When I saw the lock I wanted to ask, “Which Stack Overflow post did you find this on?”

Lead Developer

So I told him they got it from the Atalasoft website!

So they had blindly pasted this Atalasoft code in without thinking. They could just set the Resolution property in the existing code, since we already create the object and therefore already hold a reference to it. If this code means we can end up with multiple decoders (which you aren’t supposed to do), then we could create a method similar to SetPdfDecoderResolution that checks whether a decoder exists and adds one if none does, ensuring all the correct properties are set.

They need to think

Lead Developer

I think the problem the Lead Developer had with the lock is that you use lock when you want to guarantee that only one thread is accessing a resource/section of code at any time; but this code wasn’t used in a multi-threaded context. So by blindly pasting in code without thinking, they were adding redundant lines and creating confusion. 

The actual fix was just

private const int highDotsPerInch = 300;
pdfDec.Resolution = highDotsPerInch;

But to reach this outcome, it took 3 developers to look at it, then 2 to review it. 

Missed Deadline For the Proof Of Concept

As a software developer, you are usually given projects without knowing the contractual details involved. However, there was one project that I was originally assigned to, and I was forwarded some documents about it, including some fairly formal paperwork with pricing.

The project was actually straightforward because we already had the functionality for users in England and they wanted users in Wales to use similar functionality. It was the same for the most part, but there was some minor customisation required. So it mainly involved deleting or tweaking a few files to remove the validation based on the country. Then there would be some testing involved to make sure the feature really did work when configured for Wales.

Some Senior Developers and Architects had estimated the project at 6 months, which was a bit extreme, and reckoned the cost of development at £442,404, with some miscellaneous costs for “platform, network and migration” taking the total to £445,620!

On the face of it, that sounds expensive. But when I think of the labour cost involved, where I work, a Junior might earn £25k a year, then Seniors are more like £32k-£45k. So if you have a few developers and testers on a project, with some managers involved, and it really does take 6 months, then the costs soon add up. Then you want to make a decent profit on it too.

I guess the cheeky thing is, the customer might not know what you already have; so you could charge as if it was new but you are just recycling/reusing existing code. 

The end result is the same for the new customer isn’t it?

What I didn’t understand in the document is that there was a line that said:

“The requirements described within this CCN must be delivered by January 2024 in order to support a proof of concept with a limited number of users in a live environment. Once the proof of concept is complete, an implementation plan will be defined by the programme team to determine the pace of the national rollout, to be complete by January 2026.”

My question is, does it make sense to create a proof of concept (POC) that works well enough, but then have 2 years to actually complete the work? 

Well people don’t have any experience of what they are suggesting so are just making it up. I agree though, if you have a proof of concept you’re kind of almost there. Depends on how hacky the POC is I suppose

Robert (Senior Developer)

Even more confusing is that we didn’t deliver the POC by January, but we did deliver the completed feature by the end of March.

Performance Tales:  Out of Memory Fixes

We have an area of our system that is a major pain for memory usage. We allow users to create what we can generically refer to as “Resources” and new users will then download the entire set of them, which are then cached locally.

The initial download is very large, and the resources are then loaded into memory the next time the application loads. Most of the time loading is on demand, but it can be slow and very memory consuming.

Another problem is that due to various bugs, sometimes these resources can be missing/corrupted and have to be downloaded again.

Over time, the area of code has been cobbled together by developers that don’t really understand the system and so has perpetuated the inefficiency and bugs, which then becomes an endless cycle of making the system worse.

There was a big push to improve this area of the system, but no one has learned their lesson, so many juniors got assigned to fix the problem.

When it comes to code reviews, code can be surprising, and the surprise comes either from the fact that the developer is significantly smarter than me, or significantly dumber. Sometimes I find myself looking at code and wondering if it really is bonkers, or some genius understanding that I need to learn. So it’s often best to ask your colleagues to check your understanding.

I don’t remember seeing a cast in a property before:

public IEnumerable<MissingResource> MissingResources
{
    get { return _missingResources; }
    private set { _missingResources = (List<MissingResource>)value; }
}

So it’s either incredibly smart, or incredibly dumb.

“That cast is mental!

You could set it to anything that implements IEnumerable<MissingResource> – but it better be a List<>”

Dave (translation of what the code is saying)
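To see why the cast is dangerous, here is a self-contained sketch (MissingResource reduced to an empty class, and AssignArray is a hypothetical helper added for illustration): any IEnumerable<MissingResource> satisfies the compiler, but a non-List value blows up at runtime.

```csharp
using System;
using System.Collections.Generic;

public class MissingResource { }

public class Importer
{
    private List<MissingResource> _missingResources = new List<MissingResource>();

    public IEnumerable<MissingResource> MissingResources
    {
        get { return _missingResources; }
        // Compiles fine, but only succeeds if the value really is a List<>.
        private set { _missingResources = (List<MissingResource>)value; }
    }

    // Hypothetical method: an array is a perfectly valid
    // IEnumerable<MissingResource>, but it is not a List<>.
    public void AssignArray() => MissingResources = new MissingResource[0];
}

class CastDemo
{
    static void Main()
    {
        try
        {
            new Importer().AssignArray();
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException at runtime");
        }
    }
}
```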

Is the following just a lack of trust that .NET will clear the old objects? To me, this code makes it seem like there is a bug they are working around, or they are just going wild nulling everything out to save memory.

public void ClearData()
{
	NewResources = null;
	ExistingResources = null;
	MissingResources = null;
	SkippedResources = null;
	NewResources = new List<IResource>();
	ExistingResources = new List<IResource>();
	MissingResources = new List<MissingResource>();
	SkippedResources = new List<IResource>();
	IndexedResources = new List<Guid>();
}

trust understanding

Dave

Does that finally block do anything? It’s a local variable, so it should be eligible for garbage collection at that point anyway.

finally
{
	bulkResources = BulkResource.Empty();
}

Yes it does something.

That something is worse than doing nothing!!!!

the finally allocates another instance and loses scope of the current one, meaning there are 2 things to GC now 

Dave

I do wonder if sometimes they don’t really know what you are asking, but just change stuff anyway. After I pointed out that their use of null didn’t achieve anything, we now create some empty lists and then clear them if they are not null (they aren’t null, and are definitely empty, because we just created them).

public virtual bool PrepareItemsForImport(ImportProcessParameters parameters)
{
	DialogService.SetProgressFormText("Preparing to import...");
	_newResources = new List<IResource>();
	_existingResources = new List<IResource>();
	_missingResources = new List<MissingResource>();
	_skippedResources = new List<IResource>();
	_indexedResources = new List<Guid>();
	ClearData();
	_importStartDateTime = DateTime.Now;
	_mappingInformation = RemappingService.MappingIdentifiersForOrganisation;
	return true;
}

public void ClearData()
{
	NewResources?.Clear();
	ExistingResources?.Clear();
	MissingResources?.Clear();
	SkippedResources?.Clear();
	IndexedResources?.Clear();
}

Does “ClearDataInViewModel” do anything? You call it right before the view model goes out of scope and is eligible for garbage collection anyway?

Me
using (var dialogService = new DialogService())
{
    var viewModel = new ImportDetailsDialogViewModel(dialogService);
    viewModel.InitializeFromImportProvider(importProvider);
    var dialog = new ImportDetailsDialog();
    dialog.DataContext = viewModel;
    Application.ShowModal(dialog);
    viewModel.ClearDataInViewModel();
}

Remember what the point of this work was: to reduce memory leaks, and also to improve performance in other ways (fixing bugs in the caching, reducing server calls, removing redundant code). What they have done so far is add more redundant code and show a complete lack of understanding of how and when the garbage collector in C# runs. The garbage collector is how memory (RAM) is freed up.
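The underlying point can be demonstrated with a WeakReference: once a local goes out of scope, the object is collectable without anyone nulling it out. (A sketch with made-up names; GC timing is not strictly guaranteed, which is why the comment is hedged.)

```csharp
using System;

class GcDemo
{
    // Allocates a large array but only hands back a weak reference,
    // so nothing keeps the array alive after the method returns.
    public static WeakReference AllocateAndDrop()
    {
        var data = new byte[1024 * 1024];
        // Note: no "data = null" needed; the local simply goes out of scope.
        return new WeakReference(data);
    }

    static void Main()
    {
        WeakReference weak = AllocateAndDrop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        // Typically prints False: the array was collected without any manual nulling.
        Console.WriteLine(weak.IsAlive);
    }
}
```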

public IEnumerable<TemplateHeader> GetMobileTemplateHeaders()
{
	List<TemplateHeader> headers = Retrieval.GetMobileTemplateHeaders().ToList();

	return headers;
}

The above code was changed to this:

public IEnumerable<TemplateHeader> GetMobileTemplateHeaders()
{
	IEnumerable<UserTemplateDefinition> mobileUserTemplateDefinitions =
		Retrieval.GetMobileTemplateHeaders();

	IEnumerable<TemplateHeader> mobileTemplateHeaders =
		mobileUserTemplateDefinitions
		.Select(
			template =>
			new TemplateHeader(
				id: template.Identifier,
				title: template.Name));

	return mobileTemplateHeaders;
}

Me
Retrieval.GetMobileTemplateHeaders doesn't seem to return TemplateHeaders anymore

Jaz
Fixed this

Me
You are still taking the output from a method called GetMobileTemplateHeaders and converting them to TemplateHeaders. Seems like the method should be renamed, or the return type changed

Jaz
It is returning template headers enabled for mobile. So it was named as GetMobileTemplateHeaders.

Me
This was the code before. It's of type TemplateHeaders
List<TemplateHeader> headers = Retrieval.GetMobileTemplateHeaders().ToList();

This is the code now
IEnumerable<UserTemplateDefinition> mobileUserTemplateDefinitions = Retrieval.GetMobileTemplateHeaders();
It isn't of type TemplateHeaders
but you want TemplateHeaders. So you then take the output of Retrieval.GetMobileTemplateHeaders and convert it to TemplateHeaders, storing it in a variable called mobileTemplateHeaders.

The code looks strange to have a call to GetMobileTemplateHeaders then the line straight after it creates a variable called mobileTemplateHeaders.

Surely we expect the code to be more like IEnumerable<TemplateHeader> mobileTemplateHeaders = Retrieval.GetMobileTemplateHeaders();?

Jaz
Change done.

Another developer pointed out they had introduced another inefficiency by grabbing ALL resources and not just the ones they were interested in. So they aimed to cut down memory usage but actually managed to increase it!


Gary
Are you sure you want to do a get bulk resources to only just get the templates out?

You are getting all types of resources ~20k+ items etc to only throw the majority of that data away?

Jaz
Checked with the team and changed the approach to get templates only

Conclusion

It is very easy to understand why this particular area of the system is a massive problem. If you tell these developers to look into improving performance, they just end up changing random bits of code and hoping it somehow works. Even when a change is half-decent, they won’t put much thought into the naming, so it’s hard and confusing to read.

What we need to do is assign some smarter developers to the project: ones that understand how memory leaks can occur, look at the number of resources being loaded at certain points, and analyse the SQL queries that do the initial retrieval.

More Colin

In the early days, I wrote many blogs about Colin, a Senior Developer who was constantly writing bad code, cutting corners, and saying dumb stuff. After going through some old chat logs, I found a few stories that I don’t think I covered in the blog before:

Two Days For Nothing

Me 13:06:
Colin asked me for help. He says he has spent 2 days on his Work Item
No changes were needed; next!

Dean 13:06:
haha what's the Work Item about?

Me 13:07:
he was baffled by everything. He was creating a template with “Last Entry” field and he was like "why isn't it showing anything"
I said "your selected record is empty"

Colin just accidentally checked in a changeset – not just a file, but everything

Me, in dismay

Misleading Braces

Usually you use braces to group blocks of code, such as the body of an if statement or a method. Colin put his logic on the same line as the “if”, but then opened a brace underneath it. So it looked like the brace belonged to the if statement, but it actually did not. We weren’t even aware you could just randomly put braces around code like that.

if ((!criterianValueSets.Any() || hasConcepts)) return hasConcepts;
{
//other code here
}

Me 10:12:
what a sneaky return. I'm surprised you can have a { after a return like that

Dean 10:12:
so what does the { even mean here?
like why is that code block in braces if it's not related to the if statement?

Me 10:12:
just groups together a random bit of code
I guess technically it is the else

Dean 10:12:
so can you do that wherever you want?

Me 10:12:
I was wondering that
you would have thought code analysis would moan even if it is valid

Dean 10:14:
you can, weird, and it affects the scope too...
is that legacy code where you found it?

Me 10:15:
nope. Colin. always Colin
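To answer Dean’s question: yes, a bare block is legal C# wherever a statement is allowed, and it does introduce a new scope. A quick sketch:

```csharp
using System;

class BlockScopeDemo
{
    static void Main()
    {
        {
            int x = 1;  // x only exists inside this block
            Console.WriteLine(x);
        }
        // Console.WriteLine(x); // would not compile: x is out of scope here
        {
            int x = 2;  // the same name can be reused in a sibling block
            Console.WriteLine(x);
        }
    }
}
```

It is valid, but as the chat shows, using it to disguise which statements belong to an if is thoroughly misleading.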

Node Resolvers

Colin didn’t know about the AcceptButton and CancelButton properties on a Windows Form. His mind was blown.

Colin said there were multiple Node Resolvers and he was stripping out Nodes from one of them… then a minute later he said there was only one Node Resolver and he wasn’t stripping out Nodes. Now Steve was confused, because he was calling Colin’s code and expecting it to strip them out.

Open closed principle

Me 15:12:
Colin is making a bit of a fool out of himself, showing a lack of knowledge of programming concepts
I said to him
“do we really need 3 separate methods for this? what about 1 method with a switch?”
So he simply replied “Open closed principle”
So I explained he was actually violating that principle by writing it the way he did.
“If there was a new Node system, you would have to change this dialog. It doesn't matter if you do my suggestion or leave it as it is. To conform to the open closed principle, surely you would need to pass in an object which can return the string. That way if a new Node system is added, you would add a new Node system object, and this class wouldn't need to be touched.
Anyway, merging the 3 methods would be neater”

Dean 15:13:
Urrrgh

Me 15:13:
I also slagged off that class he wrote before
“I reckon that the guy who came up with Polymorphism would be in tears if he saw this class.”
Colin had replied “Is that a complement ? I do not see any problem with it .”

I then emailed him with how to write it, and he now realises I am right

Dean 15:14:
That's good

Bugged Mindset

Colin always gets annoyed when I find a bug in his code.

But when the testers miss something, he loves it, even though it’s out in live and therefore looks bad for all of us.

He’s not happy I have logged two bugs for his provenance story.

NodeSystemCompatibility

Colin wrote a few similarly named methods that did slightly different things, but it wasn’t clear when to call each one, and some returned a different value than you would expect.

Me 12:45:
CheckNodeSystemCompatibility , IsNodeSystemCompatible
what is the difference?

Dean 12:45:
Lol

Me 12:45:
but don't call IsNodeSystemCompatible for documents, you need CheckDocumentForCompatibility
and CheckNodeSystemCompatibility has a switch that calls different methods and then negates them
case FolderItemTypes.DocumentTemplate:
    return !IsNodeSystemCompatibileForDocumentTemplate(selectedItem);
so if it is compatible, CheckNodeSystemCompatibility returns false
I think we should just delete the branch and pretend it didn't happen

Dean 12:50:
Hahahahaha
Why are they overcomplicating things??

Me 12:53:
they want bugs. it's not even unit tested

Dean 12:57:
What?!
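The negation trap from the chat, reduced to a sketch (the method bodies are stand-ins for the real checks), plus the sort of single, consistently named predicate I’d want instead:

```csharp
using System;

public static class Compat
{
    // Reduced version of the trap: "Check..." silently negates "Is...",
    // so a compatible item makes CheckNodeSystemCompatibility return false.
    public static bool IsNodeSystemCompatible(bool compatible) => compatible;

    public static bool CheckNodeSystemCompatibility(bool compatible) =>
        !IsNodeSystemCompatible(compatible);

    // A fix: keep ONE clearly named predicate where true always means
    // "compatible", and delete the negating wrapper entirely.
    public static bool IsCompatible(bool compatible) => compatible;

    public static void Main()
    {
        Console.WriteLine(CheckNodeSystemCompatibility(true)); // False -- surprising
        Console.WriteLine(IsCompatible(true));                 // True  -- as expected
    }
}
```

With one predicate and one meaning, callers no longer have to guess which of three near-identical names to use, or whether the answer comes back inverted.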

Health and Safety

Me 14:08:
Colin has a wrist rest on the windowledge and it has melted

Dean 14:08:
ha

Me 14:08:
the gooey stuff has dripped down towards the plug sockets
bit of a health and safety hazard
meanwhile, he also had a hot chocolate hidden behind his monitors that stunk of sick

Dean 14:10:
nice

Me 14:13:
Colin claims he got that drink this morning
a bit worrying if the machine is dishing out sick

Me 15:22:
Mel reported Colin's wrist rest incident
the handyman dude is here to save the day

Dean 15:23:
thank god

Multiple Attempts

Me
is this correct? ...we can copy folders if there is one item we can copy, regardless if there are loads which it can't copy?

Colin
my mistake. should be the otherway round. Will change this to "!documentItems.Any(di => di.CanCopy)"

Colin
correction. documentItems.Any(di => di.CanCopy.Equals(false)); lol

Colin
documentItems.All(di => di.CanCopy). sorry my brain isn't working.
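Colin’s three attempts can be compared side by side on some invented data; only `All` expresses “every item can be copied”:

```csharp
using System;
using System.Linq;

public class DocumentItem
{
    public string Name;
    public bool CanCopy;
    public DocumentItem(string name, bool canCopy) { Name = name; CanCopy = canCopy; }
}

public static class CopyRules
{
    // Attempt 1: true when *nothing* can be copied -- wrong polarity.
    public static bool Attempt1(DocumentItem[] items) =>
        !items.Any(di => di.CanCopy);

    // Attempt 2: true when *anything* can't be copied -- the inverse of what we want.
    public static bool Attempt2(DocumentItem[] items) =>
        items.Any(di => di.CanCopy.Equals(false));

    // Final version: true only when *every* item can be copied.
    public static bool CanCopyAll(DocumentItem[] items) =>
        items.All(di => di.CanCopy);

    public static void Main()
    {
        var items = new[] { new DocumentItem("a", true), new DocumentItem("b", false) };
        Console.WriteLine(Attempt1(items));   // False
        Console.WriteLine(Attempt2(items));   // True
        Console.WriteLine(CanCopyAll(items)); // False -- one item can't be copied
    }
}
```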

Call It Twice

Me 16:15:
bool canCopy = NodeSystemHelper.GetActionsWithProvenance(selectedItem: SelectedItem) != null && NodeSystemHelper.GetActionsWithProvenance(selectedItem: SelectedItem).CanCopy;

if the returned object isn't null, create it again and check a property
classic Colin

Dean 16:18:
Wtf; that's melting my head
i don't see how you can take any enjoyment out of development writing code like that
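The obvious fix is to call the helper once and keep the result in a local. A sketch with a stub standing in for `NodeSystemHelper.GetActionsWithProvenance` (the call counter is only there to prove the helper runs once):

```csharp
using System;

public class Actions
{
    public bool CanCopy { get; set; }
}

public static class NodeSystemHelperDemo
{
    public static int Calls; // counts how many times the helper runs

    // Stub standing in for NodeSystemHelper.GetActionsWithProvenance,
    // which in the real code is presumably not free to call twice.
    public static Actions GetActionsWithProvenance(object selectedItem)
    {
        Calls++;
        return new Actions { CanCopy = true };
    }

    public static bool CanCopy(object selectedItem)
    {
        // Call once, keep the result in a local, then test it.
        var actions = GetActionsWithProvenance(selectedItem);
        return actions != null && actions.CanCopy;
    }

    public static void Main()
    {
        Console.WriteLine(CanCopy(new object())); // True
        Console.WriteLine(Calls);                 // 1, not 2
    }
}
```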

Christmas has come early

Me 16:18:
seems Colin has left us with 36 failing unit tests
Christmas has come early

Manager 16:19:
want me to get him to sort them out?....

Me 16:19:
are you going down there and punch him in the stomach

Manager 16:20:
gonna stove his head in!!....

No Unit Tests

Me 13:22:
Colin fixed a bug
“Fix bug where protocols created pre-version 1.5 shows N/A rather than the correct value”
On the code review:
“Me
unit tests?
Colin
Not failing”

oh that's ok then. Just fix a bug and don't cover the missing scenario
WHY DO I WORK HERE?

Dean 13:22:
Why would you not write one??

Me 13:22:
it may fail if you write one 😀

No Root Cause

In one part of our application, we always have problems where you select some data in a grid, then the menu bar refreshes multiple times. 

This time, there was a bug involving a menu option not becoming enabled until you hover the mouse over it – which was very strange.

Colin then decided to fix this new issue by adding another call to refresh the menu. Brilliant. It already flickered many times – let’s add another flicker!

The lead developer questioned the change and asked him what the root cause was. “This code is complicated, so I didn’t investigate.” Brilliant; totally lazy.

Luckily another developer stepped in and provided the proper fix.

If you don’t understand how the problem came about, you can end up adding “hacky” code to make it work. That just pollutes the codebase with more bad code, causes more confusion, and makes future issues harder to diagnose. Good developers don’t cut corners like that.

Project Aurora & The Strangler pattern

Recently we have had another tech guy join the company who reports to the CTO. I find that people in these kinds of roles want to put their stamp on things by coming up with a new idea.

He presented his idea in our monthly Tech Meeting. He wants to address our performance problems by taking traffic away from our main on-premise databases. There have been some similar ideas recently, and although I’m not great when it comes to hardware, networks, and general software/hardware architecture, I am sceptical that these ideas can work.

His idea is that we can replicate the database in the cloud (“the cloud” solves all problems you see), and then the database in the cloud can be used for Read access, whereas Write would still go to the main on-premise databases (then synced up to the cloud).

The Announcement

This programme of work is to move workload away from our primary systems to enable these systems to withstand expected load factors from upcoming initiatives as well as expected growth in usage on our APIs during Winter 2023.

The intent is to run focused cross functional teams in work-streams across the group to deliver this initiative. The approach taken here is to place multiple bets, across multiple teams. The expectation is that not all teams will deliver by September, but enough to bring in the headroom we need.

The programme is intending to free up at least 20% load across our core databases.

Upcoming aims:
• Strategic, move read-only workloads to Aurora.
• Redeploy APIs to AWS, Move to cloud technology, Containerise and Optimise Service
• Enable use of replica data when ready.
• Move Appointment Workload
• Mitigate 8am peak load.
• Use caching engine on AWS (Elasticache/Redis), mitigate 8.2% of PC DB Load 
• Reduce load on the DB during day time.
• Reduce Datacentre and DB load and improve performance
• Mitigate 6.2% of DB load by optimising how we summarise task counts
• Proof of concept is Complete, expected to cost £2m a year.

My Conversation With Architect Mark

I think the reason for the replication (as opposed to just moving it all to the Cloud) is that you can’t fully commit to ideas like this. You have to have a rollback plan. So if we find it doesn’t work, is too expensive etc., we can just return to the old way without much inconvenience. I asked one of our Software Architects what he thought of the plan because it doesn’t sound right to me:

Me
doesn't sending data out to another database just increase traffic, and they wanted to reduce it?
Mark
Yes, it will also be delayed, and often broken
Me
no pain, no gain
Mark
they're replicating data, and it's unlikely it'll be used
Me
I don't see how you migrate things. You have to keep them both running until you are confident it works, then bin off the old database. But then in reality you just end up keeping them both for longer than expected
Mark
you then also need cross-database transactions or to be very careful with queries
yeah, that's basically it. Have the same API at both ends, some sort of replicate and transform on the data to ensure it's in both. Persist to both simultaneously, then when all works, turn off the old
Me
The CTO said that “some people say there is a delay, but it is only 5 minutes”. Does that address any of your concerns at all?
Mark
no, this is only the second time I've heard about this, and the first I laughed
I agree with the principle of strangler pattern for migrating, but this isn't migrating
it's keeping multiple DBs 'in-sync'
Me
does that mean you can view an appointment book which is 5 mins out of date, and you try book an appointment, then it checks the real database and is like "no mate you cannot do that"

The conversation between architects

Mark then sent me a conversation he had with two other architects, Andrew and Jon. Mark already had concerns with the “appointment book” example.

Mark
so when this replication system goes down for a few hours, what happens then? I guess the system tries to book appointments for slots already booked, put in requests for items already issued etc.?
seems our business layer needs to be aware of how outdated the original information was, so it can compare something like a changelog number. Sounds like a big challenge to implement correctly

Andrew 11:10
Yes, any write operations will need logic to ensure that cannot happen Mark.
Jon and I have already called out that Appointments and Orders will have significant challenges with this replication model and have suggested that the initial focus should be on User Profiles, and any historic data, etc.

Mark 11:13
User Profiles and historic data seem just as dangerous to be honest.

Jon 11:15
The idea I suggested here is that you would check the change log on the primary system before even considering going to the replica. If the User had had a recent change (what counts as "recent" is TBC; I suggested 30 minutes) you wouldn't even consider going to the replica.

Mark 11:15
can we implement the strangler pattern properly? set up proper Appointments APIs to use in our datacentre, and AWS.
duplicate the data.
then dual file everything against the APIs? if one fails to file, the other gets rolled back.
we ensure consistency, we can transform the data, and we're using the pattern as intended
Jon, I agree your idea is the right way to do this sort of thing, but it will be adding logic and latency in a lot of places (as well as augmenting every one of our products to be aware of this), and not bringing us forward, but continuing to keep us in the primary data-store model

Jon 11:18
Honestly, if the use case is customers looking at their own data, then having it a touch out of date isn't as critical as if our actual users see an out-of-date view. As a hypothetical Customer who knows nothing about IT, if I viewed my record straight after a consultation and it wasn't there, I would just assume that there was a delay and it would appear later.
When it comes to actual Users viewing the record, it's absolutely critical that they see the up to date view. And when it comes to appointments that's also critical because appointment booking is fast moving, it'd be an awful experience for a User if every "free" slot they booked turned out to be booked minutes earlier.

Mark 11:19
depends, if you've just requested a particular item and the page doesn't update to indicate that, can you continue requesting it?

Jon 11:20
Many of our users (mine included) turned off online appointment booking entirely at the beginning of the pandemic and use a triage system now.
You wouldn’t be able to successfully request duplicate items, because the write would take place conditionally, so if it had been requested already then it'd say no (if designed even vaguely competently).

Mark 11:22
the write wouldn't come through, but it'd be confusing for the User seeing the prescription still requestable, unless the application has its own datastore of state

Jon 11:22
Yes it would be far from ideal. But the CTO has some ideas about that (having a "recent changes" dataset in a cache that is updated live, and merged with the replica's data).
feels like there's loads of little bits of logic that need 'tacking on' to resolve potentially quite serious incidents. When the correct use of the strangler pattern gets us away from on-premise as primary DB, and moving in the direction we want to go
Yeah, this isn't easy and requires careful consideration.

Andrew 11:30
You are absolutely right Mark - there are a heck of a lot of potential gotchas and ultimately the plan has to be to use the strangler pattern, but at the moment we are looking at a rescue plan to put out some existing fires in the data centre and to handle predicted significant increase in load that will hit us in the Autumn. Everything that you have flagged is being considered.
The only fall-back plan that we currently have is to spend nearly £4m / year on additional SQL Server readable secondaries (on top of having to pay an additional 12% on our existing SQL Server licences thanks to MS hiking their prices) and nobody has the appetite for that.
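Jon’s “check the change log first” idea might look roughly like this. Everything here is invented for illustration, including the 30-minute window he suggested only as a placeholder:

```csharp
using System;

public static class ReplicaRouter
{
    // Jon's suggested placeholder: treat anything changed in the
    // last 30 minutes as "recent" and serve it from the primary.
    public static readonly TimeSpan FreshnessWindow = TimeSpan.FromMinutes(30);

    // Check the primary's change-log timestamp before deciding where to read:
    // a recent write means the replica may not have caught up yet.
    public static string ChooseDataSource(DateTime lastChangeUtc, DateTime nowUtc) =>
        (nowUtc - lastChangeUtc) < FreshnessWindow ? "primary" : "replica";

    public static void Main()
    {
        var now = new DateTime(2023, 6, 1, 12, 0, 0, DateTimeKind.Utc);
        Console.WriteLine(ChooseDataSource(now.AddMinutes(-5), now)); // primary
        Console.WriteLine(ChooseDataSource(now.AddHours(-2), now));   // replica
    }
}
```

As Mark points out, though, even this simple routing check would have to be added to every read path in every product, which is exactly the “loads of little bits of logic tacked on” problem.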

Closing Thoughts

I don’t know what the Strangler Pattern is, so I’ll add that to my reading list. However, it seems that even with my limited knowledge of architecture, our Software Architects have similar concerns to mine. There have been plenty of ideas that the CTO (or managers at a similar level) have quickly backtracked on, due to not having consulted people who know whether the idea is actually logically sound. I’ll keep my eye on this one to see how it develops.

Development Environments

This blog is basically stealing the “Healthy Software Developers” explanation of “Development Environments – Isolating Customers From Your Changes”.

Introduction

“Development environments” allow software engineers to view a particular version of the software and control who has access to it. 

If you go back about 20 years, there was much less maturity in how changes were rolled out to customers, and every company had a different process. In some companies, developers would just roll changes out immediately, straight into production. This might work for the most part if you have a very small, trusted team and no process or contractual requirements. These days, the standard is that most companies need a minimum of three separate environments when developing software.

Even where I work, people tell me it used to be much less professional, with far less legal scrutiny – developers could drop DLLs onto a server to quickly fix a bug for a single customer. Now there are all kinds of people who need to sign a change off, and 99% of changes are released as an official version to multiple customers.

Development Environment

The main development environment is a place where an individual engineer can work on the software in isolation from any other stakeholder, such as a Tester or Customer. Instead of making changes directly to a live website/application, the changes run on a local copy on the developer’s machine. The developer can begin looking at the impact of the changes they’re making, using test data. Any problem introduced affects only that one developer.

It could be as simple as a copy of a database and a version of the website. A more sophisticated program may have more dependencies, needing multiple websites or services configured – here you could leverage modern technology like “cloud services” or “containers” to deploy the setup easily. How this looks depends on the complexity of the software architecture, the team size, and the process.

Staging Environment

The “User acceptance” test environment, sometimes known as a “staging environment”, is another copy of the application; developers aren’t making changes to it, and it also uses test data. The purpose of this environment is typically to run exhaustive testing to find any issues before deploying the changes to the end users.

When companies rely on manual testing and don’t have modern “continuous delivery” processes, an unreleased version may sit in the staging environment for an extended period. In a more automated setup, automated tests can be run and, if they pass, the version easily deployed to customers. This could take as little as fifteen minutes to an hour depending on the tests, software complexity, and process involved.

Production environment 

The Production environment, or Live environment, is where your customers use your product; so the actual website or application. 

Demo environment

You can have other environments like a Demo environment. This is another copy, but not used by Developers or Testers. The version can be “work in progress”, or a version that is deemed to be ready; but the purpose is to show customers upcoming features and to gather feedback.

Capacity testing

You could create a special kind of test environment for capacity/load testing. This is meant to stress test the software and see how it performs under heavy load. Systems can see increased traffic at specific times: users all logging in at 9am to start the day, website hits spiking during lunch breaks, more shopping traffic over the Christmas period, and so on. If users are paying to use your service and it goes down, it’s a huge problem. If you sell a product or service on your website and it goes down, it can cause lost sales. So this testing can be essential.