Self-Assessment

Recently, we were filling in our forms for the end-of-year performance reviews. We have tried all kinds in the past, but in recent years we have settled on something simple. It's basically structured around open questions: "What went well?", "What didn't go well?", "What have you learned?", "What would you like to learn?".

Since we had just evaluated ourselves, it was a surprise to get an email directly from the CTO wanting us to evaluate ourselves again.

Hope you are well. 

We are currently conducting an assessment exercise across our Portfolio to establish our strengths and areas for improvement, with the goal of growing our capability as a department and to drive to our long term vision of moving to development on our new product.

To facilitate this process, we have developed an assessment questionnaire that will help us understand your capabilities and your career trajectory.

Could you please complete this form by selecting the option that best reflects your current capability or skill.

It's an unexpected email, it states urgency, and it contains a suspicious link: all the hallmarks of a phishing email. I waited for a colleague to click the link before clicking mine. Given that it asks similar questions to our performance review, plus many others specific to our job role, why wouldn't they just standardise the review process to get the same information?

Clicking the link loads a Microsoft form with Employee ID and Name pre-filled in editable fields, but the question says "Please do not change this". My name had double spaces in it, which was really annoying. What would happen if I did correct it? Does Microsoft Forms not allow non-editable fields? It seems a weird limitation regardless.

The questions were labelled with the following categories:

Delivery, Code Quality, Problem Solving, Accountability, Technical Proficiency, Domain Proficiency, Cloud Knowledge, New Gen Tech Stack Proficiency, Joined Up, Process and Communication, Innovation. 

I really didn't like the way the questions were written. There are five answers labelled A-E, but C is often written to sound like a brilliant option when you would expect it to be average. B and A just sound like behaviour reserved for Architects/Engineering Managers/Principal Developers.

Given that the answers seem to map directly to job roles, it reminded me of those online quizzes that decide which TV character/superhero you are, where you can easily bias your answers because you can see exactly where it is going. In this case, the assessment just seems like it is gonna rank you Architect, Expert, Senior, or Junior based on your answers.

Some of the wording for the lowest answers seems like a strange thing to admit.

“Only engages in innovation efforts when directly instructed, showing a complete lack of accountability. “

Why would you admit to showing a complete lack of accountability? Most people probably don't "innovate", but selecting an answer containing "showing a complete lack of accountability" seems crazy.

So given that some answers are never gonna be selected because they're a difficult thing to admit, and given that some answers were clearly based on your job description, people would just select answers based on what they SHOULD be doing rather than what they ACTUALLY do. That makes it a pretty pointless survey. There's also a bias from it being sent during the review period: people would suspect it would be used to decide pay rises and promotions rather than just some team reshuffle.

This one on Code Quality is weird because B and C seem similar in standard, but when you read D, it sounds like you are admitting to being an incompetent Software Developer.

Code Quality 
(cq.a) Established as code guru and plays a key role in shaping optimal code quality in the team through effective reviews, acting on insights from tools, identifying and resolving inefficiencies in the software and process.
(cq.b) Effectively uses insights from tools like Sonarcloud and influences team quality positively by enforcing standards and showing an upward trend of improved quality and reduced rework.
(cq.c) Upholds the highest standards of unit testing, coding practices, and software quality in self-delivery and ensuring the same from the team through effective code reviews.
(cq.d) Rarely identifies refactoring opportunities, misses critical issues in code reviews, and struggles to positively influence the team's approach to code quality.
(cq.e) Engages minimally in code reviews, allowing issues to slip through; unit tests are skipped and/or yet to begin influencing the code quality of the team.

This one seems applicable only to the top people, or those who love the limelight and want attention from the managers.

Joined-up
(ju.a) Designs personalised learning paths for team members to ensure comprehensive skill development.
(ju.b) Takes ownership of training needs, seeking opportunities for personal growth. Takes the initiative to identify advanced training opportunities for skill enhancement.
(ju.c) Demonstrate robust team communication, encourage team to contribute in weekly Lunch and Learn sessions, actively recognising peers, support juniors wherever needed. Be active in recruitment.
(ju.d) While excelling as an individual contributor, there is an opportunity to engage more with team members by sharing ideas, seeking input, recognition and offering support in team/organisation initiatives
(ju.e) Need to start taking on a mentoring role by sharing knowledge, providing guidance, and offering constructive feedback to the juniors help them grow and succeed.

I think it is difficult to make meaningful surveys or assessments, but you need to put some thought into the value and accuracy of the results.

DataGrid & The Custom Row Selection

We have one module where you select rows from a grid and can then perform actions on multiple entries. The first column in the grid contains checkboxes, so whatever is ticked is what the actions apply to. However, checking the box also highlights the row, so there are actually two visual indicators of what the user has selected.

Clicking anywhere on the row should highlight it, and since both the highlight and the checkbox show what is selected, we need some custom code to also check the box when the row is highlighted.
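As a rough illustration of the kind of custom code involved (a sketch assuming a WinForms DataGridView; the control and field names are made up):

private void listData_SelectionChanged(object sender, EventArgs e)
{
	// Keep the checkbox column in sync with the row highlight,
	// so both visual indicators always agree.
	foreach (DataGridViewRow row in listData.Rows)
		row.Cells[_checkBoxColumnIndex].Value = row.Selected;
}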

My team had made a tweak to the underlying control, which broke the checkbox being ticked when the row is selected.

When we recreated the bug, we realised that because there are two ways of selecting (highlight and checkbox), when you click buttons to perform actions, the code actually checks what is highlighted. So even though we had introduced a bug where the box is no longer ticked, it doesn't actually cause any problems at all, because the checkboxes are merely visual.

A simple fix would be to just remove the column since it is redundant. Then a lot of code could be cleaned up.

“One thing we have noticed is the tick box isn’t actually needed, highlighting the column gives you all the same functionality as selecting the tick box.” 

Tester

However, our Product team said that even though multiselect behaviour has always existed, many users weren’t aware and so clicked each checkbox one by one. So it sounds like some users like clicking rows, some like using checkboxes, and some like using keyboard shortcuts.

The keyboard behaviour seemed rather strange too, and caused extra complications in the code. Pressing down/up selects/deselects the row below/above the row that currently has focus. However, there is no visual indicator of which row actually has the focus. Other shortcuts include Spacebar, which toggles the checkbox on/off. Pressing ctrl+up/down jumps to the top/bottom respectively and checks the box (but if it is already checked, it doesn't uncheck it, which is inconsistent with up/down without the Control key). You can also press ctrl+a to select all rows, but you cannot deselect all rows.

It really illustrates how something basic can become complicated in trying to make all users happy. And the more custom code you add, the more likely there are bugs and other inconsistencies.

I was noticing inconsistencies with the shortcuts, so when I implemented my fix, I was convinced I had broken them.

private void ListData_KeyUp(object sender, KeyEventArgs e)
{
	if (listData.CurrentCell == null)
		return;
	
	// Plain up/down moves the highlight without ticking the checkbox
	if (e.KeyData == Keys.Down || e.KeyData == Keys.Up)
		HandleGridRowSelection(mouseActionOrSpaceKey: false);
	
	// Spacebar toggles the checkbox when it is the focused cell
	if (e.KeyData == Keys.Space)
	{
		_isSpaceKeyPressOnCheckBox = listData.CurrentCell.ColumnIndex.Equals(_checkBoxColumnIndex);
		HandleGridRowSelection(mouseActionOrSpaceKey: true);
		_isSpaceKeyPressOnCheckBox = false;
	}
	
	// ctrl+a / ctrl+up / ctrl+down: note this only passes if Control
	// is STILL held when the other key is released
	if (e.Modifiers == Keys.Control && (e.KeyCode == Keys.A || e.KeyCode == Keys.Up || e.KeyCode == Keys.Down))
		HandleCheckBoxHeaderClick(combinationKeys: true);
}

I thought I had broken the functionality because sometimes I saw it highlight all the rows without selecting the checkboxes. Then, looking at the code again (which I hadn't modified), I noticed it was called from KeyUp. The ListData_KeyUp code expects you to let go of A whilst still holding down Control. Most people would do that naturally due to hand positioning, but I was sometimes "stabbing" the keys and releasing them at roughly the same time. So on the occasions I actually released Control first, the IF statement doesn't pass and the checkboxes aren't selected. I think the standard implementation is to do this check on KeyDown, not KeyUp.
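A minimal sketch of what moving the check to KeyDown might look like, reusing the helper from the snippet above (just an illustration, not the actual fix we shipped):

private void ListData_KeyDown(object sender, KeyEventArgs e)
{
	// Modifier state is reliable here because Control is guaranteed
	// to still be held down when the combination key fires.
	if (e.Control && (e.KeyCode == Keys.A || e.KeyCode == Keys.Up || e.KeyCode == Keys.Down))
	{
		HandleCheckBoxHeaderClick(combinationKeys: true);
		e.Handled = true; // suppress the grid's default navigation
	}
}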

The Feature Flag Rant

When adding new features to software, you can add a feature flag: if it is set to true, the new feature is used; if false, it isn't. This allows a quick roll-back by tweaking the value rather than releasing a new software update. However, it makes the code more complicated due to the branching paths.
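A minimal sketch of the idea (the flag store and method names here are hypothetical):

public void Export(Document document)
{
	// The flag value comes from config at runtime, so support can flip it
	// back to false without shipping a new build.
	if (_config.IsFeatureEnabled("NewExportPipeline"))
		ExportWithNewPipeline(document);    // new behaviour behind the flag
	else
		ExportWithLegacyPipeline(document); // old path kept alive as the fallback
}

Every flag like this doubles the number of possible paths through the code, which is where the complexity comes from.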

When all users are on the new feature, when do you remove the old code? Obviously it should be removed once everyone has switched over and is happy with the new functionality, but the work needs to be planned in, and what is the urgency? Project Managers want new projects that add value, not deletion of redundant code.

One of our most experienced developers posted a rant about feature flags. He pointed out there was no guidance on when to use feature flags. Do all new features get feature flags? What if it depends on a feature that already has a feature flag? Do Software Testers test each combination to make sure all code paths are supported? Is it clear which configurations are deployed on live since this should have priority when it comes to testing? By default, our Test Environments should match the config of a typical Live Environment. However, we often find that the default is some configuration that is invalid/not used.

It's not always possible to "roll back" by switching the feature flag off. Implementing the change may have required refactoring the code or adding new database columns, and turning the flag back off just stops some of the new code being called, not all of it (the refactored parts remain). So if the bug lies in those changes, it is still a problem even with the flag off.

It was also discussed that some people used our Configuration Tool for actual configuration and others were using them as Feature flags, and maybe we should have separate tools for Configuration and Features.

Feature flags cause maintenance problems. A flag needs to be tested on and off when implemented, and if you later want to remove it, that removal needs testing too. If you leave it in, it will always be questioned whether code in that area is used or needs testing. And how do you prioritise removing the code: does it belong to the team that initially created the feature? What if that team has moved on, or split?

Another developer brought up an example where a bug existed in two places, but the developer who fixed the issue was only aware of one path, and didn't know about the other, which required a feature flag to enable.

He also questioned if it is more of a problem with our process. Other companies may have quicker releases and are more flexible to rollback using ideas like Canary Deployment. Our process is slow and relies on “fix-forward” rather than rollback.

Things to consider:

  • What actually gets feature flagged?
  • When the conditional code gets removed from the codebase
  • The effect of the "Cartesian explosion" of flag combinations on unit tests and test environments (five independent flags already give 2⁵ = 32 possible configurations)

Crowdstrike Struck The World

I heard from a few security podcasts that Microsoft wanted exclusive rights to manage the security of the kernel on Windows machines. However, the EU's competition laws frown on monopolies, so they want an open market for security software. In most cases competition is good, but this could actually be one area where you want a closed system: the more companies that have control of something as fundamental as the kernel, the greater the risk of threats.

A kernel driver has very intimate access to the system's inner workings. If anything goes wrong with a kernel driver, the system must blue-screen to prevent further damage to the user's settings, files, security and so on.

Crowdstrike released a faulty update, which caused the infamous blue screen of death on many Windows systems across the globe. Microsoft must have been fuming, because they knew this wouldn't have happened with a closed system, and the media kept reporting on it as if it was a Windows problem. Sure, it only affected Windows PCs, but it had nothing to do with Microsoft.

If I understand correctly, the driver was signed off by Microsoft, but the update involved a "channel file" which just contained loads of zeros. So when the driver read it, it had no choice but to blue-screen. It makes you wonder what kind of testing processes Crowdstrike has if they can release an update like that.

When I logged in at work, our Group IT announced that some colleagues would be affected by a Crowdstrike problem and that they would be acting quickly to get people back up and running. It was only a bit later, when someone sent me a screenshot of some of our users complaining on X, that I realised it wasn't just an internal problem. On X, I saw reports of the problem affecting banks, airlines, supermarkets and more, and the BBC had a live news page on it. I still didn't grasp the severity until I saw that Troy Hunt had declared it one of the severest problems we have ever seen.

Despite Group IT making it sound easy to restore, from what I heard from others, reverting the update on a single computer was fairly straightforward, but when you have hundreds of computers it becomes a real problem. In companies with only a few IT staff, it is crippling. You might think people could fix the problem themselves, but many aren't tech-savvy, and besides, many companies lock down access so users don't have advanced features like Administrator mode.

Furthermore, it sounded like servers "in the cloud" were even more difficult to restore, or at least more cumbersome.

Ironically, in recent years we have moved a lot of our live infrastructure out of our own data centres and into the cloud, citing benefits of reliability. However, this problem meant our users were impacted for a day or so, when we could have had them up and running within an hour or so if the servers were still internally hosted.

Crowdstrike released an update to prevent more machines from being brought down, and sent customers mitigation steps and tools to identify impacted hosts. The new update wouldn't fix the broken machines though; that required a manual fix involving booting into safe mode, locating the dodgy file, and removing it.

Companies purchase security software to prevent system outages, so causing a global system outage is a massive PR blunder for Crowdstrike and for security software in general. It's gonna be tough rebuilding trust, but many everyday people will probably blame Microsoft, because that's the name that was initially stated in the media.

It must have been brutal for the upper management, and a disaster when they turned up fatigued and under pressure on live TV.

Troy Hunt documented the story as he learned more.

Strange keyboards

Recently, I’ve come across some interesting keyboard designs.

Optimus

Optimus Maximus Keyboard review (Cherry ML)

This one has customisable screens under the main keys. Very pricey and very gimmicky.

Emoji

 Logitech Pop Keys keyboard review: form over function – The Verge

This one has a selection of emoji keys. For someone who loves emojis, it sounds like a great idea in theory. However, at work I tend to react to messages on Slack/Teams, and I think this would only add emojis when you are writing text. Also, they have chosen very generic ones which they think are popular. Although I might use a laughing emoji, I prefer more obscure ones based on inside jokes. So it wouldn't really work for me.

Lego

There is this Lego-like keyboard which looks bizarre; the straight layout and limited keys make it much harder to type. A total gimmick.

Keyboards Should Have Been Like This

The Future

Remember when we stopped using keyboards? Basically just examples of zany ideas which aren’t that practical for most people.

2002: COMPUTER KEYBOARDs of the FUTURE | Tomorrow’s World | Retro Tech | BBC Archive

Code Comments

When writing code, some developers like to write notes to themselves/others in the actual code using "comments". The intent is documentation: to explain what the code is doing. It is often argued that code should be "self-describing", an idea I agree with. However, code can end up complex because the functionality is genuinely technical, because the developer struggled to come up with a simpler version, or maybe because the domain logic is just that way.

I've collated various examples, with varying degrees of humour.

Confusing

Comments are supposed to add clarity to the code. When you see code that looks simple but the comment seems to describe something else, is ambiguous, or is misleading, the comment has actually decreased clarity.

// Display save dialog
RefreshFilters();
...
// Save over existing filter
RefreshFilters(); 

You would think RefreshFilters would simply reload the filters, yet the comments suggest it prompts the user with a save dialog and even overwrites the existing filter.

Self-deprecating

//Well I've messed this up!!!
//Ultimately a filter ends up adding something to the where clause,
//  or if it's a relationships filter, then the AND clause of the join.
//The way I'm adding it to these is not clean and just sucks.  (two constructors with lots
//  of private variables that aren't needed half the time...yuk.
 
//Reading the above comment ages later, I'm not sure why relationships have to be on the
//join clause?  If it's a left one then yes but you can have a left join to a logical table.
//TODO: I've messed up
/// Ok.  I've messed up.  I've messed up good and proper.
/// Aggregate reports wants to format a date as a year, say.  The formatting stuff
/// is buried in compare functions (because ranges was the only place that I needed to do this)
/// There is the column.AddOutputFormatInformation that might be useful when this is fixed????
// This looks like a hack, and, frankly, it is. I'm sorry, I can't work out how to make this not stupid.
// If you're reading this and have some brilliant insight into how to make this work in a way that doesn't
// make people sad, please go nuts and fix it.

Criticising Others

(cg) => { return cg.DisplayName; })); // Wow this is awful
matchingSlotEntity.JobCategoryName = string.Empty; // This doesn't make sense. JobCategory belongs to a person, not a slot.
/// <summary>This is awful.  Get Ryan to support Guids for Mail Merge</summary>
//Please. Remove word interop. Not even Microsoft want us to use it.
TaskCount TaskCount { get; set; } // TODO: Again, this should be readonly. Something mucks with it, though.
MessageBox.Show("Sorry this feature hasn't been implemented yet... :(", "Sad Info..!!");

Funny

// Reverse the order as they are stored arse about face
this.ribbonBars.Reverse();
// Show yourself!
/// <summary>
/// Builds trees for a living.
/// </summary>
internal static class MailMergeTreeBuilder

this.AllowDrop = false;  // No dropping on this tree thankyou.

Even Microsoft throw in the occasional gem.

// At this point, there isn't much we can do.  There's a
// small chance the following line will allow the rest of
// the program to run, but don't get your hopes up.

Useful Warnings

This one warns you that the logic is confusing.

/**********************************************************************************
Description
-----------
You'd think that a stored procedure called 'GetNextBusinessDay' would return the
next business day for an organisation. That would be too easy.
 
What this stored procedure does, is return a day that is not explicitly marked
as 'closed'.
 
It also doesn't return the 'next' business day in the default case where the parameter @BusinessDayOccurrence is set to zero - in that case it returns the current day, or the next non-closed day if the current day has a closure defined on it.
 
@BusinessDayOccurrence doesn't find the first non-closed day after X days from today, incidentally. It returns the Xth non-closed day, which is an important difference. If you have closed bank holidays, and want to know the 2nd non-closed day after 24th December, it's not 27th December but 28th December Confusing!
 
**********************************************************************************/

Logging too many Errors

When things go wrong with your software, it's obviously good practice for the developer to log relevant information to an error log. You then know when users are affected, and also how many, so you can understand how widespread the issue is. With that information, it becomes easier to triage: there can be loads of other projects to work on, and bugs to fix, so being able to prioritise them is key.

Choosing what to log and how often requires some thought. You can assign categories to your logging, such as Information, Warning, and Error. Error means something has gone wrong; Warning means you suspect something has gone wrong, like missing config; and Information can be useful for debugging: if there is an optional service the user connects to, you can log "user connected" and "user disconnected".
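A minimal sketch of those categories in code (assuming Microsoft.Extensions.Logging, with the chat-connection events as the example; the class is hypothetical):

using System;
using Microsoft.Extensions.Logging;

public class ChatConnectionMonitor
{
	private readonly ILogger _logger;

	public ChatConnectionMonitor(ILogger logger) => _logger = logger;

	public void OnConnected(string userId) =>
		_logger.LogInformation("User {UserId} connected", userId);             // debugging aid only

	public void OnMissingConfig(string key) =>
		_logger.LogWarning("Config value {Key} is missing, using a default", key); // suspicious, not fatal

	public void OnConnectFailed(string userId, Exception ex) =>
		_logger.LogError(ex, "User {UserId} failed to connect", userId);       // something actually went wrong
}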

We have chat functionality which uses a PubSub (publish/subscribe) model, and we were logging status changes and connection statuses. If you just blindly log scenarios like this, it can be counterproductive: if the statuses change frequently and there are thousands of users, you can spam the error log, making it harder to see the real problems. If you see the same entries logged again and again, it becomes likely that you'll just think "we expect that" and ignore them.

There are extra costs associated with logging too. Data takes storage space, and adding thousands of rows to a database per day can quickly increase its size. All those extra network calls can be excessive too.

We have had a few projects recently aimed at cutting down the number of errors logged.

In the case of real problems, fixing the root cause is obviously the best strategy. And if the logs aren't useful, it's best to stop logging them.

If the logs are useful, sometimes it's better to cut them down rather than stop completely. If you log "failed to connect" and it retries a few seconds later, do you really want to log another "failed to connect"? Maybe the functionality should try 5 times then give up until the user manually attempts to reconnect. Maybe the logs could remain on the user's computer and be submitted once with the number of failed attempts: instead of 5 separate entries, just one saying it tried 5 times then gave up.
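A rough sketch of that idea (the connect method, logger and retry delay are assumptions, not our actual code):

public async Task ConnectWithRetriesAsync(int maxAttempts = 5)
{
	for (var attempt = 1; attempt <= maxAttempts; attempt++)
	{
		try
		{
			await ConnectAsync();
			return; // success: nothing worth logging
		}
		catch (Exception ex)
		{
			if (attempt == maxAttempts)
			{
				// One aggregated entry instead of five "failed to connect" rows.
				_logger.LogError(ex, "Connection failed after {Attempts} attempts; giving up until the user retries", maxAttempts);
				return;
			}
		}

		await Task.Delay(TimeSpan.FromSeconds(5)); // brief pause before the next attempt
	}
}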

On a large-scale system like ours, the number of entries in the databases is crazy. Read this statement from a concerned Support team member (which I think reflects the stats one month after a recent release):

Based on the daily volume of errors logged over the past few days I’m expecting the number of errors logged in Monitoring to increase by 82% over the course of a month.

Current stats from Server11:

  • Total rows in ErrorLog: 11,519,198
  • PubSub errors: 3,283,396
  • Errors excluding PubSub errors: 8,235,802
  • Avg PubSub errors/day: 218,893
  • Estimated PubSub errors over 31 days: 6,785,685
  • Data size per error (bytes): 3,015
  • Estimated data size of PubSub errors (MB): 21,072

 

Null checking

Introduction – The Basics

A NullReferenceException is an incredibly common error and probably the first problem new developers encounter. If you have a reference to an object, but the reference is null, you cannot call instance methods on it without an exception being thrown.

So in C#, you could have:

Dog myDog = null;
myDog.Bark();

Which will throw an exception because myDog isn’t initialised.

Dog myDog = new Dog();
myDog.Bark();

This is fine because a dog has been initialised and holds a reference to an object.

If you allow the possibility of nulls, then whenever you want to call a method, you end up checking for null.

if (myDog != null)
   myDog.Bark();

More concise syntax uses the null-conditional operator: a question mark that only makes the call if the reference is not null, such as:

myDog?.Bark();

So the question mark acts as the IF statement but is a more concise way of expressing it.

New "Nullable Reference Types" & Improved Design

A cleaner and safer design, if you can manage it, is to never allow nulls; then you never have to check for them. In the latest versions of C#, you can make the compiler assume references shouldn't be null unless you explicitly specify that they can be; see Introducing Nullable Reference Types in C# – .NET Blog (microsoft.com).
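A rough sketch of what that looks like, reusing the Dog example:

#nullable enable

public class Dog
{
	public void Bark() { }
}

public class Kennel
{
	public Dog Resident { get; }      // non-nullable: the compiler insists this is assigned
	public Dog? Visitor { get; set; } // explicitly allowed to be null

	public Kennel(Dog resident) => Resident = resident;

	public void MorningRoutine()
	{
		Resident.Bark();  // no check needed; cannot be null
		Visitor?.Bark();  // check needed, and the compiler warns if you forget it
	}
}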

Our Problem

I work on some software which is over 10 years old, using an older version of .NET. So things can be null, and are often designed to be null. We noticed that newer developers (this seems common among our Indian developers, for some reason) put null checks everywhere, regardless of whether the reference can actually be null. This makes the code much harder to read, and possibly to debug, because you are writing misleading code and adding extra branches that can never execute.

In a recent code review, the developer added an if statement to check for null:

if (user == null)
   return;

So if execution gets past this check, user cannot be null, yet for every single statement afterwards he added the question mark to check for null! For example:

var preferences = user?.Preferences;

Although it's also bad design to chain loads of properties together, it is sometimes necessary. Being concise with the question-mark checks can make it hard to understand what actually executes. Combined with LINQ's FirstOrDefault, you get even less clarity: FirstOrDefault returns the first item in a list, or null if there are no matches. So when you see all those conditionals plus a method like FirstOrDefault, a glance at the statement suggests it is very likely to return null.

var result = value?.Details?.FirstOrDefault() as Observation;

So "result" is null if: value is null; Details is null; Details contains no items; or the first item is not an Observation.
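Written out long-hand (just a sketch, with the same types as above), the chain is doing something like:

Observation result = null;
if (value != null && value.Details != null)
{
	var first = value.Details.FirstOrDefault(); // null when the list is empty
	result = first as Observation;              // null when the item isn't an Observation
}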

“everything is null these days, or not sure what type it is”

Me

A longer example is the following:

if (recordData != null && (_templateDataSource?.EventLinks?.ContainsKey(recordData) ?? false))

Because of the null checks, they added the null-coalescing operator (??) to fall back to false when the chain is null. However, none of this data should have been null in their code, so it should simply be:

if (_templateDataSource.EventLinks.ContainsKey(recordData))

The Lead Developer made a good point: if this data were null, it would be a major bug, but with the null checks in place the logic is silently skipped and the bug is hidden. It would be preferable to actually crash, so you know something is very wrong.

Should this ever be null? Or the EventLinks on it?
Remember, all these null safety checks could allow the software to continue on invalid data conditions and produce wrong answers instead of crash.
Please re-check ALL the ones you have added in these changes, and if something should never be null remove it.
Unless you are planning to test the behaviour of this when null is passed in and that it is correct.

Lead Developer

When so many objects can actually be null, it is easy to miss a check. There was one code review where I questioned the original design, which seemed to have excessive null checks (implemented by the same author as this new change). His new change was to add a null check that had been missed. After making some dramatic changes based on my feedback, he ironically removed his new null check, and thus reintroduced the issue he was attempting to fix!

Developer: "have added a null check to prevent the issue."
Me: "then it will crash because you haven't added null checks"

Another good example of having excessive null checks, but somehow still missing them:

context.NotGiven = resource.NotGiven;
context.Date = Helper.GetDateTimeFromString(resource.Date);
context.IdentifierSystem = resource.Identifier?.FirstOrDefault().System;
context.IdentifierValue = resource.Identifier?.FirstOrDefault().Value;
context.Status = resource.Status;
Coding productCode = resource.productCode?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.productCode = new FilerCommon.Coding(productCode?.System, productCode?.Code, productCode?.Display);
Coding productProcedureCode = (resource.GetExtension(Helper.ProductProcedureUrl)?.Value as CodeableConcept)?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.productProcedureCode = new FilerCommon.Coding(productProcedureCode?.System, productProcedureCode?.Code, productProcedureCode?.Display);
Coding site = resource.Site?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.Site = new FilerCommon.Coding(site?.System, site?.Code, site?.Display);
Coding route = resource.Route?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.Route = new FilerCommon.Coding(route?.System, route?.Code, route?.Display);
Coding explanation = (context.NotGiven.HasValue && context.NotGiven.Value ? resource.Explanation?.ReasonNotGiven : resource.Explanation?.Reason)?.FirstOrDefault()?.Coding?.FirstOrDefault(x => x.System == Helper.SystematicInfoUrl);
context.Explanation = new FilerCommon.Coding(explanation?.System, explanation?.Code, explanation?.Display);
context.Quantity = new FilerCommon.Quantity(resource.Quantity?.Code, resource.Quantity?.Value, resource.Quantity?.Unit);            
context.LotNumber = resource.LotNumber;
context.Manufacturer = GetManufacturerName(context, resource);
context.OrganisationDetail = GetOrganizationDetails(context, resource);

Some of the FirstOrDefault() calls then access a property without a null check:

FirstOrDefault().System

Also, because of the null checks when setting "site", it looks like site can be null. So when they set "context.Site", they rely on the null checks in the Coding constructor. But that means they may be passing null into the constructor, which probably produces an invalid object.

Null flavours

I thought the idea of a "null flavor" was very interesting: NullFlavor – FHIR v4.0.1 (hl7.org). Null means "nothing", but there are different meanings within that, i.e. the reason why the value is missing. Some examples are:

  • missing, omitted, incomplete, or improper
  • invalid
  • not provided by the sender due to security, privacy or other reasons
  • unknown
  • not applicable (e.g. last menstrual period for a male)
  • not available at this time, but expected to be available later
  • not available at this time (with no expectation regarding whether it will or will not be available in the future)
  • the content is greater than zero, but too small to be quantified

Closing Words

Null checks are one of the simplest concepts in object-oriented programming, so it's bizarre to see so many modern programmers struggling to understand when to use them. Even when they find a bug in their own code, they fail to learn the lesson and continue to write inappropriate null checks. A better design is to avoid the possibility of nulls by always dealing with a real object reference, even if that object simply represents "null".
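A rough sketch of that "null object" idea, back with the Dog example (the lookup is hypothetical):

public interface IDog
{
	void Bark();
}

public class Dog : IDog
{
	public void Bark() => Console.WriteLine("Woof!");
}

public class NullDog : IDog
{
	public static readonly NullDog Instance = new NullDog();
	public void Bark() { } // deliberately does nothing
}

// A lookup that can "fail" returns the null object instead of null,
// so callers never need a null check:
IDog FindDog(Dictionary<string, IDog> dogs, string name) =>
	dogs.TryGetValue(name, out var dog) ? dog : NullDog.Instance;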

Performance Tales: SQL performance

Batches

Performance of SQL database queries is an interesting topic that I never fully understand. When dealing with a large amount of data, we are often told it is beneficial to process the results in batches. So if you want to update/delete rows based on some calculation, for example, we may be told to iterate through groups of 1000 rows.
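For illustration, batching deletes from application code might look something like this (just a sketch assuming System.Data.SqlClient, an open connection, and a made-up table):

int affected;
do
{
	// Delete in batches of 1000 so each transaction stays short
	// and holds locks for less time.
	using (var command = new SqlCommand(
		"DELETE TOP (1000) FROM StaleRows WHERE ExpiryDate < @cutoff", connection))
	{
		command.Parameters.AddWithValue("@cutoff", cutoff);
		affected = command.ExecuteNonQuery(); // rows removed in this batch
	}
} while (affected > 0);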

You can't just put your query in a batch and assume it is efficient, though. There was one attempted data fix where the reviewing Database Expert reckoned it would take 2h 46min to execute:

“It doesn’t matter if you select 1 row or 1m rows – it still takes 10 seconds. So 1000 batches x 10 sec = 2h 46min processing time just working out what to delete. We need to change the proc so that it gets all 1m rows in one hit at the start into a temp table (should take 10 seconds) and then use that list to work from when batch deleting the rows”

Database Expert

Performance Tales: LINQ to SQL
  

Although we mainly use SQL stored procedures to retrieve data from the database, some of our code uses LinqToSQL, which dynamically generates the SQL from code. Sometimes it seems that handcrafting your own database query (a stored procedure) gives better and more consistent results, since the SQL is the same every time. And sometimes the performance issue isn't with the actual query at all, but with the columns lacking an appropriate index to allow fast lookups.

There was a problem where our default 30-second timeout was being reached for some users. If they attempted to load a large record, it would take 30 seconds and then error out. One developer suggested increasing the timeout from 30 seconds to 5 minutes.

“i’d like the customers to still be able to see the user’s record.”

Developer

The record would (presumably) eventually load, if the user actually waited the 5 minutes and didn't close the application thinking it had crashed (often the user interface in this example would just show "Not Responding").

This idea doesn't fix the root problem though, and the software still seems unusable from the user's point of view.

An Architect stated:

"there is something very wrong with the query if it's taking longer than 30 seconds, particularly when the use case of using LINQ2SQL is commonly for small batches of data."

Software Architect

The default is usually a good indicator that something is wrong; if we increase the timeout across the board, we will miss out on such problems.

The reason the timeout is being breached needs to be investigated:

  • Is it general: applying to any query seemingly at random?
  • Is it during the connection to SQL Server or during the execution of a query?
  • Could a particular problem query be optimised, be made asynchronous, or have its timeout altered individually (as sketched below)?
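The last option might look something like this in LINQ to SQL (a sketch; the DataContext and table names are made up, but DataContext.CommandTimeout is the real knob):

using (var db = new RecordsDataContext(connectionString))
{
	// Raise the timeout for this one known-slow query only,
	// keeping the 30-second default everywhere else.
	db.CommandTimeout = 300; // seconds
	var record = db.Records.SingleOrDefault(r => r.Id == recordId);
}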

The Database Expert stated:

Nothing by default should take over 30 seconds – it’s a good default. If we have particular problems with SQL taking over 30s it should be investigated and addressed. It is possible that some features should be allowed to take over 30s (e.g. not user facing, known to take a long time). Allowing >30s means more chance for blocking and wider impacts.

Having queries run longer than 30s increases the amount of time they are running in SQL – this could lead to more blocking, more CPU/memory demand which could make the entire server unreliable, so we go from 1 user complaining to many.

LINQtoSQL can be optimised, we’ve done it many times over the years. The simplest is to replace the queries with Stored procedures – but it is possible to improve bad LINQtoSQL too. It depends what is wrong. None of daily operations should be anywhere near that. 

SQL can do near 0ms operations on many more rows of data than we have. It isn’t a problem because we have more data, it will be something with that query. Poor plan choice or Blocking.

Database Expert

Performance Tales: Limit User Input

A developer looked into an issue related to our Searches module, which allows users to query their data and generate tables and charts to create their own reports. The developer claimed their new changes gave performance "up to 10 times better".

The Database Expert looked at his changes and stated:

“I think that this might not be the right way to approach this issue in isolation. The issue with this procedure is not that it is slow in general, it is that our software allows you to call it in a huge variety of ways, some of which can only produce terrible execution plans as it stands. We should be analysing what kinds of searches that users do, and see if we can restrict some of the sillier searches that they can do. For example, users can right now do an audit search filtered on “anything before today’s date” with no further filters.

For example, see this snippet of a live trace done right now: <shows screenshot>

Clearly, some calls are very well optimised and others are terrible.

We should be looking at restricting what users can do, for example requiring people to search on a more focussed timeframe.

The way I would approach this issue would be to look at trace data and look for two main things:

-The styles of queries that are run most often

-The styles of queries that run the worst

Both need focus. The ones that run most often, even if they are relatively quick, can yield a big improvement simply from scale alone. The ones that run the worst improve the overall user experience as well as the impact on the system.

Of course improving a query might not involve any SQL changes at all, instead they might involve app changes to prevent silly queries from being run.”

Database Expert

Keep lowering the number with each MI

There was a Major Incident regarding a feature. The stored procedure used already had a limit on the amount of data selected in one go, and a developer changed that number from 50 to 10. Since it was already a very small number, I couldn't understand why such small numbers made a difference. There was a code comment next to this limit saying it was there to "improve performance". I looked at the file history to see if I could find any more information about why the line was added in the first place, and saw a very interesting trend: that line was the only line changed in each of the previous two revisions:

original: select top 250

24 Sept 2015: select top 100

2 Oct 2015: select top 50 

25 May 2022: select top 10 

The developer did explain that the data is passed to another process that takes 90 seconds to process 10 instances. It’s essentially a queue that is polled every 10 minutes and would only have a small number of tasks each time.

The concerning thing is that the number keeps being lowered and the performance is still not deemed good enough. Maybe the overall implementation needs revising.

Infinite Scroll

An Automated Tester posted the following:

“I hope you want to test infinite scroll by scrolling down to the last element. You can use mockIsIntersecting(dom.getByTestId(“datatestId”), true) from library react-intersection-observer/test-utils to test this.”

This made me think deeply:

Is it using infinite scroll if you can scroll to the last element? 🤔

Me

Maybe in most situations, an "infinite scroll" feature isn't truly infinite unless it is dynamically generating content. But in some cases it's near-infinite, like Twitter, where your feed could seemingly go on forever because there is far more content than a human can read.

In that case, though, you could never run a test on real data that scrolls to the bottom. With mock data, you could test that scrolling through limited data does reach the bottom.

I told a colleague about this and got him thinking about other claims of infinity. He said it's like the grains-of-sand idea: there's a massive amount, but it's not really infinite. Or the size of the universe: constantly expanding, but at any given time finite.

Mike: A line on a graph that extends out to infinity, it doesn’t though; it extends out as far as you graph.

Me: Buzz Lightyear can go beyond infinity (“To infinity, and beyond!”).