Whether to log a new bug

Recently, we had the classic debate about what should happen when, while testing a bug fix, you discover an unrelated problem. Should the fix:

  • be rejected/marked as failed,
  • or be passed, with a new bug logged for the unrelated issue?

Sam said that whenever his testing found a problem, he failed the fix rather than logging a new bug, even if the problem wasn’t called out anywhere in the bug report’s description or recreation steps.

So to explain using this situation: I had fixed the Entity Framework code that saved a new row to a database table. The bug was about passing the correct values into it, and my change was fine; the correct values were now saved in the appropriate columns of the new row. However, Sam noticed that if you sent multiple calls at once, the number wasn’t incrementing by 2 as expected, only by 1; the first call was essentially being overwritten. A classic concurrency problem.
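To illustrate the shape of that race (a minimal sketch with made-up names, not the actual Entity Framework code): both calls read the same current value before either has written, so the second write clobbers the first.

// Hypothetical sketch of the lost update. Imagine two requests running
// SaveNewRow() at the same time.
public class RowSaver
{
    private int _latestNumber;          // stands in for the database column

    public void SaveNewRow()
    {
        int current = _latestNumber;    // both requests read, say, 5
        // ...build and save the new row using 'current'...
        _latestNumber = current + 1;    // both write 6; one increment is lost
    }
}

The fix has to make the increment atomic at the database (an identity/sequence column, or UPDATE ... SET n = n + 1), or use optimistic concurrency (e.g. a rowversion column) so the losing write fails and retries.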

I think it’s reasonable to use the original Work Item (Bug Report) to flag any issues testers find, but once we confirm the new issue isn’t caused by my change, it should go into its own Work Item. You might then assess that the new bug can be fixed in a later release. It makes no sense to fail bug A because you found bug B.

Sam is basically increasing scope, then moaning that we might fail the sprint because we cannot complete the new item within the 2-week deadline.

Complex Processes lower morale and encourage bad practice

I always find it interesting when people work in a particular job and then get promoted into management. It’s a completely different set of skills, and even when the promotion is deserved, there’s something illogical about getting so good at your job that you no longer do that job any more.

One thing that always amazes me is when people make decisions that they know are a bad idea from their experience doing the job.

When I worked as a software tester, my view was that we were essentially there to find any bugs that existed. Part of finding them is documenting how to recreate the bug so that developers can fix it. Extending this process so that it is more complex, has more stages, or involves more people causes people to not want to find bugs.

There were times when I witnessed people doing the bare minimum, ignoring bugs that didn’t appear severe to them.

One of the worst people I’ve worked with was an average tester who wanted to become a Test Manager. He ended up making the process more complex, and often announced his changes in a condescending way.

When testers found a bug and wanted to investigate it, they would often try to recreate it, sometimes under different scenarios, to work out its scope and impact, then tell a developer their findings, and only then log it.

Therefore there was a delay between finding the bug and actually logging it. So we got an email from the Test Manager like so:

All,

It is important that as soon as you discover a defect, you raise a defect for this BEFORE you speak with the developer. Any defects raised can easily be closed if they have been raised in error or discovered by the developer to not be an issue. We run the risk of releasing with defects that could potentially cause serious issues for our customers.

I understand his point: if managers are checking the system to see what bugs are outstanding and they can’t see them all, then the software could potentially end up being released with bugs. However, the process started getting worse from then on.

Please can you include myself and Becky on any emails that are discussing a defect with a developer. This is so that we are both kept updated with any defects that could cause issues. Also, for every defect you raise, I’d like an email to myself and Becky with the following information:

- WorkItem ID
- Title
- Area
- Any other information you feel relevant.

So now when we discovered a bug, we had to log it straight away without any investigation, email two Test Managers, then copy them in on any further emails. Then, as more information became known, we had to update the bug report, making sure we also documented an appropriate workaround in case the bug got released (or was already released).

All,

When you are filling out the SLA tab for a defect, you need to ensure that if you’ve specified there is a workaround available, the Workaround box is filled in with the workaround.

If you’ve raised any defect that is a Severity 3, this MUST be fixed before the branch is signed off. This is our exit criteria; we do not sign a release off with any Sev 1s, 2s or 3s. If the developer disagrees with this, escalate it to myself and Becky and we’ll deal with it.

Often when we logged a bug, he would either email you or come to your desk to ask why you hadn’t triaged it with a developer yet, sometimes within 10 minutes of you logging it. So he wanted you to log it before triaging, but would then demand you triage it even if you hadn’t had a chance to contact an appropriate developer.

You’d also have other test cases to run, and he was always on your back for constant status reports. It was hard to win: if you had tests to run and had found bugs, he would want you to triage them, but helping the developer could take hours, which meant you weren’t testing, so then he would be asking why you hadn’t run your tests.

That level of micromanaging and demanding updates wasn’t great for morale and also encouraged Software Testers to stop logging the bugs they found because it just added to their own workload and stress.

It seemed better just to steadily get through the tests, but I suppose if you didn’t want to log bugs, then what was the point in actually running the tests? I did suspect some people just marked them as passed and hoped there wasn’t an obvious bug they missed.

How Not To Write Unit Tests

Introduction

Unit testing is a software testing technique where individual components of a program are tested in isolation. These components or “units” can be functions, methods, or classes.

When implemented correctly, Unit testing is a crucial practice in modern software development. It helps ensure the quality, reliability, and maintainability of your code. By incorporating unit testing into your development workflow, you can catch bugs early, improve code quality, and ultimately deliver better software to your users.

When I first heard about unit tests, they seemed awesome. But the more I used them, the more my opinion of them declined. I find it quite hard to explain why.

I think in general, to make things testable, you have to split the logic up into smaller methods. But when they are smaller, A) they are easier to understand and B) they are less likely to change. So if a developer has looked at that code, what is the chance they are gonna change it and break it? If you have a unit test and it never fails in the software’s lifetime, has it provided any benefit?

Then, if you do decide to change the behaviour, you have the overhead of rewriting all the unit tests, which can basically double the development time.

For certain scenarios that would take ages to test manually, unit tests are very beneficial. When there are loads of permutations or optional aspects to the logic, it’s a prime candidate for unit tests. Without them, retesting every time you make a simple change is incredibly tedious. With them, you just click a button and wait a few seconds.

Unit tests give you confidence that you can refactor without risk. However, they are not automatically a silver bullet. Well-written, fast, reliable tests accelerate development and save time. Poorly-written, slow, flaky tests hinder development and waste time.

Fast Tests

A test that takes a second to run doesn’t sound slow, but what if you have hundreds or thousands of tests? If the tests take a long time to run, developers won’t run them as often, or at all; then what value do they serve?

They should also run as part of the build so that only quality releases actually go live, but you want your release process to be fast.

There was a recent change where a developer claimed to have sped up a long-running call; however, he hadn’t carried that performance-enhancement mindset over to the tests, and had actually increased their running time by 6 seconds.

Thread.Sleep can be used in threaded code to introduce an intentional delay. I’ve seen many developers add it to a unit test. Tests are supposed to be fast, so you should never add it to a unit test.
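As a sketch of the alternative (the Worker type, IsDone flag, and Completed event here are hypothetical; the real code will differ): instead of sleeping for a fixed time and hoping the background work has finished, wait on a signal with a timeout, so the test finishes the instant the work does.

// Bad: wastes 2 seconds on every run, and still fails if the machine is slow.
[Fact]
public void Worker_Finishes_WithSleep()
{
    var worker = new Worker();          // hypothetical class under test
    worker.Start();
    Thread.Sleep(2000);
    Assert.True(worker.IsDone);
}

// Better: the timeout is a ceiling, not a delay. The test returns as soon
// as the work signals completion, and only waits the full 5 seconds on failure.
[Fact]
public void Worker_Finishes_WithSignal()
{
    var worker = new Worker();
    using var done = new ManualResetEventSlim();
    worker.Completed += () => done.Set();   // hypothetical event
    worker.Start();
    Assert.True(done.Wait(TimeSpan.FromSeconds(5)));
}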

Measuring Tests & ExcludeFromCodeCoverage

When people write unit tests, they want to understand how much of their code is covered by tests. We have the Code Coverage metric for this, but it has some severe limitations in how it is measured. It’s often a simple check of “does this line get hit by at least one test?”, but since methods can be executed with different combinations of variables, you can end up with 100% statement coverage without actually testing many combinations at all.
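As a sketch of that blind spot (hypothetical code): one test can execute every line, so coverage reports 100%, while most input combinations go untested.

public static decimal ApplyDiscount(decimal price, bool isMember, bool hasVoucher)
{
    decimal discount = 0m;
    if (isMember) discount += 0.10m;
    if (hasVoucher) discount += 0.15m;
    return price * (1 - discount);
}

// This single test hits every line: 100% statement coverage.
[Fact]
public void MemberWithVoucher_GetsBothDiscounts()
{
    Assert.Equal(75m, ApplyDiscount(100m, isMember: true, hasVoucher: true));
}

But member-only, voucher-only, and neither are never exercised, so a bug in how the two discounts combine would sail straight through.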

The metric is one that impresses managers, so you often see developers writing bad tests simply to game it. This is bad because you end up being misled that your code changes haven’t caused any bugs, yet they could have introduced something severe because the unit tests weren’t adequate.

I’ve seen quite a few code changes made purely to increase the code coverage. The title of the change would be something like:

“Added more code coverage”

Then when I check the build output:

“There might be failed tests”

How can you add more tests and then not actually run them before submitting for review? Madness. The explanation is that their focus is just on coverage, not on quality. Maybe a bit of arrogance and laziness too.

This week I worked with a team to get code coverage over 80% (a corporate minimum). The problem with this effort: Code coverage can be gamed. Sure, low code coverage means there’s a lot of untested code. But, high code coverage doesn’t mean the code is well tested.

Cory House

You can add the ExcludeFromCodeCoverage attribute to your code, which tells the code coverage tool to ignore it. It’s a simple way of reducing the number of lines flagged as untested.
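For reference, it’s a standard .NET attribute from System.Diagnostics.CodeAnalysis that can be applied at the assembly, class, or method level (the class here is invented for illustration):

using System.Diagnostics.CodeAnalysis;

[ExcludeFromCodeCoverage]   // coverage tools now skip every line in this class
public class StartupDiagnostics
{
    public void DumpEnvironment()
    {
        // ...logging that nobody wants to write tests for...
    }
}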

Here’s one of our principal developers’ opinions on this attribute:

“Using ExcludeFromCodeCoverage is only good when the goal is 100% coverage. That should never be the goal. The goal should be a test suite that prevents bugs from ever going live. I’m happy never using it and just having coverage reports flag things that are not actually covered, it is a more realistic representation of what the tests cover and makes me much more cautious about changing them as I know I don’t have test coverage. Never add Exclude from Code Coverage, it’s just lying to everyone. Why suppress things that might be a problem, if they are, we need to fix them.”

Principal Developer

Personally, I think adding suppressions/attributes just clutters the code base. I’d rather just treat the stats as relative to each release. The numbers have gone up/down, but why? If we can justify them, then it’s all good. Chasing 0 code smells and a specific test coverage means you can just cheat and add the likes of ExcludeFromCodeCoverage to meet such metrics.

Another developer said:

I value a holistic set of metrics that help us understand quality in software development. Code coverage is a single metric that can be part of that set of metrics you monitor. No single metric can stand by itself, and be meaningful. Nothing is perfect, which is why we should value a toolbox. I don’t believe in gaming the system and “hiding” uncovered code to get to 100%.

You need engineering teams who are prepared and confident enough to publicly share their coverage reports. This sets the tone of the culture. Context is needed, always. There will be reasons why the coverage is as it is. Use tools that help engineering teams with confidence/delivering at pace and ultimately delivering customer satisfaction. You cannot compare reports from different teams or projects.

Useful Tests

You need to make sure your tests actually test some logic. Sometimes people end up writing tests that really just test the programming language itself, and I suspect it is just so the Code Coverage metric is fooled. Code Coverage checks whether lines of code are “covered” by tests, but the simplistic nature of the check just ensures a line of code is executed whilst a test is running, rather than that there was a meaningful test.

So for example:

[Fact]
public void DefaultConstructor_ReturnsInstance()
{
        var redisMgr = new RedisStateManager();
        Assert.NotNull(redisMgr);
}

So there you are, instantiating an object and then checking it is not null. That’s just how objects work in C#: you instantiate an object, and then you have an object. I suppose an exception could be thrown so that the object was never created, but that is generally considered bad practice, and there was no other test to check a situation like that anyway, so they haven’t tested all scenarios.

largeResourceDefinitionHandler.MissingResource = _missingResourceDefinitions;
 
Assert.NotEmpty(largeResourceDefinitionHandler.MissingResource);

Setting a property, then checking it is set. Unless the property has loads of logic (which you could argue is bad design anyway), checking it is set is really testing the .NET Framework. If you think you need this, it means you don’t trust the fundamental features of the programming language you are using. You are supposed to be testing the logic of your code, not the programming language.

If there’s lots of setup and then the Assert is just checking for null, it’s likely just there to fudge the code coverage. Another classic I’ve seen is loads of setup that ends with:

Assert.IsTrue(true);

So as long as the test didn’t throw an exception along the way, then it would just always pass because True is definitely equal to True.

Those ones seem intentionally malicious to me, but maybe the following example is more of a case of a clear typo:

Assert.Same(returnTrigger, returnTrigger);

Whereas this following one looks like a typo, but it’s actually two different variables. You need to look closely (one has a single ‘s’ in ‘Transmission’). 🧐

Assert.Equal(organisationTransmissionStatus.Enabled, organisationTransmisionStatus.Enabled);

What goes through people’s heads? How can you write code like that and carry on like nothing is weird?

Sometimes tests look a bit more complicated but on analysis they still don’t really test much:

[CollectionDefinition(nameof(LoggerProviderTests), DisableParallelization = true)]
public class LoggerProviderTests : IDisposable
{
    [Theory]
    [InlineData("Verbose")]
    [InlineData("Debug")]
    [InlineData("Fatal")]
    [InlineData("Information")]
    [InlineData("InvalidLogLevel")] // Test with an invalid log level
    public void GetMinimumLevel_ReturnsCorrectLogLevel(string logLevelSetting)
    {
        // Arrange
        System.Configuration.ConfigurationManager.AppSettings["DistributedState.LogLevel"] = logLevelSetting;

        var firstInstance = LoggerProvider.Instance;
        var secondInstance = LoggerProvider.Instance;

        // Assert
        Assert.Same(firstInstance, secondInstance);
    }
}

So this sets a value in AppSettings, presumably used by the LoggerProvider. However, all it actually tests is that if you read the Instance property twice, it returns the same object both times. So setting the different log levels is completely irrelevant. The log level could be completely wrong, but you are comparing ‘is the wrong value of A the same as the wrong value of B’, and it will still pass.
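If the provider really were the thing under test, a meaningful version would assert the outcome the name promises. A hedged sketch, assuming a hypothetical GetMinimumLevel() accessor and LogLevel enum, since I don’t have the real API to hand:

[Theory]
[InlineData("Debug", LogLevel.Debug)]
[InlineData("Fatal", LogLevel.Fatal)]
[InlineData("InvalidLogLevel", LogLevel.Information)] // assumed fallback behaviour
public void GetMinimumLevel_ReturnsCorrectLogLevel(string setting, LogLevel expected)
{
    // Arrange
    System.Configuration.ConfigurationManager.AppSettings["DistributedState.LogLevel"] = setting;

    // Act
    var actual = LoggerProvider.Instance.GetMinimumLevel(); // hypothetical accessor

    // Assert: now a wrong mapping actually fails the test
    Assert.Equal(expected, actual);
}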

Another common aspect is when you use a mocking library like Moq, which lets you create objects and essentially say “when this code is called with these specific parameters, give me this value back”. The thing is, when developers make this set-up the actual thing they are testing, they are testing Moq, and not their actual logic.

[Fact]
public void JobReposReturnsValidJobTest()
{
    // Arrange
    ScheduledJob job = new ScheduledJob() { ScheduledJobId = Guid.Parse("e5ee8f7410dc405fb2169ae2ff086310"), OrganisationGUID = Guid.Parse("fbee8f7410dc405fb2169ae2ff086310") };
    _mockJobsRepo.Object.Add(job);
    _mockJobsRepo.Setup(e => e.GetById(Guid.Parse("e5ee8f7410dc405fb2169ae2ff086310"))).Returns(job);

    // Act
    var resultJob = _unitOfWork.JobsRepo.GetById(Guid.Parse("e5ee8f7410dc405fb2169ae2ff086310"));

    // Assert
    Assert.Equal(Guid.Parse("fbee8f7410dc405fb2169ae2ff086310"), resultJob.OrganisationGUID);
}

“I think all this test is doing – is testing that JobsRepo returns an object that was passed into the constructor on line 22. The GetById is redundant, it will always work if it returns that object because the Moq was configured to return that value. That is testing Moq, and not our code. But then if you are just asserting a property returns an object, you are just testing that C# properties work.”

Me

“yes you are right , I am just testing if JobsRepo could return a value, so that it helps me in code coverage for get functionality of JobsRepo , as it is just set in the constructor of the class and there is no call for get”

Developer who wrote bad tests

So I think they are saying “I am just fudging the coverage”. They checked it in anyway.

There have been loads of tests where you could cut out large parts of the method under test and the tests would still pass. Again, sometimes you point this out to developers and they still want to check it in, purely for the statistics, and not for any benefit to any developer.

“do these tests add value? a quick glance suggests this is very dependent on your mock object. It might be the case that the production code can be changed without breaking these tests.”

Me

yeah, they’re kind of meaningless. Merging code to use as templates for future, better, tests.

Developer who wrote bad tests

Here is a rant I left on a similar review:

This name implies that it should always be disabled, especially because there’s no coverage for the case where it is true. However, these tests aren’t really testing anything. I think you’re basically testing that Moq works and the default boolean is false. I think the most you can really do is call Verify on Moq to ensure the correct parameters are passed into the GetBool call.

If you replace the contents of IsRequestingFeatureEnabledForOrganisation with return false, your tests still pass, which illustrates that the coverage isn’t complete, or that you aren’t guaranteeing the configuration manager code is even called at all. Personally, I don’t think it is worth testing at all though. All your class does is call the FeatureDetails class, so you aren’t testing any logic on your side.
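To spell out the Verify suggestion from that rant (a sketch; IFeatureDetails, FeatureChecker, and the GetBool parameters are stand-ins for whatever the real types were):

var featureDetails = new Mock<IFeatureDetails>();
var orgId = Guid.NewGuid();
var sut = new FeatureChecker(featureDetails.Object);    // hypothetical class under test

sut.IsRequestingFeatureEnabledForOrganisation(orgId);

// At least this proves the wrapper passes the right key and organisation
// through, rather than re-asserting Moq's default return value of false.
featureDetails.Verify(f => f.GetBool("RequestingFeature", orgId), Times.Once());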

I think people are too concerned about getting code coverage up, so they insist on writing tests, even if it makes things more confusing.

I suppose it is up to you and your team to decide what you want to do, but I occasionally question people just to make them think about whether a test is actually adding any value. I’ve seen tests that simply assert an object is not null, when the method could return an object with all the wrong values and still pass (and it always returned an object anyway, so the test could never fail). Seeing that a method has tests gives you a false sense of security: you think they will catch any mistake you make, but they just always pass anyway.

Always think about whether your tests will add value and whether it’s worth adding them. If you need to mock everything, then they’re not very valuable, or you’re testing at the wrong level (too high), and you’re better off with integration tests than unit tests. 100% code coverage is a really bad idea for complex software; there are massive diminishing returns in value the higher you try to push it. We change stuff all the time in our software too, so if everything has high-level unit tests then you spend more time fixing those tests. I tend to find you spend ages writing tests, then if you change the implementation you have to change the tests, and you can’t run them to see if you broke anything because you had to change the test to run it.

Me

Test Driven Development (TDD)

There’s a methodology called Test Driven Development where you write a test first. It will fail if you run it, because there’s no functionality yet. Then you write the implementation and get it to pass. Then you move on to writing the next test, and repeat. So you build up your suite of tests and get feedback on whether your new changes have broken logic you wrote previously.
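As a toy illustration of that loop (an invented example, not from any real codebase):

// Red: this test is written first, and fails, because PriceCalculator
// doesn't exist yet.
[Fact]
public void Total_AppliesTenPercentDiscount_Over100()
{
    var calc = new PriceCalculator();
    Assert.Equal(108m, calc.Total(120m));
}

// Green: write just enough implementation to make it pass.
public class PriceCalculator
{
    public decimal Total(decimal amount) =>
        amount > 100m ? amount * 0.9m : amount;
}

// Refactor, re-run the suite, then write the next failing test
// (e.g. the boundary at exactly 100).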

I was recently listening to a podcast where the guest said he always writes code first, then adds tests after. If he can’t write tests, he will make a change to bypass code just for the tests. I wasn’t sure what he meant by this; maybe it’s like when people write a new constructor that is only ever called by the tests. But that’s bad design.

I thought he may as well just do TDD from the start, instead of always going through that struggle. He says TDD doesn’t often lead to good design because you aren’t thinking about design; you just think about how to make the tests pass.

But doesn’t the design organically come from TDD? And his way of changing the design just for the tests is exactly what he is arguing against TDD for. TDD often slightly over-engineers the solution with the likes of interfaces. So he is avoiding TDD and writing the tests after, but his way adds technical debt by adding extra constructors that are only used by the tests.

“I’ll add tests in a separate change later”.

 5 reasons to add tests before merge:

1. Clear memory: Before merge, everything is fresh in my mind. I know what the code is supposed to do, because I wrote it. So I also know what tests I should write to assure it works. Every minute that passes after merge, I will understand the feature less, and thus, be less equipped to add proper test coverage.

2. More effective reviews: If I write the tests before merge, then anyone reviewing my code can use my tests to help them understand the code, and to watch the feature run.

3. Faster development: If I write tests during development, I can use the tests to accelerate my development. I can “lean” on my tests as I refactor. Faster feedback loops = faster development.

4. Better design: Writing tests during dev encourages me to write code that is testable. It makes me consider accessibility too since that tends to make automated testing easier by providing well-labeled targets.

5. Changing priorities: After merge, there’s no guarantee that I’ll have time to write the tests at all. I may get pulled away for other more “urgent” tasks.

Bottom line: The proper time to add tests is *before* merge.

Cory House

I recently saw the following conversation. A developer was basically saying he didn’t have time to write the tests, and that writing them might lead to some drastic refactoring, which would be risky. The plan instead was to rely on manual testers and get the changes released. The next part probably won’t happen (because important features will be prioritised), but his suggestion was that he would then make the changes, with good unit test coverage, in the next release.

Senior Developer:
This domain supports unit testing, you should be able to add tests to cover the changes you made to make sure it behaves as you expect

Developer:
Currently there are no unit test cases available for the changed class, and the class is tightly coupled. I have written some draft tests and will check them in next month as a priority.

Architect:
IMO, given the pressure, timescales and urge to complete this, I think we can defer for now and stress to the testers to pay more attention to the areas where we have low confidence.

Senior Developer:
So instead of checking if it is correct by adding tests that we can be sure exercise the code changes, we just merge it and hope that the manual testers find any bugs over the next day or so, and if they do, then it is back to the dev team and another change?

Time In Unit Tests

Tests should be deterministic. If a test passes, and no changes have been made, it should pass again the next time it runs (obviously). An unreliable test doesn’t give you confidence in the code changes you make. It’s surprisingly common to make a change and have an unrelated test break, leaving you thinking “how can those changes break that test?”. Then you look at what the test is doing, and it’s often something to do with time.

You see something like:

data is "BirthDate":"1957-01-15T00:00:00"

And the test result says:

Expected: "Age":"67y"
Actual: "Age":"68y"

Today is their birthday!

What you need to do is put a “wrapper” around the code that gets the current date. So instead of calling DateTime.Now directly, you create a class called something like DateTimeProvider, which in production returns DateTime.Now. In your unit tests, you then create a MockDateTimeProvider and make it return a hard-coded date. That way, no matter when you run the test, it always gets the same date, and the test is deterministic.
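A minimal sketch of that wrapper (names chosen to match the story; the real interface may differ):

public interface IDateTimeProvider
{
    DateTime Now { get; }
}

// Production implementation: the real clock.
public class DateTimeProvider : IDateTimeProvider
{
    public DateTime Now => DateTime.Now;
}

// Test implementation: a hard-coded date, so the result never depends
// on when the suite happens to run.
public class MockDateTimeProvider : IDateTimeProvider
{
    public DateTime Now => new DateTime(2020, 6, 15, 10, 30, 0);
}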

I recently fixed some tests that were failing between 9pm and midnight. I found that a developer had changed the MockDateTimeProvider to return DateTime.Now, rendering it completely pointless. Other parts of the test added 3 hours to the current time, and because 9pm plus 3 hours is tomorrow’s date, the date comparison then failed.

public class MockDateTimeProvider : IDateTimeProvider
{
    public DateTime Now { get { return DateTime.Now; } } // defeats the whole point of the mock
}

I think another red flag in unit tests is conditional statements. Logic should be in your production code, not in your tests. Not only does the following code have a DateTime.Now in it, they have also put a conditional If statement in there, so whenever one branch would fail, the other branch executes and passes instead. So maybe the test can never fail.


[Fact]
public void ExpiryDateTest()
{
        DateTime? expiryDate = (DateTime?)Convert.ToDateTime("12-Dec-2012");
        _manageSpecialNoteViewModel = new ManageSpecialNoteViewModel(_mockApplicationContext.Object);
        _manageSpecialNoteViewModel.ExpiryDate = Convert.ToDateTime(expiryDate);
        
        if (_manageSpecialNoteViewModel.ExpiryDate < DateTime.Now.Date)
                Assert.True(_manageSpecialNoteViewModel.IsValid());
        else
                Assert.False(_manageSpecialNoteViewModel.IsValid());
}
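One hedged way to make that deterministic (keeping the original branch mapping, where a past expiry is expected to be valid; the real rule may differ): write one test per branch, with dates pinned far enough in the past and future that the comparison cannot flip, so each assert can actually fail.

[Fact]
public void ExpiryDateInThePast_IsValid()
{
    var vm = new ManageSpecialNoteViewModel(_mockApplicationContext.Object);
    vm.ExpiryDate = new DateTime(2012, 12, 12);    // always in the past

    Assert.True(vm.IsValid());
}

[Fact]
public void ExpiryDateInTheFuture_IsNotValid()
{
    var vm = new ManageSpecialNoteViewModel(_mockApplicationContext.Object);
    vm.ExpiryDate = new DateTime(2099, 1, 1);      // in the future for a very long time

    Assert.False(vm.IsValid());
}

Better still would be injecting a clock, like the IDateTimeProvider above, so “today” itself is pinned.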

Other Bad Unit Tests

Maybe the most obvious red flag, even to non-programmers, is a test asserting that the feature is broken. The developer has even left a code comment to say it looks wrong!

Assert.Equal("0", fileRecordResponse.Outcome); // I would have thought this should have been -1

The One Line Test

How do you even read this? Is that actually one line? 🤔🧐

_scheduledJobsRepo.Setup(r => r.GetAllAsNoTracking(It.IsAny<Expression<Func<ScheduledJob, bool>>>(),
        It.IsAny<Func<IQueryable<ScheduledJob>, IOrderedQueryable<ScheduledJob>>>(),
        It.IsAny<int>(),
        It.IsAny<int>(),
        It.IsAny<Expression<Func<ScheduledJob, object>>>()))
        .Returns((Expression<Func<ScheduledJob, bool>> expression,
        Func<IQueryable<ScheduledJob>, IOrderedQueryable<ScheduledJob>> orderBy,
        int page, int pageSize,
        Expression<Func<ScheduledJob, object>>[] includeProperties)
        =>
        {
                var result = _scheduledJobs.AsQueryable();
                if (expression != null)
                {
                        result = result.Where(expression);
                }
                result = orderBy(result);
                result = result.Skip(page * pageSize).Take(pageSize);
                return result;
        });

When it is that hard to read, I wonder how long it took to write it.

Other Common Mistakes

I think tests can be unclear when people use a unit testing library without understanding what features are available. For instance, instead of using the framework’s expected-exception check, they come up with some convoluted solution like a try/catch block that flags the test as passed or failed.

try 
{ 
        Helper.GetInfoArticles(articleName, _httpWebRequest.Object); 
        Assert.IsFalse(true); 
} 
catch 
{ 
        Assert.IsTrue(true); 
} 
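For comparison, the built-in way to express that (xUnit’s Assert.Throws shown; MSTest and NUnit have equivalents, and the exception type here is an assumption):

[Fact]
public void GetInfoArticles_WithBadArticleName_Throws()
{
    // Fails automatically if no exception (or the wrong type) is thrown;
    // no try/catch or Assert.IsTrue(true) gymnastics needed.
    Assert.Throws<ArgumentException>(
        () => Helper.GetInfoArticles(articleName, _httpWebRequest.Object));
}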

Naming tests so it’s clear what they do, and so they’re distinguishable from other tests, can be tricky. The worst is when the name says something completely different, most likely from a copy-and-paste mistake.

I’ve talked about how using DateTime can make tests fail at certain times. You can also end up with tests that rely on shared state, where the order the tests run in causes failures, because a test expects data to be set (or not set) by another.

Bank

Tests are even more important in certain domains. You know, like when money is involved. The Commercial Bank of Ethiopia once allowed customers to withdraw more cash than they had.

“I’m really interested how a bank managed to deploy code that didn’t have tests for “can a user withdraw money when they don’t have enough balance.” Development teams at banks are usually conservative, process-heavy and slow-moving with changes exactly to avoid this. Wow”

Conclusion

Unit tests can be a useful and helpful tool to developers. However, there is an art to writing them and they have to be written with good intentions. If they aren’t written to be useful, fast, and reliable, then developers either won’t run them or won’t trust them.

Minimum requirements when logging defects

A colleague wrote a list of ideas to improve bug reports. The more detail a report has, the more stakeholders are aware of the impact and can triage it accordingly. When it comes to fixing it, software developers can work out the issue more easily…

It is important that all the required information is added to a defect at the time of raising it. This stops the need to refer it back to the original author for more information and therefore saves time.

It should also be noted that we report all defects raised to certain user groups, and it is important that they are able to understand what the issue is and how severe the issue is to them.

In order to meet the minimum requirement all bugs should contain the following:

Title

  • The title must be clear and concise
  • It should describe the problem
  • Avoid using technical jargon

Description

  • The description must clearly outline the problem being experienced
  • It should explain the impact to the customer
  • It should describe a workaround if known.

 Reproduction steps

  • Reproduction steps to recreate the issue
  • These should be clear and easy to follow
  • Include any required environment configuration
  • Include the actual and the expected outcome

 System info

  • Enter the version of software the bug was found in

 Other checks

  • Have the Area and Iteration paths been set correctly?
  • Are relevant examples present with the record? (record number, sample data, screenshots etc.)
  • Have the relevant error details been included? (stack trace, error message, error number etc.)
  • Has the bug been linked to the test case/user story etc?
  • Has this been linked to a relevant user story, requirement, or similar bug report?

 Root Cause Analysis (RCA)

If the defect is found during integration testing, the following must also be completed:

  • Stage Detected In
  • Use Release Testing
  • Complete Release Introduced In field

Recorded Test Steps

In my blog How To Make Your Team Hate You #3, I wrote about Barbara, a Tester I used to work with who caused a lot of conflict and was constantly trying to get out of doing work, whilst taking credit for other people’s work.

Recently, when going through old chat logs, I found some brilliant “dirt” which, in hindsight, I could probably have used to get her sacked, because it was fairly strong evidence that not only was she not doing work, she was falsely passing Test Cases. When you are paid to check that the software is behaving correctly, claiming you have tested it when you haven’t is very negligent.

When running test cases, if you passed each step separately and hadn’t disabled the recording feature, Microsoft Test Manager would record your clicks and add them as evidence to the test run.

I think the feature worked really well for web apps, because it could easily grab the names of all the components you clicked, whereas on our desktop app it mainly just logged when the app had focus and read your keystrokes.

The bad news for Barbara is that she liked going on the internet for personal use, and liked chatting on instant messenger, as we will see.

The Remedy

Type 'Hi Gavin. ' in 'Chat Input. Conversation with Gavin Ford' text box
Type 'Hi Gavin. I've been telling everyone about this concoction and it really worked wonders for everyone that's tried it, myself included. This is for cold, cough and general immunity. 1 cup of milk + 1 tablespoon honey + 1/4 teaspoon of turmeric - bring to a rolling boil. Add grated root ginger (2 teaspoons or 1 tablespoon) and let it boil for another 5 mins. Put thru sieve and discard root ginger bits (or drink it all up if you fancy), but drink it hot before you sleep every night and twice a day if symptoms are really bad. Hope you feel better soon. 🙂 ' in 'Chat Input.

Pumpkins & Tetris

Type 'Indian pumpkin growing{Enter}' in 'Address and search bar' text box
Type '{Left}{Left} {Right} {Left}{Left} {Up}{Up}{Up}{Up}{Up}{Up}{Left}{Left} {Up}{Up}{Up}{Right} {Up}{Up}{Left} {Right}{Right} {Up}{Right}{Left}{Left}{Left}{Left} {Right}{Up}{Left}{Left}' in '(1) Tetris Battle on Facebook - Google Chrome' document

Click 'Amazon.co.uk:Customer Reviews: 100ml Bergamot Pure...' label
Click 'Close' button
Click 'Essential-Oil-Blends_Chart.pdf' label
Click 'Checkout' label

3 days Small Regression Pack


Me 11:26:
Barbara has been doing the Assessment regression pack for 3 days
she says there is only a few left in this morning's standup. There's 15 left out of 27
Dan Woolley 11:28:
lol
Me 11:29:
I don't even think she is testing them either. It looks like she is dicking about then clicking pass
Click 'Inbox (2,249) - [Barbara@gmail.com]Barbara@gmail.com - Gmail' label
Click 'Taurus Horoscope for April 2017 - Page 4 of 4 - Su...' tab
Click 'Chrome Legacy Window' document
Click 'Chrome Legacy Window' document
Click 'Close' button
Click 'Paul' label in the window 'Paul'
Click image
Type 'Morning. ' in 'Chat Input. Conversation with Paul' text box
Type '{Enter}' in 'Chat Input. Conversation with Paul' text box
Step Completed : Repeat steps 6 to 19 using the Context Menu in the List Panel
End testing

Next Day

Me 12:42: Barbara said this morning that all the Assessments test cases need running. She has just removed them instead

Greek Salad

Type 'greek salad{Enter}' in 'Chrome Legacy Window' document
Type 'cous cous salad' in 'Chrome Legacy Window' document
Type 'carrots ' in 'couscous with lemon and coriander - Google Search ...' document

Click 'Vegetable Couscous Recipe | Taste of Home' tab
Click 'Woman Traumatized By Chimpanzee Attack Speaks Out ...' tab

Marshall 11:50:
oh damn haha
these are things that were inadvertently recorded?
Me 11:51:
yeah
Marshall 11:51:
ha you've stumbled upon a gold mine
Me 11:53:
I don't think she is actually testing anything. I think she just completes a step now and then
the other day Rob went to PO approve an item and he couldn't see the changes because they hadn't even patched

Haven’t Been Testing From The Start

we are in Sprint 8 and Barbara suggested Matt does a demo on the project so we know how it works; it’s a right riot

Me. 4 months into a project

Bad Audits

I wonder if Barbara was inconsistent in how she ran the test cases, or realised by the end that it tracked you. Near the end of her time, she was just hitting the main Pass button rather than passing each individual step. Managers liked the step-by-step way because if you mark a step as failed, it is clearer what the problem is.

Me 16:15: 
Barbara called me. Matt is monitoring our testing!
Dan Woolley 16:15:
how?
Me 16:17:
looking at the run history
she said he was complaining it wasn't clear which step failed because we were just using the main pass button, and also bugs weren't linked when they had been failed
I told Barbara I linked mine, then she checked and said it was Sam that didn't. I checked and saw it was Sam and Barbara
so only the developer did testing properly 😀
you just can't get the staff

Obviously The Wrong Message

Me 09:12: 
Bug 35824:Legal Basis text needs to be clear
what's all that about?
Barbara Smith 09:12:
Charlotte asked me to raise it for visibility
We need to fix the text that appears on that tab
Me 09:13:
what's wrong with it?
Barbara Smith 09:21:
It says that on the Bug LOL
And with a screenshot (mm)
Me 09:22:
it says "needs to be clear" and has a screenshot with a part of it underlined. But it doesn't say what the text should be instead.

She rarely logged bugs because she did minimal testing. Then, when she did log something, it didn’t have enough info to be useful.

Karma

Barbara got well and truly conned in the end. She was gonna take the entire December off but delayed it until the end of the project, and then she was told she had lost her job, so they made her take the holiday straight away instead. She had just bought a house, so she would have been relying on the money for the mortgage payments. Luckily for her, she got accepted for a new job, but she was also looking for a brand new way of getting out of it, as we will see below.

Tax Fraud

Type 'what if I don't contact hrmc about my tax{Enter}' in 'Address and search bar' text box
Sam 11:23:
Ha ha
You are savage
Me 11:24:
she is gonna get jailed for tax evasion

Maternity Fraud

Click ‘How to get pregnant fast | BabyCenter’ tab

Testing Stories

Go Play, Find Bugs

One of our Senior Testers wrote a blog detailing how she found an obscure bug. When I was a software tester, I often said that even though you spend a large amount of your time writing and running Test Cases, the majority of bugs I found actually came from performing actions off-script.

The reason for this is that, given a certain requirement, the developer writes just enough code to satisfy that requirement as written. A better developer may even write some kind of automated tests to cover that scenario, to prove it works and won’t break in future. Therefore, running a manual test that describes that behaviour won’t find a bug now, and it won’t when you run that test again in the future (during regression testing).

Being able to freestyle your steps means you can come up with obscure scenarios and experiment, and do way more testing than you would following a strict, heavily-documented process.

This was the main problem I had working as a Software Tester. Managers wanted the documentation and if you told them you had been testing without it, you sometimes got told to stop, or spend time writing Test Cases for ALL the additional scenarios you came up with. All that does is encourage people to be lazy and do the minimal amount of testing, which consists of just the basic scenarios.

You also get into situations where, if there is a bug in live, it’s easy to make stupid claims in hindsight. I remember a colleague being absolutely furious at the criticism. They had done loads of testing, but there was a bug in live in a very specific scenario:

“I’m disappointed in the level of testing” – Test Manager

Here is our Senior Tester’s blog:

I found a deliciously elusive bug last week. The feeling a tester gets when this happens is joy at your good luck, satisfaction at solving a fiendish puzzle, and relief at preventing harm. We feel useful!

The bug was to do with online visibility data. My team is implementing the ability to right-click items and set Online Visibility. Sounds simple in theory - but data is complicated and the code base is large.

How was I going to approach this? It was an intimidating piece of work – and I was tired. My normal process would be to come up with some ideas for testing, document them, then interact with the product, make notes, fill out the report. But that day, I just couldn’t face doing the documentation and planning I would normally do before the testing. I decided to just test, not worry too much about documentation, and have fun.

I sought out a Record with a rich data set and played around, selecting things, deselecting them, selecting parent entries, child entries, single entry, multiple entries. I didn’t have any defined agenda in mind except to explore and see what would happen.

One minute in, I was rewarded with a beautiful crash!

I hadn’t taken a note of my steps – but I knew I could probably find the path again. I set up and recorded a Teams meeting with myself, as I didn’t want to have to pause to note down every step I took – that would take a long time and risk my mindset changing to a formal, rigid, structured view – which I didn’t want. I needed that playful freedom. The system crashed again! As there were so many variables at play, I didn’t know what the exact cause was, but I now had proof that it hadn’t been a magical dream.

I spent the rest of the afternoon trying to determine the exact circumstances in vain. I spoke to the programmer, and showed him my recording. He took the issue seriously, and tried to recreate it himself. We both struggled to do so, and decided to wait until the morning.

The following day, we got on a call and went over the recording again. What exactly had I done before the crash? I had selected the parent entry, then two child entries, right clicked but not changed anything, deselected the parent, selected another child, unselected it, selected a different child, selected the parent again and then right clicked and changed the Online Visibility - crash. We tried that again on the developer’s machine, on the same type of report, break points at the ready. Crash! Got it!

The developer eventually narrowed it down to two conditions: child entries had to have a GroupingDisplayOrder index beginning with 1, and the user had to select the parent entry after its child.

It seemed sheer luck that I had found this. But was it luck? No. It was luck by design – I had created a rich data set, and done lots of different actions in different orders, been creative and diverse in my testing. And it had only taken a minute to yield results!

So what did I learn? Reflecting, I noted my preference for highly structured documentation – of tables with colour highlighting, documenting each test in high detail, strictly in order, changing one condition at a time. The result of this was that I tested in a highly formal, structured way to fit the documentation, and only did informal testing as an afterthought. And yet I had most often found bugs during the informal testing!

I had made a god of documentation and lost sight of what mattered most. If you need me, I’ll be testing. And trying not to make too many pivot tables.

What Are Software Testers Really?

The same tester once came out with this quote:

“testers are ultimately critics. Developers are artists. Testers are there to give information. What you do with that information is up to you.”

That’s quite an interesting perspective. I think it mainly comes from the idea that Testers can find 10 bugs, but maybe you decide you will only fix 6 of them; a few you might fix later, and 2 you think aren’t a problem, or are so unlikely to happen that it’s not worth the effort and risk to fix them.

“we are the headlights of the car, driving into the darkness”

Software Testers In Game Development

“She was the one who taught me the importance of testers and how they are a critical gear in the machinery that makes-up making a game. Testers aren’t just unit tests in human form. They have a unique perspective on the game and poke not only at the bugs but also the design and the thought process of playing a game.”

Ron Gilbert, creator of Monkey Island

Another interesting discussion on the role software testers play is from Mark Darrah who has worked on games like Dragon Age Origins. He does seem to agree with this idea that the Testers are merely critics.

Mark Darrah – Don’t Blame QA

When encountering bugs during gameplay, it’s often misconceived that the quality assurance (QA) team is to blame. However, it’s more likely that the QA team identified and reported the bug, but it remained unresolved due to various factors. For instance, a more critical bug could have emerged from the attempted fix, leading to a strategic decision to tolerate the lesser bug. Additionally, project leaders may assess the bug during triage and conclude that its impact is minimal (affecting a small number of users), opting to proceed with the game’s release.

Such scenarios are more common than one might expect, and they typically occur more frequently than QA overlooking a bug altogether. If a bug did slip through QA, it’s usually not the fault of any single individual. The bug might result from a vast number of possible combinations (a combinatorial explosion) of in-game elements, making it impractical to test every scenario. Your unique combination of in-game items and actions may have simply gone untested, not due to negligence, but due to limited resources.

Complex game designs can introduce bugs that are difficult to detect, such as those that only appear in multiplayer modes. Budget constraints may force QA to simulate multiplayer scenarios solo (a single person playing all four or eight different players at once), significantly reducing the scope of testing. 

Furthermore, bugs can be hardware-specific, and while less common now, they do occur. It’s improbable that QA had access to the exact hardware configuration of your high-end gaming setup.

The term ‘Quality Assurance’ (QA) can often be a misnomer within the development industry. While ‘assurance’ suggests a guarantee of quality, the role of QA is not to ensure the absence of issues but to verify the quality by identifying problems. It is the collective responsibility of the development team to address and resolve these issues.

Understanding the semantics is crucial because language shapes perception. The term ‘QA’ may inadvertently set unrealistic expectations of the role’s responsibilities. In many development studios, QA teams are undervalued, sometimes excluded from team meetings, bonuses, and even social events like Christmas parties. Yet, they are expected to shoulder the criticism for any flaws or bugs that remain in the final product, which is both unfair and inappropriate.

Developers, it’s essential to recognize that QA is an integral part of your team. The effectiveness of your QA team can significantly influence the quality of your game. Encourage them to report both qualitative and quantitative bugs, engage with them about the complexities of different systems, and heed their warnings about testing difficulties. Disregarding their insights can lead to overlooked bugs and ultimately, a compromised product.

For those outside the development sphere, it’s important to understand that if you encounter a bug in a game, it’s likely that QA was aware of it, but there may have been extenuating circumstances preventing its resolution. Treat QA with respect; they play a pivotal role in maintaining the game’s integrity.

Remember, a strong QA team is the bulwark against the chaos of a bug-ridden release. Appreciate their efforts, for they are a vital component in the creation of seamless gaming experiences. 

Analysing Risk – A Story

Just like my last blog, this is based on  an internal blog that our most experienced software tester wrote. She seems to love Michael Bolton, but not the singer. Michael Bolton is also the name of a software tester that is the co-Author of Rapid Software Testing (see About the Authors — Rapid Software Testing (rapid-software-testing.com)).

Michael Bolton

She said that Michael Bolton was asked the following question:

Q: My client wants to do risk analysis for the whole product, they have outlined all modules. I got asked to give input. Do we have a practical example for that? I want to know more about it.

Tester

Michael: Consider the basic risk story –

Some victim will suffer a problem because of a vulnerability in the product (or system) which is triggered by some threat.

Start with any of those keywords, and imagine how it connects with the others.

Who might suffer loss, harm, bad feelings, diminished value, trouble?

How might they suffer?

What kinds of problems might they experience?  What Bad Things could happen?  What Good Things might fail to happen?

Where are there vulnerabilities or weaknesses or bugs in the product, such that the problem might manifest?  What good things are missing?

What combinations of vulnerability plus specific conditions could allow the problem to actually happen? 

When might they happen?  Why?  On what platforms? How?

Our tester stated “This is a brilliant definition of risk. It is also a somewhat intimidating list of questions. If you are looking at this and thinking, “That’s hard!” you’re absolutely right. Good testing is hard. It’s deep, challenging, exhausting. It will make you weep, laugh, sigh from relief. But it’s also tremendous fun.”

Test Automation Mess

Every now and then, there is a big initiative to focus on automated testing. A manager will decide that our software is complex and too manually intensive to regression test in detail. Automation seems the answer, but it’s never that practical.

Our main software, a desktop application, requires interaction through the UI, which is incredibly slow and unreliable to automate. We used to have a dedicated Automation team that maintained the tests, but they took several hours to run and would randomly fail; eventually the team disbanded and the tests were declared obsolete. There have been times we wanted to replace them with the likes of CodedUI (which turned out to have the same issues), and more recently FlaUI.

When the last “drive for automation” was announced by the CTO, our most experienced tester wrote an internal blog which I thought had a lot of subtext to it, basically saying “it’s a bad idea”.

Communities of Practice around Test Automation

With all of the new Communities of Practice around Test Automation, I wanted to share some thoughts on whether automation is actually a good idea. This comes from experiences over the years. I hope this saves some people time, and provokes conversations.

To automate or not to automate? That is the question…

A common question in a tester’s life:  “Should we automate our tests?”

Which of course really means, “Should we write our checks in code?”

This will inevitably give rise to more questions you need to answer:

  • which checks we should automate
  • and which we should not automate
  • and what information running the checks gives us
  • and how does that information help us assess risks present in the code
  • and which is the best tool to use
  • and how often we should run the checks

Asking and answering these questions is testing. We have to ask them because no automation comes for free. You have to write it, maintain it, set up your data, set up and maintain your test environment, and triage failures.

So how do you begin to decide which checks to automate?

Reasons for automating:

  • The checks are run frequently enough that if you spent a bit of time automating them then you would save time in the long run (high return on investment)
  • The checks would be relatively easy to write and maintain owing to the product having a scriptable interface (such as a REST API)
  • They can be performed more reliably by a machine (e.g. complex mathematical calculations)
  • They can be performed more precisely by a machine
  • They can be performed faster by a machine
  • You require use of code in order to detect that a problem exists
  • You want to learn how to code, or flex your programming muscles.

(Even if you ultimately decide not to automate your checks, you may decide to use code for other purposes, e.g. to generate test data.)

Reasons against automating:

  • There isn’t a scriptable interface; the product code can only be accessed via a User Interface (UI automation is notoriously expensive and unreliable).
  • In order to have a greater chance of finding problems that matter, the check should be carried out by a human being as they will observe things that would matter to a human but not a computer (e.g. flickering on the screen, text that is difficult to read).
  • The checks would have a short shelf life (low return on investment).

Beware of the fallacy that use of code or tools is a substitute for skilled and experienced human beings. If you gave an amateur cook use of a fancy food processor or set of knives, their cooking still wouldn’t be as good as that of a professional chef, even with the latter using blunt knives and an ancient cooker. Code and tools are ultimately extensions of your testing. If your testing is shallow, your automation will be shallow. If your testing is deep, your automation can be deep.

Ultimately the benefit you derive from writing coded checks has to outweigh the cost, and to automate or not is a decision no one else can make for you. 

Testers in my Team

Most of the testers we employ aren’t that technical, and most aren’t interested in writing Automated Tests, since that is coding and requires developer knowledge. One of our testers went on a week-long training course about FlaUI. One of the first things he said afterwards was “FlaUI is not worth its value”, which made me laugh. The course cannot have painted it in a good light! 😂

He then got asked to move teams to do pure automation for a few months. Another tester had no interest at all, but was instructed to “try learn”. 

“writing the steps is fine, it’s just when you go into the code”

Joanne

There was no way she was gonna be able to learn it. She isn’t technical and the desire wasn’t there at all. Pressuring testers to move away from “manual” testing to “automated” just disrespects them as testers. It’s happened before, and they end up leaving. She eventually moved internally to become a Release Manager.

Automation Mess

The original decision to move to FlaUI was made by a group of Testers and they didn’t get input from the Developers. 

I think it would be logical to code using the Coding Standards that we Developers have followed for years. If Developers want or need to help write automated tests, they can fit right in, since the process and code style are the same. Additionally, after years of writing automated tests, maybe a Tester wants to switch roles and become a Developer, and then it would be a smooth transition.

Not only did they invent their own Coding Standards, which meant variables, methods, and classes were named differently; there was also a lot of duplicated code for basic actions like logging in, selecting a customer record, etc.

The process, including the branching strategy, was different too. Instead of having a Master branch, a Project branch for longer-lived changes, and standard User branches for simple short-lived work, they went for a more convoluted strategy with Development, Devupdate, and Master branches. It then became a disorganised mess when work wasn’t merged to the correct branches at the right times.

I can’t even make sense of this:

Before the start of Regression: 

  • 1) Lock the Development Branch (no PRs to be allowed to come in to Development till regression is completed) 
  • 2) Development, Devupdate, Master are up-to-date by syncing your local with remote branch and get all the commits into local branch
  • 3) Merge from Development to DevUpdate 
  • 4) Merge from DevUpdate to MasterUpdate 
  • 5) Set <updateTestResults> to true and <testPlanId>(from URL ?planid=12345) inProjectSettings.xml in MasterUpdate 
  • 6) Raise a PR from MasterUpdate against Master. Throughout step 3, step 4, observe that ‘commits behind’ are equal after the merge process to that of master. 
  • Once the above process is completed, observe that Master branch is 1 commit ahead of other branches 

After the end of Regression: 

  • 1) Development, DevUpdate, Master are up-to-date by syncing your local with remote branch and get all the commits into local branch 
  • 2) Merge from Master to DevUpdate 
  • 3)Change the <testPlanId>toxxxxand<updateTestResults> to false in DevUpdate 
  • 4) Raise PR from DevUpdate against Development After Step 2, observe that ‘commits behind’ are equal after the merge process to that of master. 
  • Once the above process is completed, observe that Development branch is 1 commit ahead of other branches 

Eventually, a few more technical testers were moved into the team and tasked with aligning the process and codebase with our production code; in other words, sorting the mess out.

This is the classic case of managers thinking they can just assign “resource” to a team, give them an aim (“automate this”), and expect results. But you need the technical know-how, and a clear direction.

Test Specifications

Many years ago, when I was a Software Tester, I remember having to write a Test Specification based on the work the Developers had planned. This was for both Enhancements and Bug Fixes (so new features and changes to old ones).

It would be a Word document with the Item Number, Title, and then a description of what you would test (this description being more high-level than the step-by-step detail featured in an actual Test Case).

You would spend weeks writing it, then you had to get it approved by all the developers, or the Dev Lead. The developer would often then tell you it was nothing like you imagined, so you had to rewrite it.

Sometimes they would demo the feature to you so you had a better idea. If they had no comments, I often felt they hadn’t read it.

When there was a new Developer who wasn’t familiar with the process, he “rejected” a Test Specification because of some grammatical issues. I think it was something to do with the wrong tense, and not using semicolons. The Tester was fuming, but he was quite a belligerent character.

I think we often worked from the Bug description, or some comments from the Developer. Quite often, though, the comment section would just be the Developer writing something generic like “test bug is fixed” or “check data is saved”. If it was more detailed, sometimes you would paste in the developer’s notes and change a few words, and have no idea what it all meant until you saw the finished thing.

The Verdict

I think both Developers and Testers saw Test Specifications as a waste of time. The Developers weren’t enthused about reading them, especially when most people had just rewritten what the Developer provided, which might not be the full test coverage needed. The Testers should be adding more value by using their existing knowledge of the system to come up with additional scenarios to cover in regression testing.

I think the only advantage was to quickly verify that the Developers and Testers were on the same page, but that only works if the Tester hasn’t just reused the developer’s words, and has actually tried to illustrate that they understand the upcoming features.

I think it eventually got binned off for what we called “Cycle Zero Testing”, where the developer quickly demoed their changes. I think the Developers still hated it, but it was easier to see the value, and it was more collaborative between the two roles.

Text Box Increase / Test Case Changed

Occasionally we may be asked to help our Software Testers run through their manual regression test cases.

When I was a tester, I found that even though writing test cases should be easy, they are tedious to write if you want to accurately describe every single step. Therefore, you may choose to be more concise with your wording, or make assumptions that the person running the test will understand what to click.

Sometimes you think you have written a brilliant test, but when you come to run it again later, you realise it was ambiguous, and you might end up looking at the code to work out how it was meant to work at the time.

If a test case is misleading, sometimes the tester will modify it to be “less ambiguous” or “correct”, but there are times where they have changed it incorrectly, causing further confusion.

I ran a test called “Enter 1020 characters into the Description Textbox ensuring to include numbers and special characters (namely ‘&’)”.

However, the expected result was “Textbox will only accept the first 260 characters”.

Why would we be entering 1020 characters if the textbox is gonna stop at 260? Clearly something is up with this test.

So I looked at the history to see if someone had changed it. It used to say “enter 260, but 255 is accepted”, but then Sarah had changed it to “enter 1020 and 260 is accepted”.

So I looked at the linked change to see what it should have been changed to (or whether it should have been changed at all). The item was called “Extend description from 255 to 1023 characters”.

That seemed really random. Why 1023 characters (presumably 2^10 minus 1)? And why did the tester change the test case to 1020 (and 260) when that still isn’t enough?

Even more confusing, the developer didn’t even change it to 1023; it was set to 1000 in the database.

\(〇_o)/

So we wanted 1023, the developer provided 1000, and the tester either tried 1020 or 260 and passed it.