Security audit/Penetration Test

Last year, my employer paid for a “security audit” for our software, and any “issue” on their report was then a high priority to fix.

I think the majority of the issues they stated were incredibly unlikely to happen, or there would be an easier means of acquiring such information. One of them was that UDP messages, which are only sent on the user’s local network, contained an unencrypted username. But if the attacker was already inside the user’s building, it would probably be much easier just to look at the user’s screen rather than plugging into their network and running a packet-sniffer.

Problem 1:

Shortly after we released some “security fixes”, we had a few bugs reported, one of which was:

“Users unable to reset their own passwords using the self-service password reset process”

So the security improvement created a Security major incident. Brilliant.

Problem 2:

A colleague was explaining another problem which I only had a vague understanding of, but it was to do with the version of a protocol being sent in the header of a message. It is classed as a security flaw because an attacker can look up known security flaws for that version and try to exploit the system that way. I suppose they can still guess what the version is and try different “attack vectors”, but the less information you give them, the more hassle it is for them. As my colleague explained it, a change had been made to remove the version, and it was tested on one specific version of SQL Server, but we have multiple versions on our live infrastructure. When it came to demoing it to a client, they discovered a particular feature was broken on a different SQL Server version. A little awkward, but at least it didn’t go live.

Potential Problem 3:

When it comes to authentication failures, if you tell the user that the login attempt has failed due to the username being wrong, or the password being wrong – you are making it easier for malicious users to attempt to gain access to someone’s account. So if an attacker is guessing usernames and normally sees “Invalid username”, and eventually gets an “invalid password” message, then the attacker knows the account exists and now just needs to get the password.

There was one API call that returned an error code along with the message “Invalid username”. So as advised by the security audit, the developer changed the message to “Invalid username or password”.

On the code review, I pointed out that the developer also needed to change the Error Code, and they replied that the security audit stated that the message was the problem. They definitely hadn’t thought about the actual problem. If Error Code 40 is “Invalid username” and Error Code 41 is “Invalid password”, and you change both texts to say “Invalid username or password”, then it’s not really any more secure, is it? We are still returning 40 when the username is wrong, and 41 when the password is wrong. What we need to do is make 40 and 41 obsolete, and make a new Error Code for “Invalid username or password”. However, can we actually do that? When you have third parties using your API, if they have written code which relies on certain return values, then you break their software when you change yours. So we would need to inform everyone that the change is coming, then release this change.
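To make the point concrete, here is a rough Python sketch of the difference between changing only the message and changing the code as well. It is not our real API: the function names, the in-memory user store and the new code 42 are all made up for illustration, and only codes 40 and 41 come from the example above. A real system would also compare password hashes rather than plain text.

```python
from dataclasses import dataclass
from typing import Optional

# 40 and 41 are the codes from the example above; 42 is a hypothetical new code.
LEGACY_INVALID_USERNAME = 40
LEGACY_INVALID_PASSWORD = 41
INVALID_CREDENTIALS = 42

GENERIC_MESSAGE = "Invalid username or password"


@dataclass(frozen=True)
class LoginError:
    code: int
    message: str


# Toy user store for illustration only; real systems store password hashes.
USERS = {"alice": "correct horse"}


def login_message_only_fix(username: str, password: str) -> Optional[LoginError]:
    """The text is now generic, but the code still reveals which part was wrong."""
    if username not in USERS:
        return LoginError(LEGACY_INVALID_USERNAME, GENERIC_MESSAGE)
    if USERS[username] != password:
        return LoginError(LEGACY_INVALID_PASSWORD, GENERIC_MESSAGE)
    return None  # login succeeded


def login_single_code(username: str, password: str) -> Optional[LoginError]:
    """Both failure cases return the same code and the same message."""
    stored = USERS.get(username)
    if stored is None or stored != password:
        return LoginError(INVALID_CREDENTIALS, GENERIC_MESSAGE)
    return None  # login succeeded


# An attacker guessing usernames can still tell the two cases apart by code:
print(login_message_only_fix("bob", "guess"))    # code 40
print(login_message_only_fix("alice", "guess"))  # code 41
# With a single code, the two responses are identical:
print(login_single_code("bob", "guess") == login_single_code("alice", "guess"))
```

Any third party that branches on 40 or 41 would never see those values again with the second version, which is exactly why the change has to be announced before it is released.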

UDP and ChatGPT

A senior developer questioned why we were encrypting elements of the UDP message. The developer struggled to explain and then incorrectly used ChatGPT. It was obvious from the increase in technicality and vocabulary in his response.

George
Out of interest, why are you encrypting this?

Karthick
According to the system audit query, MAC address and Sending user display name and the location name should be encrypted.

George
But Mac Address is sent in plaintext as part of UDP - I can see it in your screenshot above

Karthick
It is encrypted at Controller before sending.

George
Yeah I know you encrypt it in the payload, that was my question, why are you encrypting something that is in plaintext elsewhere on the network traffic?

Karthick
In most cases, the visibility of the sender's IP address is not a concern. IP addresses are public and essential for routing data across networks. While it's possible to use techniques like VPNs or proxies to obscure your real IP address from the recipient of the data, these techniques are not directly related to the encryption of the UDP packet payload.

George
Did you ask ChatGPT for that?
Also, I'm talking about MAC Addresses, not IP addresses.
I don't really care if you encrypt it or not, I was curious what decision led to you encrypting a MAC address.

Karthick
ChatGPT gave a good answer 😃
I cannot see MAC address of Wireshark UDP packet trace done in my laptop.

Govind
From the Pen test, it is reported that the
"Disclosing MAC Addresses and display names of users provides a wealth of information to a potential attacker"
via panic alerts.
That's the reason we are encrypting the Mac address.
And I believe the Mac address showing on the Wireshark, is the system Mac address(the interceptor machine) on which the Wireshark tool is running.

George
All you had to say was "Pen testers" and I would have been happy 😄

Problems With Hosted Services

Recently we have had several major incidents due to: software bugs, incorrect configuration being applied, not renewing licence keys, and migrating servers to the cloud and failing to check all services were correctly configured and running.

Our Hosted Services team gave a presentation of work in their department, and gave more insight into even more failings that have happened recently. As far as I am aware, Hosted deal with servers, data centres and networks.

Hosted explained that due to the decision to move all servers to the cloud, when their usual time came to replace old servers – they didn’t bother. But the migration has been a slow process and delayed which meant our software was running on inferior hardware for longer than anticipated.

“We don’t need to invest in the architecture that we’ve got, which was not the right decision in hindsight

We had a team of people who, in some cases, were the wrong people. They didn’t have the appetite to go and actively drive out issues and reduce the points of failure in our networks.”
Hosted Manager

He then goes on to say the change in strategy caused many of their long-term staff to leave. These were the people who really knew how the business worked.

“So we lost around about 90% of the team over a relatively short space of time and that put us into quite a challenging position to say the least. And needless to say, we were probably on the back foot in the first quarter of this year with having to recruit pretty much an entire new team.”
Hosted Manager

Then, because they were short staffed, their backlog of work was increasing, putting more stress on the people that remained:

“We had to stop doing some tasks, and some of our incident queues and ticketing queues were going north in terms of volumes, which was really not a good place to be.”
Hosted Manager

I’ve written about this situation in the past. It has happened in the Development department when a new CTO comes in, and says that manual software testing is archaic; so people have to learn automation or lose their jobs. Then a few months later, they realise their plan isn’t so feasible, yet have lost some good software testers to other companies, or allowed others to switch roles who aren’t interested in going back. Then the releases slow down because we can’t get the fixes tested fast enough due to the lack of software testers.

They go on to say the Firewalls suffered 50 major incidents in Quarter 2, and now they have “procured new firewalls” to solve it. They have reduced bandwidth into the main data centre by routing certain traffic through an alternate link. The “core switches” at our offices and data centres are “End of Life” and will be upgraded to modern hardware (Cisco Nexus 9K).

So it sounds like they have a plan, or at least are doing the best with what they have. It sounds like all departments are shooting themselves in the foot at the moment.

How To Confuse Women

How to confuse women

Step 1 centre the text in a cell
Step 2 watch the confusion

Alright, it’s a somewhat clickbaity title, but I did cause a lot of confusion to one person with one simple feature, and she still didn’t understand when I felt I had perfectly explained it.

I had made a grid which shows a person’s name, alongside a list of items they have ordered, in alphabetical order; as per the requirements.

Later on, we would add alternating row colours to make different people’s orders more distinctive. In the example I came up with, there were only 2 orders on a page, and I had left the row selection highlight on, so it actually looked like we did have alternating colours.

The examples from the UX team only had a few items per person, so it wasn’t clear how they had aligned the text. I left it centred which looked a bit weird for large orders, which is the example I came up with.

Person	Items
Lisa	phone
	CD
	hat
	lamp
James	phone
	shorts
	t-shirt
	towel

I don’t see how I can accurately illustrate this with WordPress. Imagine Lisa’s row is fully highlighted.

Just to show my progress on this new project, I posted a screenshot to generate a bit of hype 😉

Olivia was confused about why it wasn’t in alphabetical order – but it is in alphabetical order!

Olivia: why is “phone” at the top?

Me: Ulrika's designs or my screenshot? Mine is alphabetical. Phone is for a different Person. We still need to add the alternating colours, and maybe don't centre the name to make it clearer

Olivia: this one <sends screenshot of top row>

Me: Different Person

Olivia: I'm confused. Lisa's name is the same row as Phone

Me: There's 2 People there. Just that the second person has loads of items, and because we have centred the Person name, it's halfway down the row. We can use alternate row colours as Ulrika’s design, and we can stop centering the name to make it clear.

I message Tim:

Me: I can't believe the confusion that has arisen even though I have explained what it is.
 
Tim: I got your back

<Back in the main chat>

Tim: Phone is for Lisa there
 
Tim: Everything else is for James
 
Tim: Which is when the alphabetical order kicks in.
 
Olivia: Thank you Tim, That was seriously messing with my mind. The name needs to be at the start of the list of items.

<Back to private message with Tim>

Me: I cannot understand why my explanation wasn't good enough. Is it because I am a developer nerd and cannot communicate with people?
 
Tim <jestingly>: the trick is to keep your answers on one line. Colours and shapes help too.

I’m so baffled. Why couldn’t they understand what I wrote? I wrote a perfect explanation then Tim just put his explanation in short messages, and he was thanked for his explanation. It’s like I don’t exist!

16 weeks work

Some teams post fortnightly updates on what they have done on our internal social media platform, Microsoft’s Yammer (now named Viva Engage). No matter what they post, managers seem to respond with positive comments.

So here is an example. One team stated they had completed:

3 user stories. 1 dev investigation. Improved code coverage.

Then attached was their code analysis report which stated:

0 Bugs
0 Code smells
100% coverage on 1 new lines to cover
0% duplications on 8 new lines

I never know how to make sense of these stats. The duplications mention 8 new lines, but yet the code coverage mentions 1 new line. It can’t mean it is still to cover when it states they have 100% coverage.

Their video demo stated it was “Sprint 8” which means it’s after 16 weeks of work. They logged into their application and there are 2 buttons (not functional).

My Work

I’ve been working on a game in Unity and have also reached the 16-week mark. I have only been working on it sporadically. Early on, I might have worked a few hours a day on many days per week, sometimes working several hours at a weekend. Other weeks, I have just worked on it for 3 hours or so in one afternoon.

I have written loads of code for the AI state machine. Your units can automatically move in a formation, and break away under certain conditions. Then the defending group of units have their own behaviours, and work together in their own way. Some behaviour is configurable, so I have the logic and UI for the player to change it. There are controls for the cameras, game speed, pausing, saving and reloading. There are a few options for volume and a few graphical effects. There are some miscellaneous aesthetics too, such as animations and particle effects.

I am a single developer working fewer hours, and I have to play-test it myself. Compare that to a team with a few developers, a few testers and different managers assigned, all working a 9-5 day, 5 days a week, and all they have is a menu and a title banner.

Hackathon Progress

Another comparison is the progress developers make during a “hackathon” or “game jam”. Developers can often put together a prototype, or a fairly fun game, within a few days…

Developers during hackathon: We built an entire application in just 3 days.

Developers after hackathon: Adding that icon is going to take 3 weeks.
— Mark Dalgleish (@markdalgleish) November 27, 2019

Conclusion

Recently, I’ve been thinking about this a lot. I think if you start with no code, then any code you write has a much bigger impact. You also don’t have to work out how it works alongside other code.

So the more code there is already, the longer it takes to write more. That is why Hackathons are productive: you start from nothing, so you can write something fast. It is the same situation with my Unity game, and since I am working on it myself, I have no one slowing me down.

The other reason why working on my own is great is that I can set the process. Due to the 90/10 rule, I think it is best just to get it 90% done, then move on. If you change your mind about how something works, it isn’t as big a waste of time. If it turns out to be good, then you improve it later, alongside other features when you are changing code in that area.

So going back to this original team who did 16 weeks’ work and got nowhere, even though it’s a new software product and they should be able to code fast, I think they are slowed down by their insistence that everything has to be perfect and has all the extras like Unit Tests. So they could get a feature 90% complete quickly, but then they are told to spend several days perfecting it for that last 10% of quality. Then they are messing around with non-functional stuff like the Deployment Pipelines and miscellaneous processes. When you are months away from releasing, I don’t see the point in dedicating so much time to how you are going to deploy it, so on my game I have been 100% focussed on features and gameplay.

I think this could illustrate how process can severely slow you down, and striving for perfection can also lead to inefficiencies. I think my idea of “90% quality is good enough” could be a winning approach, then you can always perfect the features if you are ahead of schedule. If I keep working on my game, I will worry about how to release it when it is close to a releasable state (i.e. how to get your game listed on Steam).

Parkinson’s Law: Another Perspective

I recently wrote a blog on Parkinson’s Law, and I have since come across this tweet about Elon firing loads of staff when he took over Twitter:

Elon Musk fired 6,500 employees at Twitter.

A little birdie told me it's down to:

– 2 designers
– 6 iOS developers
– 20 web developers
– Around 1,400 sales and operations people

How is it possible that we are still using this website?

Two words:

Parkinson's Law.

Have you…
— Andrew Wilkinson (@awilkinson) June 25, 2023

Here is the tweet thread:

Have you ever wondered why seemingly simple tech companies have tens of thousands of employees? Sometimes, it’s because they have huge sales forces or tech support/operations people. But often it’s also due to Parkinson’s Law.

Parkinson’s law is like lighter fluid for bureaucracy. It’s a business tapeworm that slowly eats away at companies, making them less and less efficient and innovative over time. Parkinson’s Law is the idea that the work will generally expand to the amount of time, budget, and number of people allocated to it, and no matter how many people you allocate to it, those people will feel busy. They’ll feel busy because, due to the excess time/slack in the system, they’ll start focusing on less and less important tasks.

Here’s how it manifests on an individual level: Let’s say you have a report due in a week. The report might only take you around five hours to finish if you really focus and work efficiently. However, because you know you have a week to complete it, you might find yourself spending a lot more time on it than you need to. You’ll be more prone to distractions, take longer breaks, or perhaps decide to add more details, tables, graphs, and so forth. Essentially, the task becomes more complex and time-consuming purely because you have more time in which to do it.

And here’s how it manifests across organizations: Imagine a big tech company. A social media company with various departments. Each department has tasks that it must complete to contribute to the overall productivity of the company. Now, suppose each department is given a budget and a set amount of time to complete its tasks for the year. According to Parkinson’s Law, each department will use its entire budget and the entire allotted time, even if the tasks could have been completed more efficiently. This is because as resources and time increase, departments tend to become more complex and less efficient. For example, a department might add more steps to its procedures, requiring more approvals and creating more paperwork, which slows down the process. Or it might use the full budget on additional personnel or equipment that doesn’t necessarily improve productivity. The department might also use the full budget to justify the same or larger budget for the next year, since budgets in many organizations are often determined based on the previous year’s spending. This is a phenomenon known as “budget padding” or “spend it or lose it” mentality.

Inefficiencies can also develop in staff allocation. If a department expands, it might add managerial positions that aren’t strictly necessary. More employees are hired to manage, creating layers of bureaucracy that may not contribute to productivity and can even slow decision-making. I have seen this occur over and over again in my career. The larger the team, the larger the budget, the longer the timeline, the less gets accomplished. I’m very curious to see how many more tech companies come to this realization.

Mini Musing #9: Char

I was watching a programming tutorial recently and I heard the teacher pronounce the datatype “char” like the word “car” and it made me think.

( ͡ಠ ʖ̯ ͡ಠ) 🤔

It is a shortened form of the word “character” – as in “a single letter” – which you do pronounce “car-acter” – so it probably should be pronounced “car”.

Yet, I have been programming for years and I have never heard someone pronounce it like that.

Datadog knee jerk

To carry on the recent trend of failings and causing Major Incidents (see Printing Licence Key Expiry, and The Outage), we recently had another major problem for a small group of users due to migrating their server to “the cloud”.

From what I understand, everything worked apart from one particular service which they forgot to check, and left the feature broken for a few days. The most embarrassing part was that it was our main rivals that told us it wasn’t working, because they were calling our interoperability API and it was failing. It had been broken for 3 weeks!

This caused another instant reaction from our CTO and Technical Director who demanded that everyone creates a Datadog dashboard to monitor all services, regardless of what they are.

Datadog is a recent monitoring tool we purchased licences for, and is the cool thing to use and impress the senior managers with (see Datadog, and Datadog – The Smooth Out).

I discussed problems in both those blogs, but the concerns with all metrics are:

What do you want to measure?
Who is viewing the data? And when?
What does “good” and “bad” look like, and who acts when that state is shown?

Another key point was made by a colleague:

“But we can’t expect some pretty Datadog dashboard templates to solve the historical problems that have meant we have lots of live services in the business with nobody who understands where they are, or how they work…

The company has a long history of developing a solution, moving the team that developed it off onto a new project, and leaving that solution behind. Combine that with a massive wall of confusion between Dev and Hosted, you have Hosted running a bunch of servers that they have no idea what they do.”

So do the developers really understand the way things work once it is deployed? Does the development team know how to create an effective dashboard, and how to act upon what it shows?

After the CTO had decided every team needs a dashboard, I was invited to a meeting with several people from different teams. One of the Test Managers said it was

“a knee jerk reaction. We want this and we want it now”
Test Manager

Then he goes on to say:

“I know nothing about Datadog, yet have been told to make a dashboard”
Test Manager

People were also told that it was the number one priority and so we need to pause our current development. The CTO claimed it:

“should take a week. A relatively simple ask

ANYTHING you are doing at the moment is secondary to this. The only exception is a major incident. If you get invited to any other meeting, invite the Tech Directors and they will get it cancelled”
CTO

People that knew more about how Datadog works raised concerns about performance. If Datadog is running and sending metrics every minute, it will cause way more network traffic than we had before, and we already have a problem with our networks not being able to handle the current load.

Someone then came up with the idea that the servers could send their metrics to a server which acts as a middle man, and that server could then send the data to Datadog. But this idea doesn’t make sense. You still have the same number of servers (well, plus one server) sending data on the network, and the central server then needs to send a massive amount of data in one go.
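A rough back-of-the-envelope sketch of that argument in Python. The server count and payload size are numbers I have made up, and it assumes the middle man simply forwards everything once a minute without compressing or summarising it. A relay that batched or compressed the data could behave differently, but nobody suggested that.

```python
def traffic_per_minute(servers: int, payload_bytes: int, use_relay: bool) -> dict:
    """Estimate metric traffic per minute under the assumptions stated above."""
    bytes_from_servers = servers * payload_bytes
    if not use_relay:
        # Every server sends its own payload straight to the monitoring service.
        return {
            "senders": servers,
            "total_bytes_sent": bytes_from_servers,
            "largest_single_send": payload_bytes,
        }
    # Every server still sends once (to the relay), then the relay sends it all on.
    return {
        "senders": servers + 1,
        "total_bytes_sent": bytes_from_servers * 2,
        "largest_single_send": bytes_from_servers,
    }


# Hypothetical figures: 100 servers, 2,000 bytes of metrics each per minute.
direct = traffic_per_minute(100, 2_000, use_relay=False)
relayed = traffic_per_minute(100, 2_000, use_relay=True)
print(direct)
print(relayed)
```

Under those assumptions the middle man adds a sender, doubles the bytes put on the wire, and turns lots of small sends into one big one.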

Are people going to create good dashboards?
Is the data they are showing accurate?
Are we going to act on them when they show that something has gone wrong?
Is the increase in metrics going to create performance issues?

Indian Expo

Recently, I blogged about how managers love any excuse to go to India to visit our office over there. Then they write a blog on their experience, stating how important it is for face-to-face collaboration in an office environment… before returning to the UK and telling us how working remotely from home is the modern way of working, and has no impact on efficiency.

They actually spend most of their blog writing about the local cuisine and the landmarks they saw; so it’s definitely a holiday and not a work trip at all.

I also wrote about The Expo, which is where the entire UK side of the company travelled to one location to watch many in-person presentations (which we could have just watched remotely like we normally do). Then when it is “business as usual”, managers are telling us to find ways to save money, and how we want to become a carbon-neutral business.

So after dumping loads of money into travel costs, hotel expenses, venue hire and catering for the Expo in the UK, they decide it would only be fair to host a similar thing in India… which means getting all the directors and senior managers to fly over there to do the presentations.

Obviously they used the opportunity to post a blog about the importance of face-to-face collaboration, Indian landmarks and cuisine.

Key phrases from their blog are as follows:

The India Office

“I am amazed at how much we were able to accomplish”
“India greeted us with its vibrant energy and diverse cultural heritage”
“The workspace was a fantastic environment, promoting team collaboration and productivity”
“Witnessing the teams working closely together was inspiring, and the entire place was abuzz with creativity and a real growth mindset”
“The office boasted excellent facilities, including communal work areas, private group session rooms, a gym, nap rooms, massage chairs, a food court, and garden”.

Expo Day:

“The Expo day itself was an exhilarating experience, with a buzzing atmosphere and a large number of attendees”

“Representing the team on the stands was a humbling experience, as engagement levels were high and the audience had a deep understanding of our work, asking probing questions around aspects of safety, governance and our products.”

Cultural Experiences:

Visiting the UNESCO heritage site at Mahabalipuram allowed us to witness the interplay between Hindu, Chinese, and Roman architectural styles in this historic trade centre.
Learning about the story of Draupadi and understanding the long history of international collaboration.
Our visit to DakshinaChitra cultural heritage site, highlighted the vastness of South India and its rich diversity.
Meeting the skilled craftsmen and hearing them describe their trades first-hand provided a deeper appreciation for the diversity of people and their skills across the country.
We learned about different rice and cooking methods for Biryani, and the amazing flavoursome vegetarian dish suggestions.