Indian Expo

Recently, I blogged about how managers love any excuse to visit our office in India. They then write a blog on their experience, stating how important face-to-face collaboration in an office environment is… before returning to the UK and telling us that working remotely from home is the modern way of working, and has no impact on efficiency.

They actually spend most of their blog writing about the local cuisine and the landmarks they saw, so it’s definitely a holiday and not a work trip at all.

I also wrote about The Expo, which is where the entire UK side of the company travelled to one location to watch many in-person presentations (which we could have just watched remotely, like we normally do). Then, when it is “business as usual”, managers tell us to find ways to save money and remind us that we want to become a carbon-neutral business.

So after dumping loads of money into travel costs, hotel expenses, venue hire and catering for the Expo in the UK, they decide it would only be fair to host a similar thing in India… which means getting all the directors and senior managers to fly over there to do the presentations.

Obviously they used the opportunity to post a blog about the importance of face-to-face collaboration, Indian landmarks and cuisine.

Key phrases from their blog are as follows:

The India Office

  • “I am amazed at how much we were able to accomplish”
  • “India greeted us with its vibrant energy and diverse cultural heritage”
  • “The workspace was a fantastic environment, promoting team collaboration and productivity”
  • “Witnessing the teams working closely together was inspiring, and the entire place was abuzz with creativity and a real growth mindset”
  • “The office boasted excellent facilities, including communal work areas, private group session rooms, a gym, nap rooms, massage chairs, a food court, and garden”.

Expo Day:

“The Expo day itself was an exhilarating experience, with a buzzing atmosphere and a large number of attendees”

“Representing the team on the stands was a humbling experience, as engagement levels were high and the audience had a deep understanding of our work, asking probing questions around aspects of safety, governance and our products.”

Cultural Experiences:

  • Visiting the UNESCO heritage site at Mahabalipuram allowed us to witness the interplay between Hindu, Chinese, and Roman architectural styles in this historic trade centre.
  • Learning about the story of Draupadi and understanding the long history of international collaboration.
  • Our visit to the DakshinaChitra cultural heritage site highlighted the vastness of South India and its rich diversity.
  • Meeting the skilled craftsmen and hearing them describe their trades first-hand provided a deeper appreciation for the diversity of people and their skills across the country.
  • We learned about different rice varieties and cooking methods for Biryani, and got some amazing, flavoursome vegetarian dish suggestions.

Failing in different ways

I occasionally meet up with my old university mates. One friend works for a contracting company. He is a really good developer on a juicy £100k wage, which is crazy, and it shows how much money companies spend on short software contracts. Often a company may only have a small, permanent development team and hire temporary staff for extra capacity, or they might fully outsource their software requests entirely.

Since he works on short contracts, often 3-6 months (sometimes up to a year), he has seen first-hand how many different companies operate.

I love having discussions with him because he is incredibly knowledgeable and always keeps up to date. He knows all the Cloud and web-development jargon, the popular software tools, and Agile processes.

He came out with this statement:

“It’s reassuring that you get to work with different companies doing different things, but they are all terrible. You are often asked to help – not because it’s going well, but because it’s going wrong.”

What he was saying is that companies have different philosophies when it comes to software. It could be Cloud-only, on-prem-only, strict Agile development, different levels of automation, etc., but whatever they do, it doesn’t quite work, and they will make baffling decisions.

So when I write about the baffling decisions where I work, it’s just that we are failing in a different way to other companies.

Datadog – The Smooth Out

I recently wrote about Datadog, which allows you to create dashboards to monitor servers. It’s the “cool” thing to use at work and the CTO is heavily promoting its use.

I discussed how it’s quite confusing to use, and that there are certain limitations you have to bear in mind. I also suspected people were creating dashboards and assuming they worked because they showed some data – but when you try to verify the data, it turns out some of the dashboards were showing absolute nonsense.

One guy, who had been working with Datadog for months, only just noticed a problem with his dashboard.

“In our team, we monitor the response time of our API calls, and over the last 3 months, we’ve seen a remarkable increase. In May, we were seeing around 140ms, but now we are seeing 550ms.”

So I loaded up his graph and my default view was “1 Hour”. I switched it to “past 1 Day” to zoom out, and the graph looked a bit different to what I expected. The first graph shows 11-12, so on the second graph, just look at the far right at 11-12.

The first graph shows a spike of roughly 1.6 seconds around 11am, then drops down to 0.2 and remains very consistent. Yet the second graph seems to spike up to 1.15 seconds, drop down to around 0.4, then finally spike up to 0.6 at the end.

As I switched between the views, the numbers seemed to differ by larger margins.

I then moved to the day he mentioned, 7th July, picked a particular time, 12:05, and made a note of the value as I switched between the views (that particular time didn’t exist on all the views, so I used the nearest time instead, as noted in the table, but I couldn’t be bothered re-checking for more accuracy).

View                            Recorded Value (ms)
15 mins                         179.65
1 hour                          176.51
4 hours                         156.1
1 Day                           372.89
2 days (12:00 used instead)     552.93
1 week (12:00 used instead)     554.93
1 month (11:00 used instead)    550
3 months (1:00 used instead)    559

Values recorded at 12:05, Friday 7th July

He didn’t seem to be using any weird formulas, so why is the data so wildly different? I didn’t report my exact findings from the table, but another developer chimed in with this comment:

“Datadog’s long-term reporting is poor when it comes to averages. I would take a 1-4 hour window as a good sample size. Anything 1 day plus, the results are clearly not accurate”

Datadog user

His statement seems consistent with my analysis. So why is Datadog so hyped up in the business when the people who use it don’t even think it reports accurate figures? It also sounds like those who have noticed have kept it to themselves rather than sharing the knowledge.

He then theorised that Datadog aggregates an average of its datapoints over a period of time, e.g. 24 hours, then plots that. To me, it doesn’t make much sense: if it has a sample of the exact value for that time, e.g. 12:00, why would it need to take the average from 12:00 that day back to 12:00 the previous day? Especially when you could be monitoring traffic which is time-sensitive, e.g. a spike in the morning that lowers in the afternoon.

After searching on the Datadog documentation, we found this:

“As Datadog stores data at a 1 second granularity, it cannot display all real data on graphs. See metric aggregation for more details.

For a graph on a 1-week time window, it would require sending hundreds of thousands of values to your browser—and besides, not all these points could be graphed on a widget occupying a small portion of your screen. For these reasons, Datadog is forced to proceed to data aggregation and to send a limited number of points to your browser to render a graph.

For instance, on a one-day view with the ’lines’ display, there is one datapoint every 5 minutes. The Datadog backend slices the 1-day interval into 288 buckets of 5 minutes. For each bucket, the backend rolls up all data into a single value. For instance, the datapoint rendered on your graph with timestamp 07:00 is actually an aggregate of all real datapoints submitted between 07:00:00 and 07:05:00 that day.”

https://docs.datadoghq.com/dashboards/guide/query-to-the-graph/

That explanation sounds fine in theory. If the graph is showing each hour, then each point can be an aggregate of the previous hour. But what that should mean is that it is a smoothed value: if you have “zoomed” into a minute-by-minute view and see a very squiggly line of rapid but small fluctuations, then when you zoom out to the hour, the line should look fairly straight, and each value should be the average over that time period. I don’t think it explains how my first graph, which probably has an average of just over 0.2 seconds, showed 0.4 seconds.
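To see how the bucketing the docs describe actually behaves, here’s a minimal sketch of roll-up averaging. All timestamps and values are invented, but roughly shaped like my first graph (a short spike of 1.6 seconds at 11am, otherwise a steady 0.2):

```python
# Minimal sketch of the bucketed roll-up averaging the Datadog docs
# describe. All timestamps and values below are invented for illustration.

def rollup_avg(points, bucket_seconds):
    """Average raw (timestamp, value) points into fixed-width buckets.

    Returns {bucket_start: mean of the values falling in that bucket}.
    """
    buckets = {}
    for ts, value in points:
        start = ts - (ts % bucket_seconds)
        buckets.setdefault(start, []).append(value)
    return {start: sum(vs) / len(vs) for start, vs in buckets.items()}

def shown_value(rolled, ts):
    """The value the graph draws at timestamp ts: its bucket's average."""
    return rolled[max(start for start in rolled if start <= ts)]

# One reading per minute for 24 hours: a ten-minute spike of 1.6s at
# 11:00, otherwise a steady 0.2s.
points = [(m * 60, 1.6 if 660 <= m < 670 else 0.2) for m in range(24 * 60)]

at_11_02 = 11 * 3600 + 120  # the timestamp we "hover" over
for label, width in [("1 min", 60), ("5 min", 300),
                     ("1 hour", 3600), ("4 hours", 4 * 3600)]:
    print(label, round(shown_value(rollup_avg(points, width), at_11_02), 3))
# The same instant reads 1.6 on the zoomed-in views, ~0.433 on the
# hourly view and ~0.258 on the 4-hour view -- the data hasn't changed,
# only the width of the bucket being averaged.
```

Note that this kind of averaging can only produce values between the raw minimum and maximum within the bucket, so a wider view can legitimately read higher if the wider bucket pulls in slower periods.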

There’s this useless video from Datadog which is supposed to show how your graphs are “smoothed out” when zooming out. It has no sound, so you have to interpret what they are showing you with their mouse:

At 12:04, they show the value of 13.32% in the (15 mins) view, and at 12:05 you see it drop down to 9.07%.

They switch to the 4 hours view.

I’d say the 12:04 value is now 15.8%, but they start hovering over times around 11:17.

When they switch to 1 day, they start looking at data around 6:45, so clearly they have no idea what they are even doing. In this view, the times around 12:00 are ~14%.

With such small timescales, surely there shouldn’t be that much variance in the data. It is important to get accurate figures when looking at the micro level, in the sub-1-hour views, and then when you zoom out, the figures should stay consistent.

IT Tales

Here is a collection of a few fails by our IT Department.

PC shutdown & Usage Monitoring

Even though we work from home, we still have some PCs in our office that we remote onto. There are certain systems that only seem to work when on the physical network, so people often call these PCs “jump boxes”. Our IT department was planning to temporarily move our PCs whilst some electrical work was being done in the office. I was invited into a Teams chat which was supposed to include everyone affected. After skimming the list, I spotted 3 people who were missing, and other colleagues spotted more; 9 people were missing in total! How do they not know who owns the PCs? They have been citing “increased security” in recent times – surely it’s a security risk if they don’t know who uses the PCs on the network.

More recently, I was contacted again via email asking “if you use this PC”. Again, why do they need to ask if we use them? Isn’t it a security concern if they don’t know? Surely they do know, especially when they have installed extra network security tools recently; I thought they had said the software monitors network traffic and alerts on anything suspicious.

Upgrading Software

I was contacted by IT saying my SQL Server version was no longer supported by Microsoft, so I needed to urgently upgrade it by the end of the week because it was considered insecure. They said to reply if I wanted an installer. I thought it would be easy enough to locate the installer myself, but Microsoft’s SQL Server pages are very confusing, so I replied asking for it. They ignored me. I replied again; they ignored me again. Months have gone by. So not that urgent then.

IT then announced that they are taking increased security measures and removing all admin rights from our PCs. Now we can only install software with their permission. They also said it ensures we can’t install unlicensed software, since it is easy for someone to install software that is free for personal use but paid-for in commercial use, and then the business can be liable.

A week later, they emailed us saying there is a known security vulnerability with our Visual Studio version, so we need to update it. We can’t though; we need admin rights to keep our software updated and secure! So now we have to log tickets, then they remote on and type in the admin password to proceed. I bet they love that.

In a similar fashion, they are more fussy with USB devices. They sent one of my colleagues a new laptop but it rejects his smart-card reader which he needs for testing. Can’t be plugging in USB devices these days.

Saving Money

They also said they wanted to be more stringent when it comes to licence keys, as we seem notorious for purchasing more licence keys than we need, or for continuing to pay for software we have stopped using. I was contacted in early July 2022, saying that I have had a Jira licence for the last year but have not been using it:

We currently purchase a licence for you to access Jira. We understand a lot of the users will have now migrated to Azure DevOps and as such, your access may no longer be required.

May I kindly ask you to respond to this email by 12pm Friday 8th July confirming whether or not you continue to require access?

IT Email

So I replied saying I wasn’t using it, and that I didn’t think I had used it for 2 years. I was then contacted again in February 2023 saying the same thing. I confirmed that I don’t need it. Then I was contacted again earlier this month, asking the same question. So I’ve now had a licence for 3 years for a product I don’t use at all.

Pride At Work

During Pride month, there were a few Yammer (now known as Viva Engage) posts about LGBT issues. One guy made a blog post about how gay men were, until recently, not allowed to donate blood. It was informative, but I did think it was a weird thing to post at work, given that the count of the word “sex” reached double figures and it contained the phrase “anal sex” along with other sexual references.

If you take that out of the context of “pride”, wouldn’t discussing or writing about sex at work result in you being on a call with a member of HR?

I discussed it with a few of my colleagues. One guy said he thought the poster “had crossed the line with his phrasing and could have easily worded it in a less explicit way”. Another colleague stated that “although I support Pride, I don’t feel I should be reading about it at work”. That is actually a good point. Although there can be important social issues in the world, if they have nothing to do with work, why are we reading or talking about them when we should be working? I’m sure there was even some policy we had to agree to that said you couldn’t discuss religion or politics, because if someone had different beliefs to you, they might feel excluded.

It made me think that, because LGBT is the current hot topic, it trumps all existing work policies, and you aren’t allowed to say anything against it. This is even more contentious when this particular topic could be against someone’s religious beliefs (we do employ a significant number of Muslims, and a certain number of colleagues could have opposing views regardless of religion).

To conclude Pride month, a member of HR posted the following:

“Lots of events take place throughout June every year to celebrate the LGBTQ+ community and all the progress that has been made across legislation, attitudes and behaviours.
Personally, one reason I find these events so wonderful is because they bring together people of all ages and I see so many families attending together with children – what better way to encourage change than to teach children about positive attitudes and behaviours and set a great example for them.”

HR staff member

I laughed out loud when I read that. I really wanted to respond, but thought I’d end up being unfairly sacked. So I wrote this blog instead.

Maybe the average person hasn’t heard about all the controversies this year, but recently I’ve spent a lot of time on Twitter and been watching a lot of Daily Wire content. I suppose the more of this stuff you view on Twitter, the more it recommends similar content, so if you have any hint of an opinion, it becomes stronger through “confirmation bias”. I’ve generally been interested in conspiracy theories and hot debates, so Twitter has pushed a lot of this content to my feed.

Don’t get me wrong here, I’m not against LGBT in general, but I am opposed to it being directed at kids (which a lot of people, from the likes of the Daily Wire, are making content about), and Twitter seemed to like showing me everything that Gays Against Groomers were Tweeting, and that’s their purpose.

oh, won’t somebody please think of the children – The Simpsons meme

So let’s go through some examples of what I am referring to here. If I remember correctly, the first controversy was a “family-friendly” Pride event where gay men in fetish gear were being whipped on top of an open-top car. The next was a photograph of a curious girl of about 6 years old who had approached 2 guys wearing dog-themed leather bondage gear. The point here is that this content should only be encountered if you go out of your way to visit an 18-rated website. Instead, people were at a public event where they knew kids would be, dressing up and even simulating these acts.

I actually only came across that particular fetish due to a colleague mentioning that a former male colleague had an OnlyFans with his boyfriend, and that it was the company’s discovery of this that had forced him to leave the business. Given that the colleague telling me this had a reputation for exaggerating and lying, I asked him to prove it, and he linked me to the pages. He was telling the truth 😱😳

If my employer really is fine with this gay fetish aspect, then why was our former colleague sacked? Probably some hypocrisy there.

So I only learned about this fetish attire by going out of my way to the dark side of the internet, and yet here we have members of our HR department stating “I find these events so wonderful is because they bring together people of all ages and I see so many families attending together with children – what better way to encourage change than to teach children”. This sentiment is echoed by many, while those who disagree are presumably scared of being labelled a bigot for speaking out about it.

Some YouTubers stated that when they made content using such Pride footage, their videos were labelled as “adult content”. How can a “family-friendly” event be adult content? Oh, because it is adult content!

It’s considered a faux pas to criticise Pride, and yet if this same thing happened outside the context of Pride, people would call these people a “nonce”/“sex offender” and demand they be locked up for public indecency. This is what the group Gays Against Groomers stands for. They are against grooming kids. They are against exposing children to 18-rated content. Yet when they posted videos of their van parked at a Pride event, people were coming up to it and spitting on it. That’s right, people are openly fine with grooming kids these days. We used to want to protect kids at all costs, and we seem to have lost that over the last few years in pursuit of wokeness.

There was even the controversy with the Twitch streamer NickMercs, who Tweeted “They should leave little children alone. That’s the real issue” (it was in the context of a vote to celebrate Pride at a school). Activision then removed his character “skin” from the game “Call of Duty: Modern Warfare II | Warzone”. This resulted in a minor boycott/review bomb, and people mocked Activision with the phrase “Call of Groomers”. How far has society fallen if stating “leave little children alone” is considered a controversial statement?

To go back to the first thing the HR staff member said: “Lots of events take place throughout June every year to celebrate the LGBTQ+ community and all the progress that has been made across legislation, attitudes and behaviours.” Progress made? In addition to the examples of Pride becoming fetishised, you also had the transwoman that exposed their breasts at the White House, puberty blockers being banned in the UK, the boycott of Bud Light in the US due to the promotion with Dylan Mulvaney, the boycott of Target (which tanked their share price) due to stocking chest binders, the banning of Drag Queen events, men identifying as women to avoid men’s prison, and more people speaking out against transwomen in sports. So the Trans community has taken hits to its PR this Pride month.

There was also the incident with Billboard Chris, where he was explaining to someone why it is wrong to give puberty blockers to children, when a transwoman began repeatedly screaming obscenities in his face. Chris did his best to ignore her, until he got punched in the face. Despite having several police officers as witnesses, and the event being caught on camera, the police refused to prosecute the assault and blamed Chris for being antagonistic. Pride Month ain’t it – commit all the crimes you like.

So I’d say the LGBT movement had gained more and more support over time, but this year, it took a massive step back. I wouldn’t be surprised if further controversies were more widely publicised in future.

I think issues should be raised and discussed with logic, and not dealt with whilst being blinded by wokeness and hypocrisy. People need to take a step back, clear their minds and really decide what they actually believe in.

Assault is wrong. Grooming kids is wrong. Sex shouldn’t be discussed at work. I hope we can agree with that.

Datadog

Introduction

In recent times, the likes of the CTO have stated that we need to use modern technology and tools. One thing they love is software that produces statistics/metrics, so that we can judge improvements over time.

When we buy software licences for such a tool, there is always hype among certain groups of people who will volunteer to take ownership and work on implementing such software (installation, training, creating a “best practices” process), and will take any opportunity to highlight it to the managers.

So the “soup of the day” is a tool called Datadog, which seems like a very powerful tool with all kinds of integrations. I found this “jack-of-all-trades” approach made it difficult to understand what Datadog was really for, and why it was different from what we had before. I knew we already had dashboards that showed which servers were running, their processor/memory usage, which versions of our software were installed, and more. Datadog is used for this purpose too.

https://twitter.com/_workchronicles/status/1509146599355781122?s=20&t=QxTz3UkI_BvJg3WdTXk12w

Jargon Sales Pitch

One reason why it is difficult to understand is that Datadog’s webpage spouts loads of jargon, and internally, managers love spouting jargon too. Here is what one DevOps member said about Datadog (warning – the next paragraphs contain a lot of jargon):

“As our organisation continues to grow and evolve, it is essential that we have a comprehensive and centralised observability solution in place. Currently, we are using multiple disparate siloed monitoring tools, which not only is inefficient but also hinders our ability to identify and resolve issues promptly. This leads to decreased visibility and a lack of agility in our operations.

Datadog observability provides a unified platform that consolidates all our monitoring, logging and tracing tools into one solution. This not only reduces the complexity of our monitoring landscape but also gives us a single source of truth for all our operational data. By implementing Datadog observability, we will have the ability to quickly and easily identify and resolve issues across our entire infrastructure, reducing downtime and improving overall service levels.

Moreover, Datadog observability offers the ability to deploy configuration changes to the Datadog agent with agility, which is critical in a fast-paced and dynamic environment where changes to our infrastructure occur regularly. With Datadog observability, we will be able to quickly and easily make updates to our monitoring configuration, ensuring that our monitoring remains up-to-date and relevant at all times.

With a pre-approved change, it will be easier for us to leverage the 600+ integrations that we can configure to further enhance our current infrastructure observability, root cause analysis and incident mitigation. This will allow us to gain greater insights into our operations, improving our ability to identify and resolve issues before they become critical.

In conclusion, authorisation and creation of a Datadog pre-approved change will bring numerous benefits to our organisation, including increased visibility, improved agility, and reduced complexity. This solution will help us effectively monitor and manage our infrastructure, ensuring that our operations run smoothly and efficiently.”

DevOps Engineer

That really sounded like he was saying the same thing multiple times, with heavy emphasis on speed. I think a concise version is: “Datadog is one software product for monitoring, and can replace many of the metric tools we currently have”. So I would imagine it should be cheaper (paying one licence rather than several), and since it is all in one place, it’s probably easier to create new dashboards.

Jargon From The Docs

On their page, Collect SQL Server Custom Metrics, they show how you can run a custom query involving a person’s age. Isn’t that a terrible example? The query would run every minute (or however often it is configured), and you then create graphs from the results. Without good examples, it’s hard to understand how or why you would use this feature. Other problems are due to excessive jargon.
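For reference, these custom SQL metrics live under `custom_queries` in the agent’s `sqlserver.d/conf.yaml`. Here’s a sketch of a slightly less terrible example than a person’s age; the connection details, query, metric name and tag are all invented, and the exact schema may vary between agent versions:

```yaml
# Sketch of a custom query in sqlserver.d/conf.yaml. Everything below
# (server, credentials, table, metric name) is invented for illustration.
instances:
  - host: "myserver,1433"
    username: datadog
    password: "<PASSWORD>"
    custom_queries:
      - query: SELECT status, COUNT(*) FROM dbo.OrderQueue GROUP BY status
        columns:
          - name: status          # first result column becomes a tag
            type: tag
          - name: orders.queued   # second column becomes the metric value
            type: gauge
```

The agent runs the query on each check run and submits one gauge per status, which at least resembles something you would actually graph.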

“In v6, DogStatsD is a Golang implementation of Etsy’s StatsD metric aggregation daemon. It is used to receive and roll up arbitrary metrics over UDP or Unix socket, thus allowing custom code to be instrumented without adding latency.”

Datadog

“Enabling JMX Checks forces the Agent to use more memory depending on the number of beans exposed by the monitored JVMs.”

Datadog

Official Training

Members of the Network team, DevOps, a few managers, and some volunteers (who want managers to look upon them favourably) signed up to a session with official Datadog training staff. These sessions were recorded, and I watched them and made a few notes, although they were riddled with jargon and it was hard to know what anyone was talking about.

“Datadog Expert Services, or DES for short, is a set of “guided hands-on keyboard” pair-programming sessions. These collections of sessions, collectively known as an engagement, are time-boxed and specifically designed to enable you to get the most out of Datadog while adhering to best practices. In this session, our team will work with you to configure and deploy the Datadog Agent. This includes deployment in a standard, or Kubernetes containerized, environment.”

Datadog

It seemed these people were enrolled on 2 courses:

Name                 Time                             Cost
QS-INF course        2 Weeks + 5 Sessions, Curated    $15k
QS-INF-LOG course    3 Weeks + 8 Sessions, Curated    **$25K

The training cost is bonkers, isn’t it? Once you have paid all that, it pushes you toward the sunk-cost fallacy.

One of the Instructors asked what our infrastructure was.

“we’ve got resources and infrastructure in Azure, with a bias towards AWS, then we have on-prem; most of it is Windows Server. A combination of 2012…and onwards. 2016, but mainly 2019 as well. They also run on Windows HyperVisor, and also VMware – so they are virtual machines. But actually, we also have physical servers as well.”

deployment dude

Basically, we just made it up as we went along and got all the things! It sounds like a similar thing was done with the monitoring, because the deployment dude said we have “16 or 17 on-prem monitoring tools, as well as custom Powershell scripts to generate some data to monitor”.

The Datadog Instructor explains that we have to log tickets if it is outside our “engagement time”. They will reply when they can but there’s no set time-frame.

“That’s fine with us, we log enough tickets already, so that’s fine. I think we will welcome that.”

DevOps Engineer

It’s almost like we were taking any opportunity to slag our company off.

No Going Back

Good news everyone!

The DevOps engineers with support from the Architecture Team have levelled up our Live datacentres!

How? With estate wide deployment (completed Friday evening) of the incredible, uber-awesome full stack monitoring SaaS Datadog!

If you’re aware of Datadog’s capabilities, effortless integration and out-of-the-box features you’ll appreciate how monumental this is.

For the uninitiated, Datadog in a slick, AI driven, intuitive UX allows full stack monitoring of servers, databases, tools, services, containers, et al.

Effortlessly switch from viewing the entirety of all network traffic to drilling down into individual requests, logs, payloads, processes, you name it, in real-time.

Going forward we envisage significant improvements to our reaction and mitigation of all types of incidents, minor to major!

We are currently trialling access – To request access please join our Slack channel.

Stay tuned as we have more exciting stuff coming as a result of our DevOps strategy!

Watch this space!

DevOps Engineer

Web-based Demo

One team put together a small web-based app and presented a demo to the department to promote Datadog, and obviously, take the opportunity to look amazing in front of the management.

The team lead was trying to show a feature called “Cumulative Layout Shift”, but didn’t explain it. He made out that it could track how parts of the website load. You know how sometimes you load a webpage and see some text, then an image suddenly pops onto the screen, then some adverts appear, often causing the layout to change, then more adverts appear, possibly changing the layout once more? It’s not a smooth user experience, and it causes a lot of jerks if the user tries to navigate the page before it has fully loaded. So how does Datadog track that? What is doing the tracking? Wouldn’t that mean multiple server calls back to Datadog to log it? The web page is already slow, so why would adding extra calls make it better? I can’t see how that can be performant, especially with thousands of users. Isn’t this logging an insane amount of trivial data over time? I was left with far more questions than answers.
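For what it’s worth, Cumulative Layout Shift is one of Google’s Web Vitals metrics: the browser itself reports layout shifts through the Performance API, so a monitoring SDK can batch them up rather than phoning home on every shift (I’d assume that’s what Datadog’s browser SDK does, though I haven’t verified it). Each shift is scored as impact fraction times distance fraction. A simplified one-dimensional sketch, with made-up geometry:

```python
# Simplified one-dimensional sketch of the Layout Shift score from the
# Web Vitals definition: score = impact fraction * distance fraction.
# All the geometry below is invented for illustration.

def layout_shift_score(viewport_px, union_px, moved_px):
    """Score a single unexpected layout shift.

    impact fraction   -- share of the viewport touched by the shifted
                         element's old and new positions combined
    distance fraction -- how far the element moved, relative to the viewport
    """
    impact_fraction = union_px / viewport_px
    distance_fraction = moved_px / viewport_px
    return impact_fraction * distance_fraction

# A 200px-tall ad banner pops in and pushes content down 200px inside an
# 800px-tall viewport, so old + new positions span 400px in total.
print(round(layout_shift_score(800, 400, 200), 3))  # -> 0.125
```

CLS then sums these scores over the worst burst of shifts in the page’s lifetime, so a single number per page view is enough; nothing in the definition requires a server call per shift.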

He also said it can track time spent on a particular web page, view count, error count, action count, and frustration count (he claims Datadog tracks clicks made out of frustration. How?). People are already worried about the amount of monitoring/tracking/surveillance from the likes of tracking cookies, and then websites can track you at this granular a scale with Datadog; it is a bit worrying, isn’t it!?

Everyone should use Datadog

In the following department meetings, the CTO told us that all teams would eventually use Datadog, that we needed to increase the amount of monitoring, and that we should do it quickly to take advantage of the benefits of the tool.

My manager wanted our team to create a Datadog dashboard. Even if it wasn’t that useful, she wanted to be among the initial users – probably to look good to her manager.

I asked one of the smartest developers if it was even suitable for my team. He was looking into creating a dashboard for his own team, but his team had an API that third parties could use, so it was prime for this kind of monitoring.

He was a bit vague though:

“You could create a custom metric for it. But I wouldn’t be too sure. I’m probably going to use custom metrics for “#messages per APP per minute” sort of thing. But I can get all that from my Logs/Traces. You’d have to have something pulling that data from the main databases, which would involve pushing it to Datadog.”

Principal Developer
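For context on what submitting a custom metric like that actually involves: DogStatsD metrics are plain-text UDP datagrams sent to the local agent, which is why instrumenting code adds effectively no latency. A sketch of the wire format (the metric name and tag are invented for illustration):

```python
# Sketch of the DogStatsD wire format: each metric is a small plain-text
# datagram sent over UDP to the local agent (default port 8125).
# The metric name and tag below are invented for illustration.
import socket

def format_metric(name, value, metric_type, tags):
    """Build a DogStatsD datagram: <name>:<value>|<type>|#<tag>,<tag>..."""
    return f"{name}:{value}|{metric_type}|#{','.join(tags)}"

def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send -- no reply, so near-zero added latency."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))

# A "#messages per APP per minute" style counter ("c" = count):
msg = format_metric("myteam.messages.received", 1, "c", ["app:billing"])
print(msg)  # -> myteam.messages.received:1|c|#app:billing
# send_metric(msg)  # would submit it to a locally running agent
```

In practice you would use Datadog’s official client library rather than hand-rolling datagrams, but the point stands: emitting the metric is cheap; it’s deciding what to measure that nobody could answer.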

I asked other people who were using it, and they just kept saying they weren’t sure, or that maybe others had done it.

“We’re making heavy use of Datadog for our new software and I believe it’s also being used in other areas too. It’s incredibly powerful and provides a huge amount of detail. Getting the most out of it is important and also building some standards seems like a good idea. Do we have any thoughts around how we ensure we lead the way with this and get some standard/learning/documentation in place?”

Developer

No one can give a straight answer when it comes to this stuff. People are like “EVERYONE is using Datadog”, then when you ask about it in more detail, they are like “well SOME teams are using Datadog”, then when you ask more people, they are like “there are some metrics, but not quite the ones you want”.

Performance Problems

I asked my Software Architect friend (who seems to know everything) if Datadog is as flawless as people were implying. My intuition said it couldn’t have zero disadvantages.

Me
Won't Datadog just cause performance issues if we start monitoring everything?

Mark
yep, or run while patching is in progress and block access to the Database/tables, which has already happened. Running ad-hoc scripts is a fairly bad idea
Hosted had to run patching twice the other week, which pushed us out of our Service Level Agreement.

Me
this juicy gossip gets kept quiet, doesn't it

Mark
yes because Datadog is a massive success and we paid lots of money for it


Technical Director

Recently we hired a “Technical Director”. He asked how Datadog was coming along, and said to highlight any issues so he could get involved. This prompted John to go on a rant. The TLDR of this section is: “Software Developers don’t know about the infrastructure of the Live Production environment.”

I think one of the company’s biggest challenges is how many products we have, and how diverse they are. We have no real standardisation due to a number of factors: not sun-setting old services, not tackling tech debt, products that were developed by other companies and came to us via acquisition, etc.

As a result, I think it’s difficult for us to template things out such that it can work for multiple people.

Realistically, each team for each product needs to look at how their product works, how it’s used, what tech it’s built on, and build a solution that works for their product. And I think one of the biggest challenges at the company is that the ‘DevOps wall of confusion’ isn’t just a normal wall, it’s a Trumpian 15-foot-high one with razor wire. Lots of products have dev teams (assuming they have one at all!) with little to no exposure to, or knowledge of, how production works and what it looks like. For so long, dev teams were told they had no role in production and no need to access it, and were kept locked away from it.

For reference, I used to think like that. I’ve been here 15 years and I have been part of that mindset in the past. It’s changing, and I’m happy to be one of the people pushing for that change, breaking down that wall of confusion. But that’s one of your biggest hurdles: people don’t know what to monitor in production because they don’t know what it looks like, and trying to monitor it by copying a template that worked for somebody else, but doesn’t work for their solution, isn’t a way to solve it.

The key to unlocking Datadog, for me, is to get people to have visibility of production, to understand how it’s used and what it looks like, and then start to work out what metrics are important and what “normal” looks like, so we can alert when we deviate from that.

I can talk for hours about this; my team has one of the best observability setups out there, and had it before Datadog came around. If you want to have a chat, I’m happy to discuss what we can do.

I may have painted a somewhat negative opinion above, and I agree that there are things we can improve. But we can’t expect some pretty Datadog dashboard templates to solve the historical problems that have left us with lots of live services in the business that nobody understands – where they are or how they work – and, crucially, expect 24/7 Operations to magically pick up the pieces and fix it by themselves when it falls apart.

Yes, the company has a long history of developing a solution, moving the team that developed it onto a new project, and leaving that solution behind. Combine that with a massive wall of confusion between Dev and Hosted, and you have Hosted running a bunch of servers with no idea what they do.

Case in point right now: the “Login and Identity service” is in the main data-centre, and we also have one in the DMZ that was built for the Mobile app, but nobody is quite sure what the main one is for. I have some notes that indicate it was built for the Connect app, but Connect doesn’t use it. Yet that production service still sits there unused, with nobody sure why it’s there.

You’ll find a team that has maybe done work in the past on Appointments, maybe even recently. Are they currently working on Appointments? Do they have any knowledge or visibility of production? Is it even on their radar that they should be monitoring its performance?

This goes deeper than just dashboard templates; it’s a company culture problem.

John

Anomaly detection works well if the metrics are predictable for different periods of the day. It’s not “AI” as we thought; when I tried it out, it was more of a fancy algorithm than machine learning.

I found with XMPP that the method worked OK for Mon–Fri, but then the alert would trigger all weekend because traffic wasn’t as high on those days.

Lee
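The weekend false alarms Lee describes come from comparing every day against one global baseline. A crude sketch of the fix is to keep a separate baseline per (weekday, hour) bucket, which is roughly what seasonal anomaly algorithms do more cleverly:

```python
from collections import defaultdict
from statistics import mean, stdev

class SeasonalBaseline:
    """Keep a separate baseline per (weekday, hour) bucket, so Saturday traffic
    is compared with past Saturdays rather than with the Mon-Fri average."""

    def __init__(self, threshold_sigmas=3.0):
        self.history = defaultdict(list)
        self.threshold = threshold_sigmas

    def record(self, weekday, hour, value):
        self.history[(weekday, hour)].append(value)

    def is_anomalous(self, weekday, hour, value):
        samples = self.history[(weekday, hour)]
        if len(samples) < 3:
            return False  # not enough history to judge
        mu, sigma = mean(samples), stdev(samples)
        return abs(value - mu) > self.threshold * max(sigma, 1e-9)

baseline = SeasonalBaseline()
for week in range(4):                      # four weeks of history
    baseline.record(0, 10, 1000.0 + week)  # busy Monday mornings
    baseline.record(5, 10, 20.0 + week)    # quiet Saturday mornings

baseline.is_anomalous(0, 10, 1000.0)  # normal Monday: no alert
baseline.is_anomalous(5, 10, 1000.0)  # Monday-sized traffic on a Saturday: alert
```

The numbers and bucketing are made up for illustration; Datadog’s own seasonal algorithms fit trends rather than bucketing this crudely.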

Scheduler

I was added to a group chat where the team was discussing how to use Datadog on our “Scheduler”. It sounds like an easy task, but there’s way more nuance and technicality to it. The main problems we have with the scheduler are that:

  1. some jobs fail and then wait to be reset,
  2. some jobs run but get stuck (I think in most cases the scheduler isn’t informed that the job has finished, so it fails to schedule the next run).

The TLDR of this section is: there is a lot of junk data (and I mean A LOT), and reporting on these figures can be somewhat misleading, because failed jobs for organisations that no longer exist aren’t a problem (although we should stop them from running, since they are obsolete).

John
Surely we need something that runs that shows us:
a count of jobs in Error status
a list of jobs with a status of Running
a list of long-running jobs


Matthew
We'll want to take into account the normal running time of a particular job. We don't want to be alerted about a job that usually takes 2 hours to run and it has only been 1 hour.
We'll get to ones that get stuck quicker if they usually take a minute to run

Dave
Someone should run some queries against live now, and get a picture that looks correct.

Matthew
We also want the data to be meaningful otherwise we'll be back to where we are now - where we don't know what's actually working and what isn't. There's a balance to be had here

Christian
Can we summarise the Key Performance Indicators that will cause an action that needs to be performed? These become multiple metrics IMO, that add together to give better context

John
1. Job queue building up
2. jobs failing and not being reset
3. jobs getting stuck

Matthew
• Large numbers of jobs not running when they should
• Jobs stuck in running beyond their normal running time
• Mass job failures
• Mass job queues (this has the potential to false flag when workload is high)

John
There's a bug / unexpected behaviour where the scheduler can fail to update the database with the result of a job, so the table shows it in status Running. The scheduler logic queries the tables for what is / isn't running, leaving it to decide it can't do stuff because a job is "running" when it in fact isn't.

Matthew
If this is a bug, the smartest thing to do after the monitoring piece is to fix the piece of software causing it surely?

John
the secret to any good bug report is reproduction steps, and it's not an easy one to reproduce
You mentioned you'd had one get "stuck" recently. Do we know how to reproduce that behaviour again on demand?

Matthew
"Just let the scheduler do its thing and wait" is the only way we know how to replicate these

John
hence why any developer would struggle to fix it because it's difficult to know where to look if you can't reproduce it

Christian
"Treasure what you measure" or "Measure what you treasure". Simple counts and alerts will likely get us to what we need very short term which is to prevent or proactively manage / reduce potential for a Major Incident.

Matthew
I've got some initial queries together for this that could be used to alert on if the numbers get too high. I'd appreciate someone who knows TSQL to have a look and to suggest any improvements to the data being returned.

John
the 3000+ jobs in error is scary
Do we need to filter that list by only jobs that we know get automatically reset by the Hosted DBA agent jobs?

Matthew
Maybe, I did think that but I also thought that we should really know how many jobs are actually in error

John
I know that list in itself is a problem. But I think all Domains are going to have a high failed count and it's difficult to know if there are important ones in that 3000+ count

Matthew
We shouldn't alert on that metric, hence the one to track how many in error for the last hour
The scheduler is a massive mess and that 3000+ count suggests we have a fair bit of clean-up to do.

John
the only suitable metric I can think of for "important" is ones that the Database Administrators already deemed as important and created automated resets for.

Matthew
I could add an additional "Important" row to the general stats that includes those (or excludes any that aren't those)
Need that info from the Database Administrators though

John
Do we maybe need a couple of groups rather than just 1 "important" group

Matthew
I'd rather split the jobs and call out their names though, rather than pile them into one huge count if we're doing that
Let's get the data in Datadog first and see what normal looks like and tune alerting accordingly
JobTypeIDs aren't consistent across the estate by the way, so you'll have to match on JobTypeName with a join to the JobType table
<Image of 2048 Ready jobs. 47 Error>

John
Interestingly, those 47 jobs haven't run in years. Some of them last ran successfully in 2016
but we're resetting them every day to try and run, and they're constantly failing (job run count of 271,280)

Matthew
Hence my comment about a lot of clean-up - I'm willing to bet these are trying to run for closed Orgs, or orgs that have moved endpoint

John
Each Domain will probably need work to get rid of all the false alarms
I know when I checked one domain there were 40+ jobs that had never run and were just constantly being reset
Maybe an idea to simply disable these and change the script to add AND Enabled = 1 to the filter so you count only enabled jobs?
That should help remove the false positives you know about - then you can actually alert if the value goes above 0 for jobs in error

Paul
We are assessing whether the best approach to reduce the number of scheduler incidents is to deliver the new scheduler, with improved logic and Datadog integration, which will take time,
or to support integrating Datadog with the current scheduler.

Matthew
If it's the former, should we still do the latter anyway until the new scheduler logic is in place?
I suppose what I'm trying to ask is will the time-frames for implementing the new logic be quick enough to satisfy the urgency of monitoring the scheduler?

Paul
Yes agreed, we have just reviewed the last 9 months of incidents and having Datadog reporting would have given us the insight to avoid a number of these.

John
As well as adding an "enabled=1" filter Matthew, do you think it's worth adding a runcount > 0 filter as well, to avoid counting jobs that have never run?
For the sample Domain I looked at, every priority job in error had a run count of 0 showing they've never worked. Adding this would bring that result down to 0 which makes it much easier to then set an alert if that goes above 0

Matthew
I thought about that, but that will mask errors with jobs that should run but haven't. We'll want to see those. New job types as well, for example

John
going to be a hell of a job tidying up all the crap scheduled jobs in the scheduler AND setting up and calibrating monitoring at the same time
My thoughts were to filter those out for now, then look at those crap jobs later

Matthew
Yep, it is, but we can't ignore the mess as it won't go away. A lot of work disabling jobs will be needed to get the overall stats query to show nice figures. We shouldn't shy away from them looking terrible though. I don't believe in fiddling figures to make things look nice when it comes to monitoring
The other queries that show failures and stuck/running jobs for over an hour will help with spotting immediate issues though
One particular situation to take into account is the longest-running job we have is 8 hours. We can take care of that in Datadog with trends and anomaly detection to tell us when more jobs than the expected ones are stuck in a status for longer than an hour.
Similarly, we can use that same alerting method to warn us when the numbers on the overall stats aren't within usual parameters. Change detection is also a good measurement to use here too. We don't necessarily have to use traditional methods of alerting as soon as a value is over X

John
that sounds to me like a case for another metric
count of scheduled jobs running more than 1 hour, where job type is NOT “expected long-running job”
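The filters the chat converges on – enabled jobs only, run count above zero, and “running longer than normal for that job type” – can be sketched as plain predicates. The field and job-type names here are hypothetical, and the real checks would be T-SQL against the scheduler tables:

```python
from dataclasses import dataclass

@dataclass
class Job:
    """Hypothetical scheduler job record, loosely modelled on the chat above."""
    name: str
    job_type: str
    status: str          # "Ready" | "Running" | "Error"
    enabled: bool
    run_count: int
    minutes_running: float

# Expected runtime per job type, so a job that normally takes 8 hours
# isn't flagged after 1 hour (Matthew's point).
NORMAL_MINUTES = {"nightly-extract": 480, "default": 1}

def jobs_in_error(jobs):
    """John's filters: enabled jobs that have run at least once and are in Error."""
    return [j for j in jobs if j.status == "Error" and j.enabled and j.run_count > 0]

def stuck_jobs(jobs, slack=1.5):
    """Jobs running well beyond the normal runtime for their job type."""
    return [
        j for j in jobs
        if j.status == "Running"
        and j.minutes_running > slack * NORMAL_MINUTES.get(j.job_type, NORMAL_MINUTES["default"])
    ]

jobs = [
    Job("a", "default", "Error", True, 5, 0),              # real failure: count it
    Job("b", "default", "Error", False, 5, 0),             # disabled: filtered out
    Job("c", "default", "Error", True, 0, 0),              # never ran: filtered out
    Job("d", "nightly-extract", "Running", True, 9, 750),  # 12.5h on an 8h job: stuck
    Job("e", "default", "Running", True, 9, 120),          # 2h on a 1-minute job: stuck
]
```

With Matthew’s caveat from the chat in mind: the run-count filter masks jobs that should have run but never have, so the disabled/never-ran buckets still need reporting somewhere, just not alerting.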

Performance Problems Part 2

Note: If the min_collection_interval is set to 30, it does not mean that the metric is collected every 30 seconds, but rather that it could be collected as often as every 30 seconds. The collector tries to run the check every 30 seconds but the check might need to wait in line, depending on how many integrations are enabled on the same Agent. Also if the check method takes more than 30 seconds to finish, the Agent skips execution until the next interval.

Datadog

It seems that for custom SQL metrics, you can only specify a single frequency at which ALL the queries run. So if one team creates a query they want to run every minute, and another team wants theirs to run every hour – you can’t.

One team wanted to run a long-running query, but since the first team had set the queries to run every 60 seconds, this long-running query wasn’t possible.

In a similar fashion, we also anticipate problems if the total runtime of all the queries exceeds this 60-second limit, which we will soon hit with only a handful of queries.
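If I understand the Agent config correctly, the limitation comes from custom queries hanging off a single check instance, whose collection interval applies to the check as a whole. A rough sketch of what that looks like (server, query and metric names are hypothetical):

```yaml
# sqlserver.d/conf.yaml (sketch)
init_config:

instances:
  - host: "sqlserver-host,1433"
    # One interval for the whole check - every custom query below shares it.
    min_collection_interval: 60
    custom_queries:
      - query: SELECT COUNT(*) FROM SchedulerJob WHERE Status = 'Error'
        columns:
          - name: scheduler.jobs_in_error
            type: gauge
      - query: SELECT COUNT(*) FROM SchedulerJob WHERE Status = 'Running'
        columns:
          - name: scheduler.jobs_running
            type: gauge
```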

Another problem we found is that the time you set is just a guide, so Datadog could actually run a query twice in the 60-second period. Then, when it comes to creating the dashboard, you have to be careful that you don’t end up counting the data multiple times. Some teams were seeing decimal numbers on their charts when counting data made up of only whole numbers!

The possibly crazy workaround

I think a good workaround would be to have the data refreshed hourly and placed in a separate database somewhere, then have that separate database queried every 60 seconds by Datadog. If it’s separate from the Live estate, it should reduce the risk. Thought needs putting into how you would pull those stats into a database hourly, however. You’d need a SQL Agent job or similar that could collect them once an hour and push them to a central, separate location.

John

key thing, would be to ensure we aren’t using the scheduler to kick off SQL that monitors the scheduler 🤣

Christian
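John’s workaround is essentially an hourly snapshot job. A minimal sketch using SQLite as a stand-in (table and column names are hypothetical; in reality this would be a SQL Agent job writing to a central SQL Server database):

```python
import sqlite3

def snapshot_stats(live, stats):
    """Copy aggregate scheduler stats from the live DB into a separate stats DB.

    Run hourly; Datadog then polls the stats DB every 60 seconds instead of
    hammering the live estate. Table and column names are hypothetical.
    """
    stats.execute(
        "CREATE TABLE IF NOT EXISTS scheduler_stats ("
        "taken_at TEXT DEFAULT CURRENT_TIMESTAMP, status TEXT, job_count INTEGER)"
    )
    rows = live.execute(
        "SELECT Status, COUNT(*) FROM SchedulerJob GROUP BY Status"
    ).fetchall()
    stats.executemany(
        "INSERT INTO scheduler_stats (status, job_count) VALUES (?, ?)", rows
    )
    stats.commit()
```

And, per Christian’s one rule: whatever kicks this off shouldn’t itself be the scheduler.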

Need More Tools?

We purchased Datadog to replace several metric tools. Now we have seen that the SQL metrics are a bit basic, it seems like we are concluding we need an alternative tool. I wonder if we will keep buying different metric tools over time and end up in the same situation we were in before!

You get some basic SQL monitoring in the base Datadog install (or the APM one, not sure which). You can pay extra for “enhanced” SQL monitoring in the DBM module. It’s still very basic and about 3x the cost of “proper” SQL monitoring tools. I’m sure as the DBM module matures it will get closer to fit for purpose, but right now it’s an incredibly expensive SQL Server monitoring tool that’s feature-poor. If it was near zero cost, I’d have it everywhere to give a wider audience better visibility of what the DB layer does in live, but the features don’t currently justify the cost of that particular extra module.

Database Administrator

manager: can you draw me a pretty graph

pretty graph:

Mentoring #8: Former Apprentice

Intro

A few years back, I was assigned to mentor one of the Software Developer Apprentices and wrote about him in a series, the last one being Mentoring #7. There, I mentioned that our manager, Colin, was supposed to be setting him challenges (with the aim of sacking him) or finding him some kind of alternative role, possibly as a Software Tester.

The Apprentice turned that idea down, but I thought it would be a good career move if he went for it, because he didn’t seem to have the problem-solving skills required to be a developer. I was increasingly thinking he was one of those people that is “all talk and no action”.

So I’ll go through a few events that have happened since then.

Colin’s Kanban

I always thought Colin was a bit disorganised – he’d often come up with ideas then quickly abandon them. When we hired some new developers, Colin created a Kanban board with tasks they needed to complete for their induction. He said The Apprentice needed to do it as well, to ensure we had trained him adequately. The theory was that if the new starters could start writing code after completing our training and The Apprentice couldn’t, then it’s the evidence HR would require to sack him.

After a month, I checked the Kanban board and there was no progress.

Me  15:36
Remember the new starter training programme?

The Apprentice  16:15
What do you mean remember? 
This is my programme, although I'm not exactly working on it like that

Me  16:28
nothing has moved on the board for weeks

The Apprentice  16:30
I don't get your point as we haven't been asked to move anything on the board etc. Maybe it's just for managers to plan etc

Me  16:38
It's a kanban board. It's supposed to be what you are currently doing and what you have left.
I haven't heard a peep out of those new starters

The Apprentice  16:40
I haven't received any such instructions and am doing the tasks I have been asked to do. But I will speak to Colin now that you mention it cos I probably am supposed to be doing that.

So Colin had basically abandoned it, and there was no determination to impress from The Apprentice either. He was just chilling away without a care. He could easily have provided evidence he had completed everything and impressed Colin.

LibreOffice Config

My Apprentice picked up a bug where he needed to switch the configuration from MS Word to LibreOffice. I told him to configure LibreOffice in Configuration Manager. He asked if it is a feature in the main program. I told him it isn’t; Configuration Manager is a separate configuration tool. I want him to try to work independently, so I gave him generic advice for the future. To work out how to enable features in our software, I told him that in general you can check:

  1. the independent Configuration Manager tool (newer features are most likely here), 

  2. Organisation Configuration in the main software,

  3. then the modules themselves.

For point 3, one example I gave is that the Users module has its own Configuration screen. 30 mins later he says

“I checked User Config and I can’t see an option for LibreOffice”.

Apprentice

Before I gave him the generic advice, I told him it was in Configuration Manager. Then when I gave him the generic advice, I listed Configuration Manager first. Why didn’t he check them in the order I said? He either doesn’t pay attention or just comes across as trolling by slowly doing the wrong thing.

Oblivious

We had some mandatory Security Training, presented remotely by a third party, which started at 9:00 and lasted half the day. It was 12:45 when he asked:

“Is this Security training something everyone should attend?”

Apprentice

The Set Up

When he first joined, I showed him how to check out our code repository, how to build it, and where to get the databases from, and we rewrote the New Starter documentation together. He had recently replaced his laptop, so he had to set it up again. He asked me how to access a database backup server, so I asked him why. He mentioned he wanted a particular database from the server. So I asked him “why?” again – if he was following the instructions we wrote, they don’t say to do that. He claimed he was following the instructions.

“I’m honestly on the instructions, I can’t see what you are referring to.”

Apprentice

The funny thing was, I didn’t have the instructions open but I remember what it said. So I open them, click the Database section in the Table of Contents, then copy the instructions into the chat that say something along the lines of “Run the following SQL script to create the database:”.

What was he looking at? Why was he pretending the instructions said to access the database backup server? I could have had all the databases configured in 20 minutes at most, yet he dragged it out for hours.

Performance Review

When it was time to do objectives, he obviously didn’t have much to write about because he hadn’t done any work. Apparently, he had a “spreadsheet of evidence” though, so maybe I am wrong. We had a form that we needed to submit, and he spent the entire day transferring the spreadsheet to the form. The next day, I had some free time, so I told him I’d help him look into his assigned software bug. He said he wanted an extra 30 minutes to finish the form… which then became a few hours. See what I mean about “all talk and no action”? He just makes excuses not to do his work.

False Confidence

I ran out of ideas on a bug fix I was working on. I told my colleagues in the group chat on Slack. The Apprentice says

“Fancy a call to talk your thoughts? I’m kind of getting good. And I can share my ideas”

Apprentice

I was completely baffled as to where this confidence was coming from. He hasn’t fixed anything himself and struggles to come up with ideas. I am not opposed to a Junior correcting/inspiring me, but there’s no evidence to suggest he could do it.

Support

Last month, he told me he has a new job, but he is actually staying within the company: he has switched to 2nd Line Support. I don’t really get how that interview went. Being a software developer is about diagnosing then fixing issues, whereas support is mostly about diagnosis (if there is a known fix they can apply without assistance from development, then they can fix the problem too). So I think it makes sense for people to move from Support into Development once they have learned how to code, but I have never seen the switch go the other way. I am intrigued to see how it goes. He has already started making claims like “this is much more suited to my skills” and “I’m really happy with this role”, but it’s early days.

Mum’s email problem: The Train Noise

Recently, Microsoft made a change to their OneDrive terms (though I’m sure it was a bug): if your OneDrive becomes full, they stop your Outlook emails from being received. I was getting the warning when my OneDrive was 99% full but my email allowance was only 3% used.

My Mum had set up OneDrive to sync her Photos and Desktop, and had dumped several GB of videos there when she only had a 5GB OneDrive limit.

I had told her to sort her files out. However, she isn’t even confident dragging and dropping files into different folders.

It’s always tough to explain problems to her, or for her to explain problems to me. She says she normally checks emails on her phone these days rather than on her laptop, and she doesn’t use OneDrive on her phone – so she couldn’t make sense of my explanation. Her thought process was: “How could OneDrive on her laptop prevent Outlook from receiving emails on her phone?”

Then, when she kept on showing me the Gmail app, I told her that I couldn’t fix it in Gmail. Then she kept on saying I’d got it wrong, because it was Outlook and not Gmail that she was using. 

She uses an Outlook email account, but the Gmail app on her phone. She couldn’t seem to differentiate between the concept of an account and app with the same names.

It’s easy to take aspects like this for granted. What’s easy for me to understand is simply impossible for the non-technical person. In the age where pretty much everything is pushing for digital, it’s a big ask for the older generation to come on board without every step of the process being intuitive.

When I got to her house, she seemed adamant in clearing out old emails, and I kept on telling her that emails are small and it’s not the problem.

She also complained about a “train noise”, which I said I would need to hear for myself because it’s an extremely weird statement.

Once I had sorted out the files on her laptop, and stopped OneDrive syncing files on her Desktop, I told her that the emails should come through. She claimed that it still wasn’t working, and was showing me by refreshing the emails by sliding her finger down.

“I get double emails”

Mum

“I thought you said you weren’t getting emails?”

Me

“no, that was before when I was getting the train noise”

Mum

I sent her an email, and it came through along with a notification sound.

“There it is! the train noise!”.

Mum

She was adamant it was never her notification sound. To be fair, there was a different tone for text messages, so I wondered if it was a Gmail-specific tone, but I couldn’t see an option in the settings.

Even when emails were coming through, she still claimed they weren’t. She said she often gets around 20 emails a day but only 4 had come through, so “it wasn’t fixed”. I didn’t know how to prove how many she should have.

But what about the “double email” problem? I still needed to solve that before I left. So I asked if she knew how we could recreate it. She refreshed Gmail and pointed to the loading wheel. After a few seconds, another loading wheel appeared lower down and she said: “there! double emails”. So there weren’t double emails – just double the loading wheels.

I asked why this was a problem to her, and she said she

“didn’t get double emails or the train noise until the Microsoft thing popped up”.

Mum

Who knows if it’s true or not, but it’s so hard to help her when she describes somewhat fictional problems using the wrong terminology.

For another Mum-story, see Mum’s Frozen Laptop Screen

Managers visiting India

In recent times, our HR Director reiterated that for UK workers there are no plans to return to the office, and we will continue to work at home. However, our Indian workforce will be returning. It sounded like it was some government-mandated thing.

I suppose it is great news for managers and directors because they love making any excuse to go over there for a week for “work” then post about the sights and local cuisine.

Days after the HR Director had spoken about how “home-working was the way forward for the company”:

“Caroline and I had a long chat when we were back at the hotel and talked about what we had learnt so far this week. We both concluded we need to have more fun at work, see people face-to-face more often and continue to have new experiences as that helps with personal growth.”

HR Director

How does she not realise the hypocrisy of her statement? It wouldn’t surprise me if she u-turned and told us to go back to the office.

Meanwhile the CTO finally realises developers are actually important (whilst sampling the local cuisine, of course):

“One of the highlights of my trip was getting to know the team on a more personal level, through lunches, dinners, and working sessions. I have come away from the trip with a newfound appreciation for the vital role that our developers play in our company’s success, and for the amazing work that they do every day.”

CTO

How can you be one of the leads for the Development department and not realise that software developers are the key part of a company that sells software? 

I’m sure they mean well, but the more you think about it, the worse it seems. It also seems like he appreciates the Indian workforce more than the English one.

Colin: How was their visit?
Jeeva: “They are very busy, we got 6 minutes of their time”.

Why so specific? I wonder if that is a cultural thing. Indians seem to do it with job experience too: us English just round up or down to the nearest half-year, but they like saying they have “2.2 years of C# experience”.

The Expo

Despite other departments in our company arranging optional meetings to explain what they do, colleagues had apparently said in a recent survey that they don’t have visibility of the wider business. The directors decided to create what they called “The Expo”: a mandatory all-day event located about an hour away from our main office.

The Registration

We were told to sign up via an Events Company’s website, and the instructions we were given didn’t match my experience at all. We were told that visiting the website via the link in the email would auto-populate it with our names; yet the first question on the page asked for my name and email, and only then did it display them back to me. We were also told to ignore the displayed cost because the company would be charged and not us personally – yet it displayed as “free”. There was no information in the email about how we would travel, but one of the questions on the survey asked how we would arrive at the venue. Only when I tried the “coach” option did it state it would pick us up from the office on the day, but not the time. I expected that information in the confirmation email, but it was only a week before the event that I was told the pickup was 7:50am, which meant I would probably have to wake up at 6:30 to make sure I didn’t miss it.

Despite the usual reminders that travelling by personal car on company time would invalidate your insurance (since you need to be covered by business insurance), I suspect most people went by car because there were only around 20 of us on the coach.

The Arrival

As we headed towards the venue, some of the senior management handed out notebooks, but no pens. The notebook did have the schedule printed on the front, which was useful. Since they told us to dress smart, most of us hadn’t brought bags, so we had to inconveniently carry a notebook around all day.

The Event

Most of the day was spent sat in a large meeting hall watching presentations from the heads of various departments and directors. They used the opportunity to present the Employee of the Year awards. The thing is, we periodically watch presentations like this remotely, so it seemed a waste to make people from across the country travel to an event like this. Some people travelled for hours and stayed in a hotel at company expense, and I just didn’t see how it could possibly be worth it.

“can’t believe we have so many people turning up”

Event organiser

You invited the entire company and said it was mandatory, what did you expect?

The Call

As it approached lunchtime, during one of the presentations, I noticed a developer checking his phone, looking concerned, and then walking out. During lunch, I was talking to another developer in the team. He started looking at his phone, then handed me his plate and told me to hold it while he took a call. Several minutes passed and he hadn’t come back, so I was stood there awkwardly holding 2 plates. There were no tables and the room was packed with people, so there was nowhere to put them. Eventually, I just had to bin his food. When I saw them later in the day, they said there had been a Major Incident, so they were called to a room to investigate remotely. A few developers and the directors were all gathered around 1 laptop, knowing there wasn’t much they could do.

That's another disadvantage of these all-day events: no one is working, so urgent issues like this can't be properly addressed, which just adds extra stress.

The Stalls

Another part of the day was spent walking around some stalls. It was like a careers fair, except you already work for the company being promoted. Also, since there were so many of us, we were split into two groups, and because I was in the second batch, all the free stuff was gone. I could have done with a free pen for the notebook.

I didn't really see the motivation for approaching anyone on a stall just to listen to them reel off a speech they would have repeated hundreds of times throughout the few allocated hours. Wouldn't it have been better if these were also presentations in the main hall? Then they would only have to present once, and everyone would get to hear it.

I suppose the more socially outgoing types might have been in their element, but I was just walking in circles, pretending to look busy. Eventually, I got talking to some old teammates, and we commented on how bizarre the entire thing was.

Software Development Stall

For Development, the manager who volunteered to run our stall said he planned on using a darts game, but with "magnetic" darts so it wouldn't violate Health and Safety. The "Fun Police" busted him anyway: Health and Safety deemed it unsafe, since throwing something involves swinging your arm back, and that is apparently dangerous.

I'm not sure if things have changed in the last few years, but we have had office parties with a Bucking Bronco, bouncy castles and similar games. I think every year one person has had some kind of injury using those, but it never stops us organising the same things the following year.

The replacement game was a "guessing game", based on a vague question like:

“Guess how many minutes were saved in a 2 month period during our “COVID” release?”

The manager was insistent that everyone should guess, and demanded I submit a number. I didn't understand how people were coming up with their numbers, and I got the impression even the figure they had was exaggerated. I kept asking questions:

  1. What was the feature?
  2. How was it measured?
  3. Why are we measuring in minutes when the figure was apparently huge, like 20 million minutes?

After the event, they announced that guesses ranged from 100 minutes to 2 billion minutes! The correct answer was 31.9 million minutes.

Evening Meal

Another part that wasn't well thought out, and a waste of money: they provided food at 4:30, but nothing was scheduled afterwards. My coach was at 5:30 so I had to stay, but most people had no incentive to. So when food was served, probably only 20% of the people were still there, yet they would have paid for catering for everyone.

Feedback

I did wonder if people would give negative feedback for the event. It seemed rather pointless to me: a complete waste of money, and not really feasible to try to bring the entire company into one building like that. I suppose the main COVID days are behind us, but it's probably still a bit of a health risk; I knew a few people who were ill after the event.

Loads of people actually posted positive feedback, but it was on Yammer, and people seem to use it to post overly positive statements. The only negatives were:

  1. All I used the notebook for was the agenda on the back, which could have been printed on the back of the pass.
  2. No pen.
  3. Carrying the notebook around all day.
  4. Restricted interactions.
  5. Lack of space while eating.

Yet the positivity was flowing:

It was great to put faces to names – especially when it’s a team that you work closely with. It was also really nice to see people that I’ve not seen in over 3 years. Overall, a great opportunity.

I was reminded of the value of face to face interactions with colleagues across the business. Will be looking forward to future opportunities to meet in person.

I learnt more in one day about our business/products and services than I have in the year that I’ve been at the company. What an amazing ‘sheep dip’ opportunity to immerse oneself in the ‘what we do’ and ‘why we do it’ for our end Customers – fantastic!

Seeing people I have not seen in over 3 years since working from home was great, as was meeting people I have never met in person before. Getting to meet them in the flesh and build further bonds was a great experience.

Having the opportunity to network on such a mass scale and learn about other departments that we may not have regular touch points with was invaluable.

It helped me appreciate how each and every department is working towards the same goal, and that we have a lot of exciting tech-for-good products!

Being just 5 months into starting with The Company I found The Expo a real success. Not only did it serve to educate as to the breadth of offering across the wider business but also reinforced the value and absolute quality of our portfolio. The insights to the wider business has reinforced key messages around data quality relevant to my customers within their world and how our unique offering can support and accelerate the valuable research they provide. This event really encouraged me to reach out across the wider business and was indeed an excellent platform for building an internal network. The Expo gave a real feeling that top down and bottom up you would get support.

Sometimes, being bogged down in the day job, it’s common to be disheartened by the day-to-day challenges we face and getting mired in the weeds: a classic case of not being able to see the forest for the trees. The Expo was a great opportunity to get the big-picture view and see how, by succeeding in the daily challenges, we are improving healthcare for huge numbers of people.

I’d been fortunate to have attended two other great opportunities to meet up face to face and network already this year, and still found this another great one to see even more people and catch up on common themes across the whole business. I always learn something both personally and about our products, on this occasion improving my knowledge of Project X particularly. The overwhelming reminder at the beginning of why we do it, the impact it has and why we should feel proud of where we work, is a welcome prompt to look up from the day to day challenges more often.

I met many more people face-to-face for the first time than I’m able to remember now and got a real insight from generally chatting to others around what different teams actually do, how and why. With the number of acquisitions we’ve done over the years it also helped me learn what some of the products we’ve acquired are as well as how they add value to our customers. It was great to see some of our Chennai colleagues at the event and hear how the services I work on day-to-day are used in the real world.

Hang on, they flew people from India over to England just for the event!? What a waste.