Goodbye Slack

For the last several years, we have used Slack as our primary way of communicating in the Development department. However, company-wide we have Microsoft Office 365 licences, so other departments use Teams. I always thought it was a dumb decision to use Slack, as we were essentially paying twice for a communication tool. Slack isn’t that expensive on the lower tiers, but it adds up when you have a large number of staff. Plus, due to stricter security policies, we wanted to use single sign-on, so we had to upgrade to the Business+ licence, which didn’t seem to be worth the cost.

As time goes on, we keep “improving security”, which I often think is just an excuse to get rid of certain software. How do you really determine which software or companies are secure anyway? They can tell you they follow certain security practices or hold some accreditation, but whether your data ends up exposed in a data breach is another story.

“not sure what you can hack via Slack. Just over reacting like everything these days. 2FA all the things!”

me

Slack’s Enterprise licence boasts even more security features, and with our new strict security policies, management decided that we would either have to pay significantly more to keep using Slack, or just get rid of it. They decided to get rid of it.

To be fair, Teams has improved a bit over the years, and although I prefer the way Slack looks, and its excellent emoji support (you can add custom emojis!), I can’t justify the cost.

why is slack not secure as opposed to teams? probably just nonsense. Where does the data go when it is lost? surely doesn’t leak out onto the dark web!

Rob

We somehow had over 900 members according to Slack Analytics, but I reckon that was every historic user since we started using it. Scrolling down the list and roughly estimating, we seemed to have around 300 who could reasonably be called “active”. Looking at the Business+ pricing, that works out at around $45,000 per year. Enterprise is one of those tiers where it says “contact sales for a quote”. One manager reckoned it would cost $250k a year to use, which doesn’t sound right. How can you justify such an expense for a chat application? Even if it did cost that much on paper, surely you could haggle that down significantly. I’m sure Slack won’t want to lose us. Surely charging $60k is good profit for them.

I often think the way companies charge for software licences doesn’t make sense. They often just charge “per user per month”, but there will be times when people don’t actively use the licence due to the work they are doing, or because they have annual leave to take. Then there are people who join temporarily, and people who naturally join and leave the business over time. So who really tracks the amount you actually need to pay? Companies just end up overpaying for licences they don’t need. Slack seem to suggest they charge just for active users, but what happens if you only send a few messages on one day in the month; is that an active user for the month? I often think the best approach would be to charge for a certain number of users, but then give out an extra 25% of keys for light usage.
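As a rough sketch of that idea, the sums are simple enough. All the figures below are my own illustrative guesses (the per-seat price is just implied by roughly $45,000 a year for around 300 users); none of this is Slack’s actual pricing.

```csharp
using System;

// A minimal sketch of the licensing model suggested above: pay for a block of
// "full" seats and get an extra 25% of keys for light/occasional users.
// All figures are illustrative, not Slack's actual pricing.
int paidSeats = 300;
decimal perSeatPerMonth = 12.50m;                 // implied by ~$45k/year for ~300 users
int lightUsageKeys = (int)(paidSeats * 0.25);     // 75 extra keys thrown in for light users

decimal annualCost = paidSeats * perSeatPerMonth * 12;
Console.WriteLine($"Paid seats: {paidSeats}, light-usage keys: {lightUsageKeys}");
Console.WriteLine($"Annual cost: ${annualCost:N0}"); // roughly $45,000
```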

One thing that I found interesting when looking at Slack Analytics is that most people seemed to be sending as few as 20 messages per day. I think they are either super focussed and just work independently, or they are chilling out. It’s hard to believe that you can work well in a team, or even have a good relationship with them, if you only send 20 messages a day. Some people use instant messaging by sending a sentence per message, which inflates the message count and makes the numbers even more surprising. For example, they could send 4 messages for this interaction:

Hi

Are you free?

I was wondering if you can help me work out this error

I have just got the latest code but am unable to log in

The decision to remove Slack was disappointing for some, but the bizarre thing is that we got told by our manager on the Wednesday, it was formally announced on the Thursday, and Slack was gone by 4pm on the Friday. If you were on annual leave that week, you would have been confused when you could no longer access Slack on the following Monday. There was some great information on there, and it was great to be able to search for common errors and find solutions to them. We didn’t have enough warning to try and extract that information.

“Has the cost of the loss of productivity and collaboration been factored into the decision to remove slack?”

Sad developer

One developer had a crazy idea of developing our own solution:

“We are a software development company. If we’re that desperate, can’t we write our own messaging system, protected to the security standard we want?”

Ambitious developer

The thing is, we already made a chat application for our users. I never understood why users would want a native chat app when they could use something more widespread. But since we already have a chat app, it could actually make sense to add more features to it and then use it internally.

Making your own tools isn’t as cheap as you would think. If a developer’s wage is £35k, then paying just one developer to develop and maintain it each year costs £35k; you may as well just pay for Slack. But if we are using it internally and selling it to our users, then it does make more sense.

The weird thing is, for our upcoming software, we originally used Okta for the login functionality, but it was decided it was too expensive, so a few developers got together and made their own solution. That seems bonkers to me because authentication is all about security, so surely you should leave it to a company that specialises in security. But the fact that we do build custom authentication makes the idea of making a chat app even more realistic.

However one of the architects working on this upcoming software ironically replied:

“We need to move away from homegrown solutions, especially based on the presentation happening now from our Head of Security”

Hypocritical software architect

Another architect supported this claim:

“This is about minimising home grown solutions when an off-the-shelf solution would do just as well”

Software Architect

Does that mean he should be bringing Okta back?

The Outage Part 2: Feedback on the new process

In my blog, The Outage, I described a Major Incident and a knee-jerk response from the CTO.

He described this situation as a

“major incident that impacted the whole estate, attributed directly to a failed Change. We recognise that the change was not intended to have the adverse impact that it did, but sadly the consequences have been a major blow to Users and us. Therefore, we are seeking to create immediate stability across our estate, and are implementing several amendments to the way Technology Changes are approved and implemented”

CTO

He came up with 5 changes, presumably with no consultation from others. I gave my view on them in that blog. After a few months of carnage, the CTO has put out some revisions to the process.

CTO = Chief Technology Officer

SLT = Senior Leadership Team

ELT = Executive Leadership Team

BAU = Business As Usual

For each suggestion from the CTO, here is my view at the time and the CTO’s update.

Suggestion from the CTO: “There will be a comprehensive change freeze for the month of June, with only changes meeting enhanced criteria being passed for implementation.”
My view at the time: The size of the release wasn’t the problem, so cutting it down won’t solve anything. It might annoy the users even more if we then delay features that we announced.
CTO’s update: “as a knock-on effect, we have also reduced our delivery capacity and timescales.”

Suggestion from the CTO: “Pre-approved changes are suspended”
My view at the time: The idea of a “pre-approved” change is that it is something that is often run on the live servers to fix common issues and is low risk, hence it is pre-approved (e.g. the ability to restart a crashed server/service). This is just going to annoy staff members in Deployment. The CTO also remarked: “Preapproved changes are wonderful. They have been reviewed and tested to death. My goal is to increase the number of preapproved changes in the future. It’s just with the existing ones, we don’t know if they have been reviewed or not”. You don’t know if they have been “reviewed”, but they have been run hundreds of times and never caused an issue. So you are temporarily banning them on the grounds that they could cause an issue?
CTO’s update: “The door for pre-approved Standard Change has been re-opened. Standard Change templates can be proposed and put forward as before. As part of our continued governance and enhanced view of change taking place, we do ask for the following: Each Standard Change template requires approval from one SLT or ELT member. A full review of both the implementation and rollback steps needs to have been undertaken.”

Suggestion from the CTO: “Any changes submitted for approval will require TWO members of SLT.”
My view at the time: How many times has there been some kind of approval process where the people with authorisation are too busy or on annual leave? Why are we going from zero approvers to two? Would the managers understand a change to enable a feature for users belonging to company A, B and C? Would they go “hang on, C don’t have the main feature! I’m rejecting this”? It’s going to be a box-ticking exercise. We already have a problem when changes are code reviewed by Developers: there aren’t enough “expert” people who can review them in the required level of detail. So how would a manager understand the change and its technical impact? It will be more like “does this make us money? Yes, we like money”; approved.
CTO’s update: “A significant challenge impacting time to deliver has been the ‘two eyes on’ stipulation. We recognise that not every type of Change requires two sets of eyes and so are refining this slightly. Standard Changes will need to follow the above process. Where ‘two eyes on’ is not deemed necessary, two SLT approvers will need including in the template submission verifying that this is not required. Normal Changes will follow the BAU process. Where ‘two eyes on’ is not deemed necessary, two SLT approvers will need including in the submission verifying that this is not required.”

Suggestion from the CTO: “Implementation activity must be witnessed by two or more staff members. Screen sharing technology should be used to witness the change. No additional activities are carried out that are not explicitly in the documentation.”
My view at the time: This might actually help, although it might be patronising for Deployment. The CTO made a comment on the call about having “competent” people involved in the deployment process. So if a Developer has to watch a member of Deployment click a few buttons, it feels like babysitting and not respecting them as employees.
CTO’s update: No specific comment was made.

Suggestion from the CTO: “All changes must have a comprehensive rollback plan, with proof of testing. The rollback plan must be executable within 50% of the approved change window.”
My view at the time: The rollback idea is one of those ideas that sounds logical and great in theory, but this is the biggest concern for the technical people in Development.
CTO’s update: No specific comment was made.

So in conclusion, it seems I was correct.

This is very concerning to hear

On a code review, a Senior Developer, Lee, questioned why there were no database changes when the Developer, Neil, had removed all the related C# server code. Neil replied that he “wasn’t sure how the patching process worked” (despite being here for years, and being in a team with experienced developers), and wasn’t sure if there were any backwards compatibility issues to consider.

So what was his plan? Just hope it got past the code review stage unchallenged? Then we would have obsolete stored procedures and unused data lingering in the database for years.

I initially thought his claim about backwards compatibility issues was nonsensical, but from an architectural standpoint, it makes sense due to how our system works. The server code doesn’t call the other system’s server; it goes direct to its database. That means if the old version calls into the new version, it would expect the stored procedures and data to still exist. However, for this particular feature there were no cross-database calls at all.

I suppose being cautious and not deleting the data makes sense from a rollback point of view. It’s hard to restore the data if it is lost, but easy to restore the C# code. I have never seen us use this approach though.
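If we did want that cautious, rollback-friendly route, a minimal sketch might look something like the following. The names are hypothetical and this is not how our patching process actually works; it just illustrates “retire the code first, drop the data later”.

```csharp
// A minimal sketch of a cautious removal: retire the code path in one release,
// and only drop the underlying data once no supported version (or rollback
// target) still reads it. All names here are hypothetical.
public class UserRecord
{
    public int Id { get; set; }

    // Release N: stop using the value, but keep the column so an older server
    // version (or a rolled-back deployment) can still read and write it.
    [System.Obsolete("No longer used; remove once release N-1 is out of support.")]
    public bool LegacyFlag { get; set; }
}

// Release N+1, once nothing deployed still references it, the database objects go too:
//   ALTER TABLE UserRecord DROP COLUMN LegacyFlag;
//   DROP PROCEDURE usp_GetLegacyFlag;
```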

The Senior Developer said:

This is very concerning to hear, can you please work with your team lead to understand how our versions are deployed, and if they are unable to answer all the questions, please reach out to someone. We do not support any version changes by default, though there are cases where we do have cross version server/database calls, but these are for specific cross organisation activities.
You can safely remove these columns, update these stored procedures.
There is no value in leaving something half in the system, if it is no longer needed, remove all references, database rows/columns/tables, class Properties, etc.

In my previous blog, I discussed Project vs Domain Teams. This is kinda linked in the sense that specialising in a certain area of the system means you gain knowledge of the functionality and architecture of that area. There would be less chance of this scenario happening where the developer is questioning if there could be backwards compatibility issues. However, he could have also found this information out by raising questions.

This example does cover many topics I have discussed on this blog:

  • Poor communication
  • Bad decisions
  • Funny quote from a senior developer: “This is very concerning to hear”

Domain Teams, Project Teams & Cross-Cutting

In the world of Software Development, there are often differing views on how to arrange teams. Regardless of the approach, people will join and leave over time, so team members need to be replaced and teams need to adapt.

There was a time when we were arranged into teams that were assigned to a Project, then moved on to a completely different one once it was complete. Any bugs introduced by those projects then got assigned to a “Service Improvement” team who only dealt with bugs (and possibly ad-hoc user requests).

Then after a few years, and maybe under a new Development manager, we would restructure into Domain teams, where you take ownership of a group of features and only projects related to those are assigned to your team. Any bugs introduced by the projects stay with the team, which gives you a greater incentive to fix them as early as possible. People build up knowledge of their areas and become experts.

Then a few years later, we will switch back to Project teams.

There are pros and cons to each structure, and there are always edge cases which pose a management problem. Even in a Domain Team, there will be certain features that don’t neatly fit into the groups you defined, or ones that apply to many modules, e.g. Printing.

Sometimes we have called the team that handles the miscellaneous features “Cross-Cutting”. Managers would sell it as being for features like Printing that really are used by many areas of the system, but we all know it becomes a team that gets miscellaneous and unrelated projects. They end up being like the “Service Improvement” team that deals with random bugs and the work no one else wants to do.

Cross-Cutting

There was a meeting where managers were announcing the new Domain Teams and I got assigned to Cross-Cutting. One developer was voicing his concerns about having a Cross-Cutting team. He wanted to point out that Domain Teams are supposed to have specialist knowledge on the Domains but most people that were assigned to their teams had little-to-no knowledge. For some reason he chose my name to make a point.

“What does TimeInInts know about Cross-Cutting?”

This received a room full of laughter. I’m sure some were laughing at his point, some laughed at his emphasis and delivery, and others probably saw it as an attack on my knowledge. I was probably one of the best people for it really, given my experience in the previous Service Improvement teams.

The whole idea of keeping Domain knowledge in the team only works if there is a true commitment to keep the teams stable over years. However, people will leave the business, some will want to move to a different project to broaden their skills, or people could just fall out with their team members.

Another concern this developer had was with his own team. He was assigned to the Domain team he was the expert on, but was used to working with a couple of developers in the UK. This new team had two Indian developers. Management had recently acknowledged that the distributed teams weren’t really working, so these new Domain teams were supposed to be co-located. But this setup seemed to signal that he was there merely to train up the Indian developers so the Domain could then essentially be offshored. Since he was the expert, and proud of it, he still wanted to work in that area. But he can’t work on the same software forever.

In the Cross-Cutting team, we had an open slot labelled “new starter” so we were going to get a new hire in. You have to start somewhere, but again, this doesn’t help the teams specialise if they don’t already start with the knowledge.

Colleagues’ Opinions:

Developer 1:

Me 13:39: what does a new starter know about Cross-Cutting? 
Mark 13:39: sounds more like Cost Cutting! 

Developer 2:

It’s infinitely harder to build something if you don’t understand the thing you’re building. Hard to catch issues and make sense of designs if you had no opportunity to learn the domain.

Developer 3:

isn’t one of our major issues is we’ve lost domain expertise for core/bread and butter modules.  For any “module”, there’s a combination of what the requirements are/how it should work, and what the code is actually doing. Without “domain teams”/ownership – we’ve lost a large part of the puzzle (how module should work).

Developer 4:

our teams are completely ineffective, expertise has been spread too thin. We probably need to reorganise the department again with who is remaining.

Build stronger teams first that only have one junior-ish person, then have weaker teams helping out where possible. It will be very hard for the weaker teams, but unless we do this, we’ll lose the stronger people.

The weaker teams should be given appropriate projects with longer timescales, and given as much help as possible while ultimately having to struggle their own way, stronger people who put in the effort will begin to emerge from those teams.

Balance in Teamfight Tactics

I’ve read about, and watched videos on, computer game balance and find it such an interesting topic: how you can measure and perceive the strength of each character or unit, and how you can attempt to fix the issues and rebalance the game.

Second Wind have made a video on Teamfight Tactics.

I’ve never played this game, or even similar games, but it has the same general problems to solve in its design that many games do.

So, taking the transcript and running it through AI, I’ve put together a blog post on it.

Teamfight Tactics

Teamfight Tactics (TFT) by Riot Games is a strategic auto-battler, inspired by the League of Legends universe and drawing elements from Dota Auto Chess. In this competitive online game, players are pitted against seven adversaries, each vying to construct a dominant team that outlasts the rest.

In a game like League of Legends, a single overpowered champion can only be selected by one player and would be banned in competitions once discovered. In TFT, all Champions and items are available to everyone at once, creating many possibilities for players to find exploits.

Balancing the dynamics of Teamfight Tactics (TFT) is a compelling challenge. Compared to card games like Hearthstone, where adjustments are made through a limited set of variables, TFT presents a stark contrast with its myriad factors such as health, armour and animation speed, to name a few.

Initially, it might seem that having numerous variables at one’s disposal would simplify the balancing process, but even minor adjustments can significantly influence the game’s equilibrium. For instance, a mere 0.25-second reduction in a character’s animation time can transform an underperforming champion into an overwhelmingly dominant force.

The sensitivity of each variable is due to the intricate interconnections within the game. A single element that is either too weak or too strong, regardless of potential counters, can trigger a cascade of effects that alter the entire gameplay experience.

Consider the analogy of a card game where an overpowered card exists. In such a scenario, there are usually counters or alternative strategies to mitigate its impact. However, if a card is deemed too weak, it’s simply excluded from a player’s deck without much consequence. Contrast this with a game like Teamfight Tactics, where the strength of a champion is intrinsically linked to its traits and the overall synergy within a team composition. If a champion is underpowered, it doesn’t just affect the viability of that single unit; it extends to the entire trait group, potentially diminishing the strength of related champions. This interconnectedness makes balancing a challenge, but one that is manageable through data analysis, and player perceptions of balance are shaped by that data.
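To make that interconnectedness concrete, here is a toy illustration. The breakpoints and numbers are invented, not real TFT values; the point is that because trait bonuses only activate at thresholds, one unviable champion can drop a whole composition below a breakpoint.

```csharp
// Toy example: trait bonuses activate at thresholds, so losing one viable
// champion of a trait can cost the team an entire breakpoint.
// All values are invented for illustration; they are not real TFT numbers.
using System;
using System.Collections.Generic;
using System.Linq;

var traitBreakpoints = new SortedDictionary<int, double>
{
    [2] = 1.10, // 2 units of the trait: +10% damage
    [4] = 1.25, // 4 units: +25%
    [6] = 1.45, // 6 units: +45%
};

double TraitMultiplier(int unitsOfTrait) =>
    traitBreakpoints.Where(kv => unitsOfTrait >= kv.Key)
                    .Select(kv => kv.Value)
                    .DefaultIfEmpty(1.0)
                    .Last();

Console.WriteLine(TraitMultiplier(4)); // 1.25
Console.WriteLine(TraitMultiplier(3)); // 1.10 -- one unit fewer, a whole breakpoint lost
```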

Vladimir The Placebo, and Vain the Unappreciated

The character Vladimir in League of Legends had become notably powerful, overshadowing others in the game’s “meta”. To address this, developers proposed minor tweaks to balance his abilities. However, when the update was released, Vladimir’s dedicated players were outraged, believing their favourite character had been weakened to the point of being nonviable. But, in an unexpected turn of events, the nerf was never actually implemented due to an oversight. The players’ reactions were solely based on the anticipated changes they read about, not on any real modification to Vladimir’s capabilities. This psychological effect influenced Vladimir users to play more cautiously, while their opponents became more bold, illustrating how perception can shape reality.

Data only reflects the current state, not the potential. Particularly in a strategy game like Teamfight Tactics, which is complex and “unsolved”, players’ understanding and use of characters can be heavily swayed by their perceptions. Perception often becomes the player’s reality.

In the fifth instalment of the game, there emerged a low-cost champion named Vain. Initially, after the game’s release, the consensus was that Vain was underperforming—deemed the least desirable among her tier. The development team had reservations; they believed she wasn’t as ineffective as portrayed. Consequently, a minor enhancement was scheduled for Vain. However, before the update could go live, feedback from players in China indicated they had discovered a potent strategy for Vain. This revelation transformed her status drastically within three days, elevating her from the least favoured to potentially one of the most overpowering champions ever introduced.

This scenario underscores the limitations of relying solely on data, whether from players or developers, as it may not reveal the full picture. Balancing in gaming is often perceived in black and white terms by the player base—they view a character as either strong or weak, which leads to calls for nerfs or buffs. However, they frequently overlook the subtle intricacies and minute adjustments that can have significant impacts on gameplay.

Different Players

In competitive games like League of Legends, different balance parameters are set for various levels of play. A character might dominate in lower ranks but may not be as effective in higher tiers of play. 

When it comes to balancing games like Teamfight Tactics, developers have taken the approach of balancing the game as if computers were playing it. The game is designed to test strategic thinking rather than reflexes and mechanical skill.

If you pit Army A against Army B, the outcome is predetermined. However, this does not mean an army should be nerfed simply because it performs well at a lower skill level. Instead, it presents a learning opportunity for players to improve their skills.

Interestingly, perceived imbalances can serve as educational tools. As players engage with the game, they gain knowledge through experimentation. For example, if a player tries a certain composition with specific items and it fails, they can reflect on whether it was a misstep or an unforeseen event. Learning that a champion doesn’t synergize well with a particular item is valuable knowledge to carry into future games.

There are build combinations that could potentially disrupt the game’s balance if the perfect mix is achieved. This aspect works well in single-player modes like Roguelikes, where the aim is to become overwhelmingly powerful. However, the challenge arises in maintaining this sense of excitement while ensuring these powerful builds don’t lead to exploitation in a multiplayer setting. 

Risks & Rewards

Balancing isn’t merely about pitting one army against another to see the outcome. It’s also about the risks involved in reaching that point. For instance, if there’s a build that appears once in every 10,000 games, requiring a perfect alignment of circumstances, it’s only fair that such a build is more potent than one that’s easily attainable in every game. Therefore, in games like TFT, balancing involves weighing the relative power against the rarity of acquisition, ensuring that when a player encounters a significantly rare build, it feels justified because of the risks taken or the innovative strategies employed.

TFT thrives on the abundance of possible outcomes, with a multitude of combinations and variables at play. It’s crucial for these games to offer not just a handful of ‘high roll’ moments but a wide array, potentially hundreds, allowing for diverse gameplay experiences. TFT reaches its pinnacle when players are presented with numerous potential strategies and must adapt their approach based on the augments, items, and champions they encounter in a given game, crafting their path to victory with the resources at hand.

New Content Updates

The allure of both playing and developing this game lies in its inherent unpredictability. Each session is a unique experience, a stark contrast to many Roguelike games that, despite their initial promise of variety, tend to become predictable after extensive play. Teamfight Tactics, however, stands out with its vast array of possible combinations. Just when you think you’ve seen it all, a new set is introduced, refreshing the game entirely. This happens every four months, an impressive feat that adds a fresh roster of champions, traits, and augments.

The question arises: how is it possible to introduce such a significant amount of content regularly while maintaining balance and preventing the randomness from skewing too far towards being either underwhelming or overpowering? The answer lies in ‘Randomness Distribution Systems’. These systems are designed to control the frequency and type of experiences players encounter. As a game designer, the instinct might be to embrace randomness in its purest form, but the key is to harness it. By setting minimum and maximum thresholds for experiences, we ensure that all elements of randomness fall within these bounds, creating a balanced and engaging game environment.

In Mario Party, have you ever noticed that you never seem to roll the same number on the dice four times consecutively? This isn’t a coincidence; it’s actually by design. Nintendo has implemented a system of controlled randomness to prevent such repetition, as it could lead to a frustrating gaming experience.
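A minimal sketch of that kind of controlled randomness (purely illustrative, not Nintendo’s actual implementation) is to simply re-roll any result that would create a streak you never want players to see:

```csharp
// Controlled randomness: re-roll whenever the next value would make four
// identical results in a row. Purely illustrative, not Nintendo's real code.
using System;
using System.Collections.Generic;

var rng = new Random();
var history = new List<int>();

int RollDie()
{
    int roll;
    do
    {
        roll = rng.Next(1, 11); // a 1-10 dice block
    }
    while (history.Count >= 3 &&
           history[^1] == roll && history[^2] == roll && history[^3] == roll);

    history.Add(roll);
    return roll;
}

for (int i = 0; i < 20; i++)
    Console.Write($"{RollDie()} ");
Console.WriteLine();
```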

This concept is akin to a crafted ‘Ludo-narrative’, where game designers aim to shape player experiences through seemingly random events, but with a controlled distribution to keep the gameplay enjoyable and engaging. The goal is to allow players to encounter extreme situations, but these are skewed towards positive outcomes rather than negative ones.

This scenario might distort the essence of randomness, but surprisingly, players may not voice their dissatisfaction. Despite the statistical improbability, with millions of players engaging in a game daily, someone is bound to encounter this experience. Even odds as low as 1 in 10,000 can impact thousands of players at scale, highlighting the importance of considering player frustration as a crucial aspect of the gaming experience.

Perfectly Balanced

When discussing game balance, it’s not just about whether a feature is frustrating; it’s about recognising that frustration indicates a flaw in the design that needs to be addressed and learned from. Game balance is a complex, ever-evolving challenge that developers continuously tweak, hoping to align with player expectations. However, there will always be criticism, no matter the adjustments made.

The perception of balance is significant, and within any gaming community, you’ll find voices claiming that perfectly balanced video games don’t exist. Some players set such lofty standards for balance that they seem nearly impossible to meet. The key is establishing a solid foundation that dictates how the game should unfold, ensuring that the core gameplay aligns with the intended player experience.

In Teamfight Tactics, the ideal duration for rounds is targeted to be between 18 and 25 seconds, which is considered the standard for a well-paced battle. By setting these benchmarks, developers can align the game’s balance with this envisioned state, which is key to achieving a finely-tuned game.

Conclusion

It’s essential to have a clear, balanced vision for the game and to persistently follow through with it. Balancing a game is a complex and dynamic challenge, not merely a matter of adjusting to data but also managing player perceptions and their experiences of frustration. Navigating this ever-changing landscape is no easy feat, especially when the development team must juggle multiple roles at a rapid pace. However, it’s precisely this complexity that adds to the excitement and enjoyment of Teamfight Tactics.

Project Aurora & The Strangler pattern

Recently we have had another tech guy join the company, reporting to the CTO. I find that people in these kinds of roles want to put their stamp on things by coming up with a new idea.

He presented his idea in our monthly Tech Meeting. He wants to address our performance problems by taking traffic away from our main on-premise databases. There have been some similar ideas recently, and although I’m not great when it comes to hardware, networks and general software/hardware architecture, I am sceptical that these ideas can work.

His idea is that we replicate the database in the cloud (“the cloud” solves all problems, you see), and then the database in the cloud can be used for read access, whereas writes would still go to the main on-premise databases (and then be synced up to the cloud).
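As I understand the proposal, the routing would be something like this minimal sketch. The interface and names are hypothetical; the key point is that the replica lags behind the primary, so anything time-sensitive would still have to hit the on-premise database.

```csharp
// A minimal sketch of the proposed split: reads go to the cloud replica,
// writes stay on the on-premise primary. The replica is synced with some delay,
// so latency-sensitive reads (e.g. appointment slots) are the awkward part.
public interface IDbRouter
{
    string GetConnectionString(bool isWrite);
}

public class ReadReplicaRouter : IDbRouter
{
    private readonly string _primary; // on-premise database: writes and critical reads
    private readonly string _replica; // cloud copy, kept in sync with some delay

    public ReadReplicaRouter(string primary, string replica)
    {
        _primary = primary;
        _replica = replica;
    }

    public string GetConnectionString(bool isWrite) =>
        isWrite ? _primary : _replica;
}
```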

The Announcement

This programme of work is to move workload away from our primary systems to enable these systems to withstand expected load factors from upcoming initiatives as well as expected growth in usage on our APIs during Winter 2023.

The intent is to run focused cross functional teams in work-streams across the group to deliver this initiative. The approach taken here is to place multiple bets, across multiple teams. The expectation is that not all teams will deliver by September, but enough to bring in the headroom we need.

The programme is intending to free up at least 20% load across our core databases.

Upcoming aims:
• Strategic, move read-only workloads to Aurora.
• Redeploy APIs to AWS, Move to cloud technology, Containerise and Optimise Service
• Enable use of replica data when ready.
• Move Appointment Workload
• Mitigate 8am peak load.
• Use caching engine on AWS (Elasticache/Redis), mitigate 8.2% of PC DB Load 
• Reduce load on the DB during day time.
• Reduce Datacentre and DB load and improve performance
• Mitigate 6.2% of DB load by optimising how we summarise task counts
• Proof of concept is Complete, expected to cost £2m a year.

My Conversation With Architect Mark

I think the reason for the replication (as opposed to just moving it all to the Cloud) is that you can’t fully commit to ideas like this. You have to have a rollback plan. So if we find it doesn’t work, is too expensive etc., we can just return to the old way without much inconvenience. I asked one of our Software Architects what he thought of the plan because it doesn’t sound right to me:

Me
doesn't sending data out to another database just increase traffic, and they wanted to reduce it?
Mark
Yes, it will also be delayed, and often broken
Me
no pain, no gain
Mark
they're replicating data, and it's unlikely it'll be used
Me
I don't see how you migrate things. You have to keep them both running until you are confident it works, then bin off the old database. But then in reality you just end up keeping them both for longer than expected
Mark
you then also need cross-database transactions or to be very careful with queries
yeah, that's basically it. Have the same API at both ends, some sort of replicate and transform on the data to ensure it's in both. Persist to both simultaneously, then when all works, turn off the old
Me
The CTO said that “some people say there is a delay, but it is only 5 minutes”. Does that address any of your concerns at all?
Mark
no, this is only the second time I've heard about this, and the first I laughed
I agree with the principle of strangler pattern for migrating, but this isn't migrating
it's keeping multiple DBs 'in-sync'
Me
does that mean you can view an appointment book which is 5 mins out of date, and you try book an appointment, then it checks the real database and is like "no mate you cannot do that"
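For what it’s worth, here is a minimal sketch of the dual-write approach Mark describes above (“persist to both simultaneously, then when all works, turn off the old”). The interfaces and names are hypothetical, and a real version would also need idempotency, retries and proper transaction handling.

```csharp
// A minimal sketch of dual-writing: write to both the old and new stores, and
// compensate on the one that succeeded if the other fails. Names are hypothetical.
using System;
using System.Threading.Tasks;

public interface IAppointmentStore
{
    Task BookAsync(Guid slotId, int userId);
    Task CancelAsync(Guid slotId, int userId); // used here as a compensating rollback
}

public class DualWriteBooker
{
    private readonly IAppointmentStore _onPrem;
    private readonly IAppointmentStore _aws;

    public DualWriteBooker(IAppointmentStore onPrem, IAppointmentStore aws)
    {
        _onPrem = onPrem;
        _aws = aws;
    }

    public async Task BookAsync(Guid slotId, int userId)
    {
        await _onPrem.BookAsync(slotId, userId);   // existing primary first
        try
        {
            await _aws.BookAsync(slotId, userId);  // then the new store
        }
        catch
        {
            await _onPrem.CancelAsync(slotId, userId); // one failed, undo the other
            throw;
        }
    }
}
```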

The conversation between architects

Mark then sent me a conversation he had with two other architects, Andrew and Jon. Mark already had concerns with the “appointment book” example.

Mark
so when this replication system goes down for a few hours, what happens then? I guess the system tries to book appointments for slots already booked, put in requests for items already issued etc.?
seems our business layer needs to be aware of how outdated the original information was, so it can compare something like a changelog number. Sounds like a big challenge to implement correctly

Andrew 11:10
Yes, any write operations will need logic to ensure that cannot happen Mark.
John and I have already called out that Appointments and Orders will have significant challenges with this replication model and have suggested that the initial focus should be on User Profiles, and any historic data, etc.

Mark 11:13
User Profiles and historic data seem just as dangerous to be honest.

Jon 11:15
The idea I suggested these is that you would check the change log on the primary system before even considering going to the replica. If the User had had a recent change (what counts as "recent" is TBC, I suggested 30 minutes) you wouldn't even consider going to the replica.

Mark 11:15
can we implement the strangler pattern properly? set up proper Appointments APIs to use in our datacentre, and AWS.
duplicate the data.
then dual file everything against the APIs? if one fails to file, the other gets rolled back.
we ensure consistency, we can transform the data, and we're using the pattern as intended
Jon, I agree your idea is the right way to do this sort of thing, but it will be adding logic and latency in a lot of places (as well as augmenting every one of our products to be aware of this), and not bringing us forward, but continuing to keep us in the primary data-store model

Jon 11:18
Honestly if the use case for customers looking at their data, then having it a touch out-of-date information isn't as critical as if our actual users sees an out of date view. As a hypothetical Customer who knows nothing about IT, if I viewed my record straight after a consultation
and it wasn't there I would just assume that there was a delay and it would appear later.
When it comes to actual Users viewing the record, it's absolutely critical that they see the up to date view. And when it comes to appointments that's also critical because appointment booking is fast moving, it'd be an awful experience for a User if every "free" slot they booked turned out to be booked minutes earlier.

Mark 11:19
depends, if you've just requested a particular item and the page doesn't update to indicate that, can you continue requesting it?

Jon 11:20
Many of our users (mine included) turned off online appointment booking entirely at the beginning of the pandemic and use a triage system now.
You wouldn’t be able to successfully request duplicate items, because the write would take place conditionally, so if it had been requested already then it'd say no (if designed even
vaguely competently).

Mark 11:22
the write wouldn't come through, but it'd be confusing for the User seeing the prescription still requestable, unless the application has its own datastore of state

Jon 11:22
Yes it would be far from ideal. But the CTO has some ideas about that (having a "recent changes" dataset in a cache that is updated live, and merged with the replica's data.
feels like there's loads of little bits of logic that need 'tacking on' to resolve potentially quite serious incidents. When the correct use of the strangler pattern gets us away from on-premise as primary DB, and moving in the direction we want to go
Yeah, this isn't easy and requires careful consideration.

Andrew 11:30
You are absolutely right Mark - there are a heck of a lot of potential gotchas and ultimately the plan has to be to use the strangler pattern, but at the moment we are looking at a rescue plan to put out some existing fires in the data centre and to handle predicted significant increase in load that will hit us in the Autumn. Everything that you have flagged is being considered.
The only fall-back plan that we currently have is to spend nearly £4m / year on additional SQL Server readable secondaries (on top of having to pay an additional 12% on our existing SQL Server licences thanks to MS hiking their prices) and nobody has the appetite for that.
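Jon’s change-log idea from the conversation above could look roughly like this sketch. The 30-minute window comes from his message; everything else, including the interface and method names, is hypothetical.

```csharp
// A minimal sketch of Jon's suggestion: consult a cheap change log on the primary
// first, and only read from the replica when the record hasn't changed recently.
using System;
using System.Threading.Tasks;

public interface IChangeLog
{
    Task<DateTime?> GetLastChangeUtcAsync(int userId); // lightweight query on the primary
}

public class ReplicaReadGate
{
    private static readonly TimeSpan RecentWindow = TimeSpan.FromMinutes(30);
    private readonly IChangeLog _changeLog;

    public ReplicaReadGate(IChangeLog changeLog) => _changeLog = changeLog;

    // True  = safe to serve this read from the lagging replica.
    // False = fall back to the primary because the data changed too recently.
    public async Task<bool> CanUseReplicaAsync(int userId)
    {
        var lastChange = await _changeLog.GetLastChangeUtcAsync(userId);
        if (lastChange is null) return true;                      // never changed, replica is fine
        return DateTime.UtcNow - lastChange.Value > RecentWindow; // stale enough to trust the replica
    }
}
```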

Closing Thoughts

I don’t know what the Strangler Pattern is, so I’ll add that to my reading list. However, it seems that even with my limited knowledge of architecture, our Software Architects have similar concerns to mine. There have been plenty of ideas that the CTO (or managers at a similar level) have quickly backtracked on due to not consulting people who have the knowledge to say whether their idea is actually logically sound. I’ll keep my eye on this one to see how it develops.

Changes to the Software Delivery Process

One of the problems we have where I work is not releasing fast enough. When you read about Software Development, you hear of these companies that can release minor updates every week. Maybe it is more of a Web Development thing rather than an Application Development one, but there are also contractual reasons why we cannot release faster.

However, over time, the release cycle has continuously crept up to 4-6 weeks, which causes problems.

If there is more time for each release, the scope of the current release often increases further. For example, if there is a fix that needs to go out within 3 weeks to meet the SLA (Service Level Agreement) and the next release will go out in 4 weeks, then you have little choice but to get it into the current release. If you are checking it in close to the deadline, then you might end up delaying the release in order to test it. The more you delay, the more chance of someone else needing to get a fix into the current release, and the scope grows further.

If there are many big projects targeting the same release, each team develops in its own code “branch”, then merges into the Main branch for the release. Since it’s not really feasible for everyone to merge at the same time, you end up taking it in turns and resolving any conflicting changes. To be honest, it’s quite rare that we change the same files for the main feature changes, but there are certain files with a lot of churn, mainly ones containing XML. Merging in the projects usually takes a few days, then all the bug fixes go in on top. The Testers can’t really begin testing until it’s all merged, so it’s a lot of overhead to manage the release.

When the releases are large, the Testers insist on running more Regression Tests which increases the Testing phase and can cause further delays.

“I think we spent about 2 months on that regression. It was madness. It was a HUUUGE release”

Software Tester

So smaller releases are much more manageable, take much less time to test, incur less risk, and have lower scope for scope-creep.

Our Software Delivery team made an announcement about this (basically just saying the same things I have just discussed), and their desire to plan in quarters but release every couple of weeks.

In the past, we would scope a single release looking at the features, fixes and minor enhancements we wished to deploy. We would follow a process of merging everything into our main release branch before undertaking testing. This was a two-phased testing approach, integration/functional testing each feature and fix, and then regression testing to ensure pre-existing functions continued to work as expected. We would then spend eight to ten weeks deploying the release through Controlled Roll Out and Customer User Acceptance Testing.
 
This approach brought with it a number of challenges. Merging everything in was time consuming, issues or blockers with one feature would slow down or block other features, regression testing was a challenge, and this also put pressure on the roll out and deployment through pushing out a number of changes in one go.
 
To try and mitigate some of these challenges, we are now adopting a strategy of breaking these large releases down into smaller updates.
 
Working in quarterly cycles we will scope what we wish to deliver over a 12 week period. Each feature will be analysed and risk assessed for size and complexity by our Engineering Leads, and have a business value determined by our Product Management and Commercial Teams.
 
Using this feedback we will then determine an order in which we wish to deliver each feature. We will then merge them into a release and test them one at a time (potentially two if both are small and low risk), before signing over to Release Management to commence deployment.
 
We will then deploy the full scope over a series of smaller releases rather than in one large release.
 
The last update in the cycle will be a maintenance release to address the backlog of prioritised service work.
 
The objective behind this approach is to have our users benefit by taking elements of the release scope earlier than they would have before, whilst also simplifying the testing approach and hopefully enabling us to push code out across the estate quicker.

The Secret the Task Manager developer didn’t want you to know!

Dave Plummer, who has the YouTube channel Dave’s Garage, announced on Twitter:

Big news! Someone finally noticed that if you hold down CTRL, the process list in Task Manager conveniently freezes so you can select rows without them jumping around. I did this so you could sort by CPU and other dynamic columns but then still be able to click stuff…

Dave Plummer

There have been plenty of occasions where the Task Manager rows jump around, to my annoyance. Why wasn’t this made a more obvious feature? Frank Krueger (who appears on the Merge Conflict podcast) made the obvious point:

Don’t hide features under random key combos – undiscoverable and unmemorable UIs are user hostile. A little checkbox with the text “Pause Display” would be discoverable and you won’t have to wait 30 years for someone to find your feature.

https://x.com/praeclarum/status/1693649521375621524?s=20

Unity Runtime Fee

Intro

Unity have announced a new fee, which they call the “Unity Runtime Fee”, which is going to take effect in January. It affects all Unity developers, even people who released their game many years ago, and it has caused mass outrage in the game development community.

I think the existing model states that once you reach a threshold of revenue, you have to pay a licence fee, which works out at around £1,500 for the year. With the new model, once you reach a similar threshold, Unity is now going to charge a fee of 20 cents every time somebody installs your game on a new device for the first time.

The threshold is $200,000, and on the face of it, 20 cents per install doesn’t sound unreasonable when they have given you a great tool to help you create your game. They need to earn money as a business and deserve some kind of cut for their service/product. According to this tweet, it looks like they are burning through money, so some drastic action is probably required.

https://x.com/georgebsocial/status/1702696194558816751?s=20

There are a fair few reasons why this new model is complicated, but I still feel some of the anger is misplaced.

I think the whole scenario is similar to what I have written about recently, where the CEO demanded we release our software weekly instead of monthly and we told him several reasons why it is technically and legally impossible. Then later he demanded that all our changes have a well-documented rollback plan, and again we gave him loads of reasons why that wasn’t possible. He still insisted, and looked like a fool when it backfired and caused a few of the problems he thought he was solving.

The main Problem

The core problem is that the fee is based on installations, not on unit sales or revenue. For comparison, the main competitor, Epic Games’ Unreal Engine, charges you five percent of your total revenue after you’ve earned at least a million dollars on your game. Now, that can work out to be a lot of money, especially in the long run if your game is successful, but the difference is that they’re taking a cut of money that you’ve already earned. When Unity charges you for an installation, you’re being charged whether or not you’ve earned any money, or at a different time from when you earned it. That could turn into a cash flow problem.

Collage of abuse:

https://x.com/LiamSorta/status/1702325745610338646?s=20

Theoretical scenarios

Once you’re over the threshold, if somebody bought your game a long time ago and they’ve now installed it on their brand new computer, it’s going to cost you 20 cents. I suppose if it is an old game, you probably won’t have sold $200k over the last 12 months, so the fee probably won’t actually apply.

If you decide to port your game to a new platform, which is often fairly easy in the Unity engine, then all those new installations are also going to be hit with fees. If you are re-selling the game then it’s not a major problem, but sometimes developers make a free-to-play mobile version and make the money later with microtransactions. Often these games have 90% of players not paying a penny, and you make your money on the 10% who often spend big. In this case, you could end up losing money on the average player of your game.

People also raised the point of bundles like the Humble Bundle, where people buy a bunch of games for a small price and some of the money goes to charity. You end up selling high volumes but gaining very low revenue. If you hit the threshold, and you are more likely to with a sale like this, then you could be hit with a lot of fees. The interesting thing with this point, which people don’t seem to be mentioning, is that people often buy these games and then never actually play them. So you have a sale, but no install, and so don’t pay the fee.

Fairly similar to a bundle is a service like Xbox Game Pass, where people can play your game with an overall payment to the provider, in this case Microsoft. I think Microsoft often pays a flat fee to publishers to get their games, but I suppose contracts can vary. The theory is, you could get a flat fee, then either get a low number of installs so you’ve gained, or get a surprisingly large number of installs if it is popular, which eats into your profits.

Piracy

People who pirate games don’t pay but do install your game. This means that every time your game is pirated, you’re going to be slapped with a 20 cent fee. There are other malicious ways you could be charged, for example if someone abuses virtual machines. There are programs that will spin up large numbers of them, so you could “install bomb” a game quickly, hitting the developer with a 20 cent fee per install. It’s like when people “review bomb” a game they don’t like by leaving loads of negative reviews in a coordinated way, but in this case you need fewer people, and they directly sap the developer’s revenue instead of just hurting their online presence.

Target Price

Unity has always positioned itself as being pro-indie. They want to help new aspiring indies learn to program, break into the gaming market, and get their career started. New developers are also much more likely to sell their games cheaply. There are a lot of games like this on Steam sold for £10, and loads for £5 or less, and that’s before you apply discounts. Steam is renowned for its deep discounts in sales, so these games end up being sold for just a couple of pounds. They’re going to be disproportionately hit by having to pay Unity 20 cents every time the game is installed.

In the extreme case, imagine you’ve made a Steam game or a mobile game that sells for one dollar. You pay sales tax of 10-20%, then Steam takes 30%, so you’re left with around 50-60 cents. If you use a Publisher, they will take their cut too. Then Unity takes 20 cents for an install, and maybe another 20 cents for another install, and you could be left with basically nothing. You could even lose money if it isn’t sold at full price.
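The arithmetic on that one-dollar example works out something like the sketch below. All the figures are rough illustrations; real tax rates, store cuts and publisher deals vary.

```csharp
using System;

// Rough per-copy arithmetic for the $1 game example above. All figures are
// illustrative; real tax rates, store cuts and publisher deals vary.
double price = 1.00;
double afterTax = price * 0.85;         // assume ~15% sales tax
double afterSteam = afterTax * 0.70;    // Steam's 30% cut, leaving roughly $0.60
double unityFeePerInstall = 0.20;       // Unity Runtime Fee per new install

double oneInstall = afterSteam - unityFeePerInstall;      // ~ $0.40, before any publisher cut
double twoInstalls = afterSteam - 2 * unityFeePerInstall; // ~ $0.20
Console.WriteLine($"{oneInstall:0.00} / {twoInstalls:0.00}");
```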

Meanwhile, if you sell a premium game for £40+ then 20 cents is nothing. So it actually hits the indies harder. Unity have ways of getting the price per install down, but they look more aimed at larger companies who will want to pay the upfront fees to use the premium Unity features.

Patch Quest

Lychee Game Labs’ Patch Quest released on 2 March 2023 and has so far reached 182,594 total key activations on Steam (people who bought the game on Steam, along with everyone who got the game elsewhere, like in a bundle, a giveaway, or for review purposes). So if the game keeps selling, or people install it on more devices, then he will be taken over the threshold and would start being charged. He did remark that “for the sake of argument, every single person who already owns the game decided to install it on a second PC, I’d be hit with a charge of $36,400. Now it’s obviously not likely that this would happen”, but it does make you think how Unity are going to deal with these outliers.

Unity Response

Within a day of the announcement, there were a lot of angry people, and Unity tried to clarify the points raised. However, it’s not clear if it’s actually possible to do what they claim. They reckon they have some sophisticated fraud detection technology which can prevent the “install bombing”. They also say they will have a process for developers to submit their concerns to their fraud compliance team. So from what I understand, the onus will be on the developer to try and somehow keep track of how many of their installs are fraudulent, then if you have concerns, you contact the fraud compliance team, and they will hopefully give you your money back. I think the majority of people don’t have a lot of faith in a system where Unity have to put in the work to decide whether they want less money from you.

https://x.com/thomasbrushdev/status/1702797688838775134?s=20

Unity have clarified that if your game is part of a bundle like Xbox Game Pass, or in a charity bundle, then you’re not going to be charged for the install, although it’s not exactly clear how they’re going to know which installs come from charity bundles or game passes. They seemed to imply that for Game Pass, they would send the bill to Microsoft, but I can’t imagine Microsoft will be too happy to have that sprung upon them. It would probably have to be negotiated in future Game Pass deals, and it might just be the case that Microsoft doesn’t add any Unity-based games to their service.

Unity tried to justify this whole new fee structure by pointing to the thresholds and saying “if you don’t already earn loads of money on your game then you’re not gonna pay extra”. This is where I think a lot of developers are wrongfully attacking Unity, when they would never pay them anyway. I suppose in the Patch Quest example, I’m not aware of it being a major hit, and he has pretty much reached the payment threshold. But given that it’s been many months after release, you would imagine sales will now be low and he will only be liable for minor fees which he should be happy to pay.

Conclusion

There probably is a clause somewhere deep in Unity’s terms and conditions that says something like “we retain the right to change our terms and conditions”. Companies love to have that kind of future-proofing in their legal small print, but how many actually go through with major changes? It can be logistically difficult to implement drastic changes, and evidently a PR nightmare. Despite that, many companies are against Unity for switching the terms and conditions with only a few months’ notice. When games can take years to make, you need that predictability to budget adequately, and if Unity can charge you more on a whim, then it’s unpredictable. People also wonder whether they can really change the terms for games built on an older Unity version, as you essentially had an agreement at the time of release; but that needs to be left to the lawyers.

Switch Engines?

I think a key question that many are using to justify their decision to abandon Unity at this time is “Is this the last time they’re gonna change their terms?”

Jumping ship to another engine might be possible when you’re just starting out on a new project, but the deeper into development you get, the harder this becomes. Your game gradually ends up dependent on the engine it’s built in. Switching to Unreal Engine will require programming in C++ instead of C#, which is a massive learning curve. Godot seems to be gaining popularity, but people say it specialises in 2D games at the moment. I think its C# support isn’t complete, so its own GDScript is more popular.

https://x.com/TruantPixel/status/1702132911976194091

https://x.com/DarkestDungeon/status/1702378602895941837

References:

Unity Pricing Thoughts...
 
For context, we are a small studio (7 people) with a Steam game with 3M~ players.
 
I'm seeing many non-developers tell developers that this pricing change is not a big deal, here is why the entire community is lighting a fire:
• Massively disproportionately punishes indies
• Only three months notice
• Double dipping (Licence fee/ads cut)
• Dangerous precedence for charging "runtime"; you no longer fully own that exported build. If Unity continues to struggle, pricing could become more aggressive
Here are a few examples:
• Unity's own example on their site has a hypothetical scenario:
-- $2M USD Gross in 12 Months
-- 300k Users/month (200k Standard/100k "Emerging Market"), $23.5K USD/month
-- This means $282K/Year in fees, 14% of gross revenue, 3x Epic's 5%.
• F2P Games that are NOT excessively monetised are penalised:
10M Players:$1M USD
1M Players:$10M USD
The first case, with a vastly less predatory set of MTX is now punished significantly worse than one purposefully building money-extraction machines.
Our team has been hard at work for 2 years on a massive update to our game, with a F2P mobile ver coming next year. We built this from the ground up to be ethically monetised/for whaling to be impossible, so we are particularly unhappy with the news.
This affects developers everywhere, of all sizes. I am grossly disappointed by any industry figures brushing this off as "developers complaining." that do not understand the severe damage this can cause smaller studios.
Unity's trust within the games industry has been steadily eroding for years now, this latest change is a testimony to how horrendously mismanaged the board is. Personally dumped all of my Unity stock after this announcement was made.
I'd bet heavily on the people making these decisions have never even opened the editor, let alone released a game.
 
From <https://threadreaderapp.com/thread/1702189840383832408.html>

Problems With Hosted Services

Recently we have had several major incidents due to software bugs, incorrect configuration being applied, licence keys not being renewed, and servers being migrated to the cloud without checking that all services were correctly configured and running.

Our Hosted Services team gave a presentation on the work in their department, and gave insight into even more failings that have happened recently. As far as I am aware, Hosted deal with servers, data centres and networks.

Hosted explained that, due to the decision to move all servers to the cloud, when the usual time came to replace old servers, they didn’t bother. But the migration has been a slow and delayed process, which meant our software was running on inferior hardware for longer than anticipated.

“We don’t need to invest in the in the architecture that we’ve got, which was not the right decision in hindsight

We had a team of people who, in some cases, were the wrong people. They didn’t have the appetite to go and actively drive out issues and reduce the points of failure in our networks.”

Hosted Manager

He then went on to say the change in strategy caused many of their long-term staff to leave; these were the people who really knew how the business worked.

“So we lost around about 90% of the team over a relatively short space of time and that put us into quite a challenging position to say the least. And needless to say, we were probably on the back foot in the first quarter of this year with having to recruit pretty much an entire new team.”

Hosted Manager

Then, because they were short staffed, their backlog of work was increasing, putting more stress on the people that remained:

“We had to stop doing some tasks, and some of our incident queues and ticketing queues were going north in terms of volumes, which was really not a good place to be.”

Hosted Manager

I’ve written about this situation in the past. It happened in the Development department when a new CTO came in and said that manual software testing was archaic, so people had to learn automation or lose their jobs. Then a few months later, they realised their plan wasn’t so feasible, yet they had lost some good software testers to other companies, or had allowed others to switch roles who weren’t interested in going back. Then the releases slowed down because we couldn’t get the fixes tested fast enough due to the lack of software testers.

They go on to say the Firewalls suffered 50 major incidents in Quarter 2, and now they have “procured new firewalls” to solve it. They have reduced bandwidth into the main data centre by routing certain traffic through an alternate link. The “core switches” at our offices and data centres are “End of Life” and will be upgraded to modern hardware (Cisco Nexus 9K).

So it sounds like they have a plan, or at least are doing their best with what they have. It sounds like every department is currently shooting itself in the foot.