The Feature Flag Rant

When adding a new feature to software, you can put it behind a Feature Flag. If the flag is set to true, the new feature is used; if it is false, it isn’t. This lets you roll back quickly by changing the value rather than releasing a new software update. However, it makes the code more complicated because of the branching paths.
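As a rough illustration of that branching, here is a minimal sketch in Python. The flag name `free_shipping_over_50` and the shipping functions are made up for the example and are not taken from any real codebase; amounts are in pence to keep the arithmetic exact.

```python
# Hypothetical flag store; in practice this would come from a config tool.
FLAGS = {"free_shipping_over_50": False}


def legacy_shipping(basket_pence):
    """Old behaviour: flat shipping fee regardless of basket size."""
    return 500


def new_shipping(basket_pence):
    """New behaviour: free shipping once the basket reaches 5000 pence."""
    return 0 if basket_pence >= 5000 else 500


def shipping_cost(basket_pence, flags=FLAGS):
    # The flag decides which code path runs; both paths must be kept working.
    if flags.get("free_shipping_over_50", False):
        return new_shipping(basket_pence)
    return legacy_shipping(basket_pence)
```

Flipping the value in `FLAGS` switches behaviour without a release, but every caller of `shipping_cost` now has two possible outcomes to reason about.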

Once all users are using the new feature, when do you remove the old code? Obviously it should be removed once everyone has switched over and is happy with the new functionality, but the work needs to be planned in, and what is the urgency? Project Managers will want new projects that add value, not work to delete redundant code.

One of our most experienced developers posted a rant about feature flags. He pointed out there was no guidance on when to use feature flags. Do all new features get feature flags? What if a feature depends on another feature that already has a feature flag? Do Software Testers test each combination to make sure all code paths are supported? Is it clear which configurations are deployed on live, since these should have priority when it comes to testing? By default, our Test Environments should match the config of a typical Live Environment. However, we often find that the default is some configuration that is invalid or not used.

It’s not always possible to “roll back” by switching the feature flag off. To implement the change, you may have needed to refactor existing code or add new database columns. Switching the flag back to “off/false” stops some of the new code being called, but not all of it: the refactored parts still run. So if the bug is in changes that are active even with the flag off, it is still a problem.
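To make that concrete, here is a small sketch in Python. The scenario is invented for illustration: a helper, `normalise_name`, was refactored while adding a flagged feature (`show_initials`), and the refactor accidentally dropped a `strip()` call. All names here are hypothetical.

```python
def normalise_name(name):
    # Shared helper that was refactored as part of the new feature.
    # Assumed bug: the original version was name.strip().title(),
    # and the strip() was lost during the refactor.
    return name.title()


def display_name(name, flags):
    # The refactored helper runs on BOTH paths, whatever the flag says.
    cleaned = normalise_name(name)
    if flags.get("show_initials", False):
        # New, flagged behaviour.
        return "".join(part[0] for part in cleaned.split())
    # Old behaviour, but now built on the refactored helper.
    return cleaned
```

With the flag off, `display_name("  ada lovelace", {})` still returns the name with its leading spaces, because the bug lives in the shared code rather than behind the flag.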

It was also discussed that some people used our Configuration Tool for actual configuration while others used it for Feature Flags, and that maybe we should have separate tools for Configuration and Features.

Feature flags cause maintenance problems. A flag needs to be tested both on and off when it is implemented; if you later want to remove it, the removal needs to be tested too. If you leave it in, there will always be questions about whether code in that area is used and needs testing. How do you prioritise removing the code? Does it belong with the team that initially created the feature? What if the team has moved on, or split?

Another developer brought up an example where a bug existed in two places, but the developer who fixed it was only aware of one code path and didn’t know about the other, which required a feature flag to enable.

He also questioned whether it is more of a problem with our process. Other companies may have quicker releases and be better able to roll back, using ideas like Canary Deployment. Our process is slow and relies on “fix-forward” rather than rollback.

Things to consider:

  • What actually gets feature flagged?
  • When is the conditional code removed from the codebase?
  • The effect of the “Cartesian Explosion” of flag combinations on unit tests and test environments
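To give a feel for the scale of that last point, here is a short Python sketch that enumerates every on/off combination for a set of boolean flags. The flag names are invented for the example, and it assumes every flag is independent and strictly true/false.

```python
from itertools import product


def flag_combinations(flag_names):
    """Yield one dict per possible on/off combination of the given flags."""
    for values in product([False, True], repeat=len(flag_names)):
        yield dict(zip(flag_names, values))


# Hypothetical flags, purely for illustration.
flags = ["new_checkout", "new_search", "dark_mode"]
combos = list(flag_combinations(flags))
print(len(combos))  # 8, i.e. 2 ** 3
```

Each extra boolean flag doubles the count, so ten independent flags already give 1,024 configurations. In practice nobody tests them all, which is why knowing which combinations are actually deployed on live matters.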
