This post answers a follow-up question from my Continuous Delivery talk at Codemotion. Please check the original post for the slides.
For an introduction to Feature Toggles check Martin Fowler's bliki. There are some comments about testing there.
Let’s say we have two features, A and B, that have been in development for a while. When they’re ready to be shipped, the Product Owner will decide how to release them: only one of them, both at the same time, as an A/B test for a partition of users, etc.
From the developer's perspective, we'll use Toggle Flags to enable/disable these features. It’s the developer's responsibility to make sure the application can run with every combination of flags:
- A and B deactivated
- A and B activated
- Only A activated
- Only B activated
In total, we'd have 4 different scenarios.
As usual, to make sure everything works and there are no regressions, the developer should create acceptance tests for all the scenarios.
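As a rough illustration, the scenarios can be generated rather than listed by hand. The sketch below is only an assumption about how such a check might look: `render_home` is a hypothetical stand-in for the application, and the flag names simply mirror the A and B of the example.

```python
from itertools import product

FLAGS = ["A", "B"]  # hypothetical flag names, mirroring the example


def all_scenarios(flags):
    """Return every on/off combination of the given flags as a list of dicts."""
    return [dict(zip(flags, values))
            for values in product([False, True], repeat=len(flags))]


def render_home(flags):
    """Hypothetical stand-in for the application under test."""
    parts = ["home"]
    if flags["A"]:
        parts.append("feature-a")
    if flags["B"]:
        parts.append("feature-b")
    return parts


scenarios = all_scenarios(FLAGS)
for scenario in scenarios:
    page = render_home(scenario)
    # Acceptance check: the base page must work whatever the flags say.
    assert page[0] == "home"
```

In a real project the loop body would be replaced by the actual acceptance suite, run once per scenario.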
The problem starts when the number of flags grows. The number of different scenarios is given by the formula 2 to the power of #flags:
#scenarios = 2^(#flags)
- 4 scenarios for 2 flags
- 8 scenarios for 3 flags
- 16 scenarios for 4 flags and so on.
Needless to say, the situation can become very complex when the number of flags is not kept under control.
So what can we do? Here is some advice:
Use Toggle Flags wisely
The team should be aware that every flag adds complexity. Use them only when they’re really needed.
There are many bug fixes, small features and improvements that just need to be available as soon as the latest version is deployed. In those cases a Toggle is not needed.
Imagine a news site containing a feed, headers and content. Each section is expected to get a new feature; let's call them A, B and C.
In practice, each of these areas is independent of the others. Whether the toggle for A is enabled or disabled in the feed is not going to affect the content.
In this situation, we’d only have 2 scenarios to test:
- all the features enabled.
- all the features disabled.
By breaking down our app into separate, decoupled modules, we’ve reduced the number of scenarios to test from 8 to only 2.
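The reduction can be checked with a small sketch. It assumes the sections really are independent, which is modelled here by a hypothetical `render_section` that reads only its own flag. Under that assumption, the "all off" and "all on" scenarios exercise exactly the same section states as the full set of 8 combinations.

```python
from itertools import product

# Hypothetical mapping: each section of the site and the flag it owns.
SECTIONS = {"feed": "A", "headers": "B", "content": "C"}


def render_section(section, flags):
    """Hypothetical renderer: a section looks only at its own flag."""
    flag = SECTIONS[section]
    return f"{section}:{'new' if flags[flag] else 'old'}"


def section_states(scenarios):
    """Collect every section output seen across the given scenarios."""
    return {render_section(s, flags) for flags in scenarios for s in SECTIONS}


names = sorted(SECTIONS.values())
# All 8 combinations of A, B and C.
full = [dict(zip(names, v)) for v in product([False, True], repeat=len(names))]
# Only the two scenarios suggested above: everything off, everything on.
reduced = [dict.fromkeys(names, False), dict.fromkeys(names, True)]

# Both sets of scenarios cover the same section states.
assert section_states(full) == section_states(reduced)
```

If a section ever starts reading another section's flag, the equality no longer holds and the combined scenarios have to be tested again.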
Consider grouping flags
For example, a WIP flag might be useful to prevent all the work-in-progress code from being available to the user. Since that code is not finished, and therefore not expected to go live, it can all live together under the same "unfinished stuff" flag. At Plumbee we rely heavily on this.
There are also features that are tightly related. If that’s the case, it’s probably a good idea to use the same flag for both of them and activate/deactivate them at the same time.
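One possible way to express grouping is a lookup from feature to the flag that controls it. This is a minimal sketch, not a description of how Plumbee does it: the feature names, the `XY` group and the helper functions are all hypothetical.

```python
# Hypothetical mapping: each feature points at the single flag that controls it.
FEATURE_GROUP = {
    "new_feed": "WIP",      # unfinished work shares the WIP flag
    "new_headers": "WIP",
    "feature_x": "XY",      # two tightly related features share one flag
    "feature_y": "XY",
}


def is_enabled(feature, flags):
    """A feature is on when its group flag is on; unknown features are off."""
    group = FEATURE_GROUP.get(feature)
    return flags.get(group, False)


def scenario_count(feature_group):
    """Scenarios to test: 2 to the power of the number of distinct flags."""
    return 2 ** len(set(feature_group.values()))
```

With this mapping, four features are controlled by two flags, so there are 4 scenarios to test instead of the 16 that four separate flags would produce.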