In his book, Antifragile, Nassim Nicholas Taleb describes and develops the concept of antifragility – the property of a system to increase in robustness in response to faults or failures. In the computing world, errors often propagate a long way from their source. A simple memory allocation failure dozens of levels deep inside an application can bubble up and cause a long process to be interrupted and waste hours of work. This is pretty much the definition of a fragile system.
Recently I came across the Maximum Failure Percentage option in Ansible that is part of the Rolling Updates feature and was thinking about how it could be used to build antifragile systems for deploying and updating software.