Episode 70

In the physical world, we are used to parts wearing down and needing repair or replacement. If a physical system is not properly maintained, its risk of failure increases with use and time. Here in southern Ontario, winters are especially rough on vehicles because temperatures allow road salting for much of the season. To prevent rust, one must be proactive. In this post, I want to argue that digital systems experience the same phenomenon. I will cover the mechanisms behind Software Rust, explain why it is different from technical debt, and finish with some ideas on prevention.

A well-functioning mechanical system will have had engineers work through the necessary design, simulation, and testing to deliver an optimal maintenance plan, one that extracts the most value from the system before its eventual replacement. Scheduled preventive maintenance and predictive maintenance are two methods of keeping a mechanical system in a well-functioning state. Operating experience then provides lessons learned that further improve efficiency. Below, I show where these concepts apply to the digital space.

Running the same piece of code once is no different from running it one million or one billion times; the code itself does not wear out. However, digital systems have a direct connection to the physical world through the hardware that runs the software. Hardware failures, memory corruption, or physical disasters like fires and floods can reach into the digital realm, and the most common way to protect digital systems against them is replication. While the physical hardware is important, the causes of Software Rust are environmental. Some examples: changing development frameworks, newly discovered vulnerabilities, improvements in computation and storage, new regulatory or accessibility requirements, and changes in UI/UX best practices. Just as a mechanical system wears down, environmental changes make a digital system increasingly fragile, even if the hardware that supports it is functioning optimally.

But how is Software Rust different from technical debt? Technical debt is taken on as a rational compromise to ensure a solution is delivered on time. It accumulates when a quick, short-term solution is favoured over a complete one. Technical debt can be paid down, but at the cost of new feature development, and the cost of removing it increases the longer it goes unaddressed. Software Rust differs from technical debt in that it is a function of time and environmental exposure rather than a deliberate trade-off. Leaving road salt on your vehicle will cause rust sooner than keeping it in a clean environment; likewise, software developed in a rapidly changing ecosystem will rust faster than software in a stable one. Here are examples of environmental factors that can affect how quickly software rusts: consumer-facing versus enterprise; open source versus custom developed; high versus low computational resource needs; many integrations versus very few; and handling highly critical information versus public information.


Another dimension of Software Rust is the human element. When individuals complete a feature, they have likely learned something from the process. Likewise, as projects are completed, both the team and the organization grow in collective knowledge and experience. Knowledge management is therefore a critical tool for preventing Software Rust. Just as knowledge is gained over time, working knowledge of the systems we create is lost when those systems go unvisited for a while. Mental atrophy can set in, making past systems harder to maintain. This is especially true of large systems, where retirements, staff churn, or simply the passage of time have left gaps in knowledge of how the system works.

To summarize, Software Rust is a process by which the external environment increases the fragility of the digital system it contains. The humans who maintain the system are a critical piece of the puzzle. In the software world, everyone is learning continuously. Whether you are a recent graduate or a 20-year veteran, the field changes so quickly that you must constantly adopt new skills to remain relevant. As individuals, teams, and organizations gain experience with new technologies, that experience compounds, allowing future development on the same systems to proceed with greater efficiency. That holds only until the platforms change enough that the skills required for similar work no longer overlap.

So how do we prevent Software Rust? We cannot control time, but we can control elements of the environment our digital systems live in, along with the processes that help our people maintain working knowledge of the current state and apply new knowledge to improve on it. In both the physical and digital worlds, rust prevention happens at two stages: at design time (managing technical debt) and during ongoing maintenance of both the digital and the mental space (addressing Software Rust). Keep development frameworks current, and both monitor for and address vulnerabilities. Have developers revisit and refactor old code when possible. A little periodic maintenance and review can go a long way in keeping your organization's digital systems resilient to change over time.
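To make "periodic maintenance and review" concrete, here is a minimal sketch of a scheduled preventive maintenance check, borrowing the mechanical analogy from earlier. It assumes each system is tracked with a last-reviewed date and a review interval, where systems in faster-changing environments (consumer-facing, many integrations) get shorter intervals. The `Component` type, the `overdue_for_review` function, and the system names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Component:
    # Hypothetical record for one system in the portfolio.
    name: str
    last_reviewed: date
    # Shorter intervals for systems exposed to faster environmental change.
    review_interval_days: int


def overdue_for_review(components: list[Component], today: date) -> list[str]:
    """Return the names of components whose scheduled review date has passed."""
    return [
        c.name
        for c in components
        if today - c.last_reviewed > timedelta(days=c.review_interval_days)
    ]


# Hypothetical portfolio: a consumer-facing system reviewed quarterly,
# a stable internal tool reviewed yearly.
components = [
    Component("customer-portal", date(2023, 1, 10), 90),
    Component("internal-reporting", date(2023, 1, 10), 365),
]

print(overdue_for_review(components, date(2023, 6, 1)))  # ['customer-portal']
```

Both systems were last reviewed on the same day, but only the one in the faster-changing environment is flagged, which mirrors the idea that exposure, not just elapsed time, drives how quickly software rusts.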