One of the thornier issues around the adoption of artificial intelligence (AI) by defence organisations is the explainability of AI-generated results. Since military analysis and decision-making can have life-or-death consequences, it is important to have the necessary confidence in those results. Explainability is an essential component for building trust in AI, as it makes AI applications more transparent and comprehensible.
The use of AI to support and augment defence decision-making is gaining traction. A key issue for commanders and other decision-makers, however, is the explainability of AI-generated results and the degree of confidence that can be placed in partially automated assessment and advice systems. Explainability methods are increasingly used to improve the level of trust in AI tools and thereby increase their adoption.
Decision support tool
As part of the work we carry out with the French Air and Space Force (FAF) to support their military air safety and risk management, we have developed models and tools that provide a stronger degree of confidence in the results delivered by SAFIR SA, the FAF’s air safety risk management toolset.
SAFIR SA has been developed to classify air incident reports according to two categories of labels: the causes of the incidents, and their consequences in the form of so-called harmful results. Each category contains several labels; for example, the category "causes" contains equipment, human factor, environment, etc.
This AI-based classification system is used as a decision support tool. It allows Air Force safety management to focus on analysing incident reports from different sources and on detecting emerging trends automatically, rather than spending time searching for information. Incident reports are categorised based on text inputs provided by the different contributors to the incident management process: aircrews, maintenance crews, line management, etc.
Human validation
For reasons of stability and reliability, SAFIR SA is used exclusively as an analysis and decision-making support tool: every automatically generated classification is submitted for human validation. Explainability is therefore of dual interest for the SAFIR SA toolset and for the FAF’s air safety risk management community:
- Firstly, human users can understand the reasons behind the automated categorisation of a given incident report. This is of significant help in deciding whether a given classification should be accepted or rejected. The explainability approach highlights the words on which the models base their classification decision. The same approach can also identify the words in an incident report that push the decision towards another category, revealing for instance human factor involvement.
- Secondly, explainability provides a better analysis of the model’s behaviour. This in turn can help refine the training data and so improve automatic classification. It can reveal whether the model is relying on relevant words or whether it has overfitted.
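To make the idea of word-level explanations more concrete, here is a minimal sketch of one common technique, occlusion (leave-one-word-out) importance. It is not the method used in SAFIR SA, which this article does not detail. The keyword weights, the `predict_proba` and `word_importance` functions and the sample report are all hypothetical stand-ins for a trained classifier and real data.

```python
import math

# Hypothetical keyword weights standing in for a trained text classifier.
# The label names follow the "causes" examples in the article.
TOY_WEIGHTS = {
    "equipment": {"hydraulic": 2.0, "leak": 1.5, "valve": 1.0},
    "human factor": {"forgot": 2.0, "fatigue": 1.5, "checklist": 1.0},
    "environment": {"bird": 2.0, "icing": 1.5, "crosswind": 1.0},
}


def predict_proba(tokens):
    """Return a softmax probability per label for a list of tokens."""
    scores = {
        label: sum(weights.get(tok, 0.0) for tok in tokens)
        for label, weights in TOY_WEIGHTS.items()
    }
    total = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / total for label, s in scores.items()}


def word_importance(tokens, label):
    """Occlusion importance: drop in P(label) when each token is removed.

    Positive values support the label; negative values mean the word
    pushes the decision towards another label.
    """
    base = predict_proba(tokens)[label]
    return [
        (tok, base - predict_proba(tokens[:i] + tokens[i + 1:])[label])
        for i, tok in enumerate(tokens)
    ]


# Made-up report text, for illustration only.
report = "pilot forgot checklist item before hydraulic leak".split()
probs = predict_proba(report)
predicted = max(probs, key=probs.get)
ranking = sorted(word_importance(report, predicted),
                 key=lambda pair: pair[1], reverse=True)

print("predicted label:", predicted)
for token, delta in ranking:
    print(f"{token:10s} {delta:+.3f}")
```

In this toy example the words with the largest positive scores are those the classifier relies on for its predicted label, while words with negative scores pull towards a different label. A human validator could use both signals when deciding whether to accept or reject a classification.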
Continuous improvement
In the long run, the confidence brought about by greater explainability may change how the air safety management chain uses the SAFIR SA toolset. For instance, quality validation may be achieved by sampling categorised reports rather than by validating every report. The productivity and operational gain for defence users is already considerable and may yet increase as explainability matures.
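As a rough illustration of what sampling-based validation could look like, the sketch below draws a random sample of report identifiers for human review and then estimates the acceptance rate with a Wilson score interval. This is an assumption about one possible approach, not a description of the FAF process. The function names, the sample size and the review counts are hypothetical placeholders.

```python
import math
import random


def draw_validation_sample(report_ids, sample_size, seed=0):
    """Pick a reproducible random subset of reports for human review."""
    ids = list(report_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))


def wilson_interval(accepted, reviewed, z=1.96):
    """Wilson score interval for the share of accepted classifications.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if reviewed <= 0:
        raise ValueError("at least one report must be reviewed")
    p = accepted / reviewed
    denom = 1 + z ** 2 / reviewed
    centre = (p + z ** 2 / (2 * reviewed)) / denom
    half = z * math.sqrt(
        p * (1 - p) / reviewed + z ** 2 / (4 * reviewed ** 2)
    ) / denom
    return centre - half, centre + half


# Placeholder numbers for illustration: 1000 categorised reports,
# 50 sampled for review, 47 of which the reviewers accepted.
sample = draw_validation_sample(range(1000), sample_size=50)
low, high = wilson_interval(accepted=47, reviewed=50)
print(f"reviewed {len(sample)} reports, "
      f"acceptance rate between {low:.1%} and {high:.1%}")
```

The interval gives a sense of how much the sampled acceptance rate can be trusted as an estimate for the full set of reports. A wider interval suggests that a larger sample is needed before relying on it.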
A more extensive presentation on this topic was given at NATO Edge 22, a technology conference hosted by the NATO Communications and Information Agency (NCI Agency).
To find out more about this complex but explainable AI solution and, more specifically, the different explainability techniques used in our AI research project with FAF, please contact me or my colleagues.