Implementing AI in Government

Artificial intelligence and machine learning (AI) systems support federal missions from national security to citizen services. The December 3, 2020, Executive Order on Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government is the latest signal that federal leaders are serious about expanding AI adoption. Here's what agencies should keep in mind when implementing AI tools and working to get the most from them.

1. Stability and generalizability drive AI model value.

Algorithmic stability serves as the mathematical underpinning for many AI methods. Generalizability, or the degree to which an AI model’s results can be extrapolated to the real world, is an extension of stability and defines the true value of the model. An algorithm is stable when small changes to its training data produce only small changes in its output, and establishing stability makes it possible to place upper bounds on the algorithm’s generalization error. Minimizing generalization error should be part of every model’s development, but how far to push it depends on the problem the system aims to solve, since meeting stringent specifications can slow the process of training a model. Users should treat model stability less as a single, definable target and more as a lever whose trade-offs are assessed for each application of AI.
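
One rough way to see generalization error in practice is to compare a model's error on its training data with its error on data it has never seen. The sketch below is illustrative only: the synthetic data, the k-nearest-neighbor model, and all function names are assumptions chosen for the example, not a method prescribed by this article.

```python
import random

def knn_predict(train, x, k):
    """Predict y at x by averaging the labels of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_squared_error(train, data, k):
    """Average squared error of the k-NN model (fit on train) over data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def generalization_gap(train, held_out, k):
    """Return (training error, held-out error, held-out minus training)."""
    train_err = mean_squared_error(train, train, k)
    held_out_err = mean_squared_error(train, held_out, k)
    return train_err, held_out_err, held_out_err - train_err

def make_data(rng, n):
    """Synthetic data: y = 2x plus Gaussian noise (purely illustrative)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))
    return data

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(rng, 200)
held_out = make_data(rng, 200)

for k in (1, 25):
    train_err, held_out_err, gap = generalization_gap(train, held_out, k)
    print(f"k={k:2d}  train={train_err:.3f}  held-out={held_out_err:.3f}  gap={gap:.3f}")
```

With k=1 the model memorizes its training set, so its training error is zero and the entire held-out error appears as the gap. A larger k averages over more neighbors, which makes the model less sensitive to any single training point. That is the kind of lever the paragraph above describes.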

2. Effective AI balances accuracy and utility.

AI systems should be highly accurate, generating reliable predictions consistently. But what constitutes a correct prediction typically requires implicit value judgments that can affect a system’s output dramatically. For example, the Centers for Medicare & Medicaid Services (CMS) may be interested in predicting how much care a Medicare enrollee requires over their lifetime. Does the system need to predict an exact value, or is categorizing an enrollee’s expenses as “low,” “medium,” or “high” sufficient? The decision depends on how CMS would use these predictions. The end-user design specification for the AI system guides the trade-off between accurate prediction and usefulness.

CMS may also be interested in a system that determines whether a claim is fraudulent. Most claims aren’t, so a fraud-detection system that predicts all claims are nonfraudulent would still be highly accurate but would obviously not serve its intended purpose. To detect fraudulent claims, one may need to accept some degree of false positives (in this case, nonfraudulent claims flagged as fraudulent), making the system less “correct” but significantly more useful.
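
The fraud example can be made concrete with a few lines of arithmetic. The sketch below uses hypothetical counts (1,000 claims, 20 of them fraudulent) and a hypothetical `ConfusionCounts` helper; none of these figures come from CMS.

```python
from dataclasses import dataclass

@dataclass
class ConfusionCounts:
    true_pos: int   # fraudulent claims flagged as fraudulent
    false_pos: int  # nonfraudulent claims flagged as fraudulent
    false_neg: int  # fraudulent claims that were missed
    true_neg: int   # nonfraudulent claims correctly left alone

    def accuracy(self):
        """Share of all claims that were classified correctly."""
        total = self.true_pos + self.false_pos + self.false_neg + self.true_neg
        return (self.true_pos + self.true_neg) / total

    def recall(self):
        """Share of fraudulent claims that were caught."""
        actual_fraud = self.true_pos + self.false_neg
        return self.true_pos / actual_fraud if actual_fraud else 0.0

    def precision(self):
        """Share of flagged claims that were actually fraudulent."""
        flagged = self.true_pos + self.false_pos
        return self.true_pos / flagged if flagged else 0.0

# Hypothetical batch of 1,000 claims, 20 of which are fraudulent.
always_clean = ConfusionCounts(true_pos=0, false_pos=0, false_neg=20, true_neg=980)
flagger = ConfusionCounts(true_pos=15, false_pos=40, false_neg=5, true_neg=940)

for name, counts in (("always nonfraudulent", always_clean), ("flagging model", flagger)):
    print(f"{name}: accuracy={counts.accuracy():.3f} "
          f"recall={counts.recall():.2f} precision={counts.precision():.2f}")
```

Under these assumed counts, the model that never flags anything is 98 percent accurate yet catches no fraud, while the flagging model gives up a little accuracy (95.5 percent) to catch three-quarters of the fraudulent claims. Which balance is acceptable depends on how the flagged claims will be used.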

3. Don't let complexity hurt user confidence.

AI systems are intrinsically complex, particularly in human-involved domains like precision medicine. This cannot be helped. Models that are functionally opaque, however, keep users from fully trusting their outputs. This can harm user confidence and buy-in, especially as the risks and consequences of an AI system failure grow more severe.

Explainable AI (XAI) streamlines user acceptance and improves decision-making by providing rationales for why an AI system came to a particular result. Two XAI approaches are feature optimization/visualization (generating emblematic, synthetic data that illustrates what individual components of a network do) and attribution (employing techniques like saliency maps to indicate which components contribute most to a model’s output in a given example).
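
To give a feel for attribution, the sketch below uses occlusion: replace one input at a time with a neutral baseline and record how much the model's output changes. This is a simpler stand-in for gradient-based saliency maps, not the same technique, and the scoring model, feature names, and weights are all hypothetical.

```python
# Hypothetical linear risk model; the weights and feature names are invented.
WEIGHTS = {"claim_amount_k": 0.5, "prior_flags": 2.0, "provider_years": -0.1}
BIAS = 1.0

def risk_score(features):
    """Toy model: bias plus a weighted sum of the input features."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

def occlusion_attribution(model, features, baseline):
    """Attribute the output to each feature by swapping it for its baseline value."""
    full_output = model(features)
    attributions = {}
    for name in features:
        occluded = dict(features)          # copy so the original input is untouched
        occluded[name] = baseline[name]    # "remove" this one feature
        attributions[name] = full_output - model(occluded)
    return attributions

claim = {"claim_amount_k": 10.0, "prior_flags": 3.0, "provider_years": 20.0}
baseline = {name: 0.0 for name in claim}   # assumed neutral reference input

attributions = occlusion_attribution(risk_score, claim, baseline)
ranked = sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

Because the toy model is linear and the baseline is zero, each attribution is simply the weight times the input value. For a real, nonlinear model the same loop still runs, but the results depend on the choice of baseline and can miss interactions between features.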

While XAI can be useful to analysts and decision-makers, it is not always necessary or even desirable. Including XAI methods may come at the cost of model performance and can encourage users to over-trust models. Stakeholders should also consider related concepts such as interpretable machine learning and assured fairness (designing systems to optimize objectives while ensuring equitable outcomes). Ultimately, the desired state of operation should be high-performing models that enable users to make informed decisions with confidence.

4. Organizational buy-in matters. Slower adoption may be better.

Even the most effective AI system can fail to benefit organizations when the user base—the workforce—shies away. This is probably the most common roadblock to successful AI implementation. Leaders who are understandably eager to share new, game-changing AI applications often strive to deploy AI to the entire organization as soon as possible. This approach is almost assured to fail.

Incremental adoption by select departments, or even a few employees, may serve the organization better in the long run. Slower deployments build internal support and reduce the risk of mission-disrupting errors that could stop the initiative instantly, and perhaps permanently. When dealing with a skeptical workforce, organizations may be well served by prioritizing XAI over black-box applications, which produce results without explanation. A system that can offer justifications would be harder for employees to brush off as a useless machine. Taking away some of the mystery around what AI actually is and how it will deliver better results is a proven way to win over a workforce.

The potential benefits of AI appear to be boundless, but it is important for the government to learn lessons from the private sector as well as early federal adopters. It is also crucial to consider the potential trade-offs involved with deployment of AI at scale. Taking advantage of the road laid by those who have headed down this path will offer a smoother, more effective, and impactful experience for agencies and their workforces.

5. Not ready for AI? Start with data management.

The federal government has invested significantly in data readiness and AI applications, yet it is far from an AI-ready enterprise. Data drives AI, and there are challenges in data readiness that influence the trade-offs described earlier. Data requirements for conducting AI-based modeling are minimally documented. Data is not collected and stored uniformly, resulting in siloed, disparate activities across the government. Duplicative, inconsistent, and incompatible data sets diminish what AI tools could achieve with a single, integrated one. Data ingestion is often performed manually and irregularly within and across agencies rather than through automation.
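
A first data-readiness check can be as simple as looking for records that appear in more than one source and disagree with each other. The sketch below assumes a hypothetical record layout (`source`, `id`, `state`, `amount`) and a hypothetical ID-normalization rule; real agency data would need its own matching logic.

```python
from collections import defaultdict

def normalize_id(raw_id):
    """Assumed rule: IDs match once whitespace, case, and hyphens are ignored."""
    return raw_id.strip().upper().replace("-", "")

def audit_records(records):
    """Group records by normalized ID; report duplicates and conflicting fields."""
    by_id = defaultdict(list)
    for record in records:
        by_id[normalize_id(record["id"])].append(record)

    duplicates = {key: recs for key, recs in by_id.items() if len(recs) > 1}

    conflicts = {}
    for key, recs in duplicates.items():
        # Compare every field except the ones expected to differ by source.
        fields = {f for rec in recs for f in rec if f not in ("id", "source")}
        differing = sorted(f for f in fields if len({rec.get(f) for rec in recs}) > 1)
        if differing:
            conflicts[key] = differing
    return duplicates, conflicts

# Hypothetical records from two sources with inconsistent ID formatting.
records = [
    {"source": "agency_a", "id": "ab-123", "state": "VA", "amount": 100},
    {"source": "agency_b", "id": "AB123 ", "state": "VA", "amount": 120},
    {"source": "agency_a", "id": "cd-456", "state": "MD", "amount": 50},
    {"source": "agency_b", "id": "CD456", "state": "MD", "amount": 50},
    {"source": "agency_b", "id": "EF789", "state": "DC", "amount": 75},
]

duplicates, conflicts = audit_records(records)
print("duplicated IDs:", sorted(duplicates))
print("conflicting fields:", conflicts)
```

In this made-up sample, two IDs appear in both sources and one of them carries two different amounts. Surfacing that kind of disagreement early is a small, automatable step toward the single, integrated data set described above.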

These shortcomings can limit AI to, for example, reporting summary statistics or ranking numerical scores, without further analysis or dynamic organizational deployment. The result is an AI environment, and AI solutions, confined to evaluating the most costly or best-understood challenges, with less emphasis on unexplored outcomes and atypical situations that could surprise decision-makers. Ultimately, if your organization’s data readiness is immature, it may not be ready for the full value proposition of AI, but even modest data improvements can open up meaningful ways to put AI to use.

6. Data science is a team sport.

Implementing AI is part of an iterative discovery process, and understanding and organizing around trade-offs is key. Your AI teams likely employ highly talented data scientists, statisticians, programmers, and developers, but data science cannot operate in a vacuum. Because AI implementation is tightly coupled to the design constraints that these trade-offs impose, teams need frequent collaboration and communication with the end users and decision-makers who will use the final AI solution. The AI development lifecycle of design, development, implementation, and sustainment should include iterative collaboration loops that catch problems and concerns as early as possible, and should build in exit ramps for users to reexamine, strategize, redefine, and redirect progress.