Using Three Levels of AI to Distinguish Reality from Fiction
Artificial Intelligence (AI) is everywhere. As McKinsey put it, if 2023 was the year the world discovered generative AI (gen AI), 2024 is the year organizations truly began using—and deriving business value from—this new technology. Healthcare is no exception, and it goes beyond gen AI.
AI is eye-catching, it’s new, and it’s becoming synonymous with modernization—something the healthcare industry needs a heaping dose of. But what does “using AI” actually mean for health plans? At this current peak of the “AI-washing” hype cycle, it’s difficult to find a product that is not marketed as AI. So, it’s worth defining a few tests to tease apart the fiction from the reality. Let’s get into it.
Three Levels of AI
There are three “levels” of AI against which you can test a product, specifically in the context of claims processing use cases. Asking the right questions can help you determine not only what level of AI a product uses, but whether it uses AI at all.
Level 1 AI
Level 1 AI systems are often the entry-point use case for value creation. They are specialized systems that can make smart decisions on numerical or optimization problems.
A healthcare example is an AI system that processes claims and decides in real time whether to pay the claim or flag it for further investigation due to suspected errors.
Here are a few questions to ask to determine if you’re dealing with a Level 1 AI system:
- Does the system automatically learn from its mistakes? The hallmark of this kind of AI system is continuous learning. If it flags a claim and human investigation reveals the flag was in error, the system has a feedback loop that lets it learn from that mistake and make a smarter decision the next time around. (A minimal sketch of such a feedback loop follows this list.)
- Does the system outperform handwritten rules, no matter how complicated? A real AI system should deliver materially higher accuracy (hit rates) than a state-of-the-art rules-based system. Just as with chess or Go, AI has surpassed human-level performance on these kinds of heavily analytical problems.
- How much outperformance do you see? With classical machine learning (ML) solutions from a decade ago, you should see about a 20-30% lift over traditional handwritten rules. With the latest generation of AI approaches, over time, you might see up to a 100% improvement in quality.
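To make the continuous-learning loop concrete, here is a minimal, hypothetical sketch in Python. It is a sketch under stated assumptions, not a production implementation: it assumes claims have already been turned into numeric feature vectors, uses scikit-learn’s incremental SGDClassifier purely for illustration, and invents the flag threshold and labels.

```python
# Minimal sketch of a Level 1 decision system with a feedback loop.
# Assumes claims are already featurized into numeric vectors; the
# threshold, labels, and model choice are illustrative, not prescriptive.
import numpy as np
from sklearn.linear_model import SGDClassifier

class ClaimFlagger:
    """Scores claims in real time and keeps learning from investigator feedback."""

    def __init__(self, flag_threshold: float = 0.8):
        self.model = SGDClassifier(loss="log_loss", random_state=0)
        self.flag_threshold = flag_threshold
        self._classes = np.array([0, 1])  # 0 = pay, 1 = flag for investigation

    def fit_initial(self, X: np.ndarray, y: np.ndarray) -> None:
        # Bootstrap on historical claims with known outcomes.
        self.model.partial_fit(X, y, classes=self._classes)

    def decide(self, claim_features: np.ndarray) -> str:
        # Real-time decision: pay the claim or route it to investigation.
        p_error = self.model.predict_proba(claim_features.reshape(1, -1))[0, 1]
        return "flag" if p_error >= self.flag_threshold else "pay"

    def learn_from_feedback(self, X: np.ndarray, y_reviewed: np.ndarray) -> None:
        # Feedback loop: investigators' confirmed or overturned flags become
        # fresh labels, so the next decision is a little smarter.
        self.model.partial_fit(X, y_reviewed)
```

The point of the sketch is the loop itself: decide, get the human verdict, learn, repeat. In practice the hard work sits in feature engineering, model choice, and threshold tuning.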
Level 2 AI
Level 2 AI systems are firmly from the “large language model (LLM) era” of AI development. They’re distinguished by their ability to natively interpret language, which opens them up for use in situations that aren’t purely analytical but instead involve reading and writing documents.
Level 2 systems are “co-pilots”—they're not quite good enough to execute at the same level as a human expert, but they are useful assistants in helping experts do their jobs faster.
An example of a Level 2 AI system is one that helps clinicians and coders read medical records to validate diagnoses or service codes.
Here are a few questions to ask to determine if you’re dealing with a Level 2 AI system:
- Can the system answer evidence-extraction questions effectively?
  - Can it extract lab values as a time series from a medical record? For example, can it assess white blood cell count or oxygen saturation over the course of a hospital stay? (A hypothetical sketch of this kind of extraction appears at the end of this section.)
  - Can it find documentation of specific signs and symptoms on specific dates?
  - Can it find evidence of treatment for various conditions?
  - Can it pull unstructured information from a policy document and turn it into structured rules?
- Does it double user productivity?
  - Does it allow the human to easily verify its answers and correct them when necessary?
  - Does verifying an answer take materially less time than a human hunting down the answer on their own?
- Does the use of the co-pilot delight the human?
  - A good co-pilot must be a trusted companion that takes on repetitive tasks and frees up human bandwidth for higher-order decisions.
- Does the co-pilot deliver material productivity gains?
  - The answer does depend on user training, but good co-pilots should demonstrate material time savings with at least a core group of power users.
Co-pilots are only as good as the user interface they provide to help the expert do their job more effectively.
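To ground the evidence-extraction questions above, here is a hypothetical Python sketch of a co-pilot pulling a white blood cell time series out of a medical record. The call_llm function, the prompt wording, and the JSON schema are all assumptions made for illustration; they are not a real product API.

```python
# Hypothetical sketch of a Level 2 evidence-extraction step.
# `call_llm` stands in for whichever LLM client is actually used; the
# prompt and output schema are invented for illustration.
import json
from typing import Callable

EXTRACTION_PROMPT = """You are assisting a clinical coder.
From the medical record below, extract every white blood cell (WBC)
result as JSON: a list of objects with "date" (YYYY-MM-DD), "value",
and "unit". Quote the source sentence for each entry so a human
reviewer can verify it quickly.

Record:
{record_text}
"""

def extract_wbc_series(record_text: str, call_llm: Callable[[str], str]) -> list[dict]:
    """Ask the model for a structured lab time series, then parse and sort it."""
    raw = call_llm(EXTRACTION_PROMPT.format(record_text=record_text))
    series = json.loads(raw)  # production code would validate this, not trust it
    # Sort chronologically so the reviewer sees the trend at a glance.
    return sorted(series, key=lambda row: row["date"])
```

The design point is the quoted source sentence attached to each value: surfacing evidence next to every answer is what makes human verification fast, and that is where the productivity gain comes from.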
Level 3 AI
Level 3 AI systems are capable of holistically executing entire complex tasks on their own, rather than just being limited to providing narrow answers to pointed questions. Level 3 systems often combine LLM technology with other forms of AI technology more suited to complex reasoning. These systems can perform a chained series of steps where each step’s execution may be dependent on what came before. If this sounds like the stuff of science fiction, you wouldn’t be wrong to think so. We aren’t quite there yet, but we’re closer than most people realize.
Level 3 AI systems are currently at the “precocious intern” level of performance and not at “industry veteran” levels. But these systems are already powerful enough to be a massive time-saver for the industry veterans. The “intern” does all the legwork and produces mostly correct output. The human veteran grades the work and corrects the occasional minor error, which can be 10x faster than starting from scratch.
An example of a Level 3 AI system is one that reads a claim along with associated medical records to determine a list of necessary claim code changes while providing the clinical rationale for the changes.
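As a sketch of what such a chained workflow can look like, here is a hypothetical Python outline. The step functions (extract_evidence, apply_coding_rules, draft_rationale) are placeholders for whatever models or rules engines sit behind each step, and the data shapes are invented for illustration.

```python
# Hypothetical outline of a Level 3 chained review, where each step
# consumes the output of the step before it. All names are placeholders.
from dataclasses import dataclass, field

@dataclass
class ClaimReview:
    claim: dict
    records: list[str]
    evidence: list[dict] = field(default_factory=list)        # step 1 output
    rule_findings: list[dict] = field(default_factory=list)   # step 2 output
    proposed_changes: list[dict] = field(default_factory=list)
    rationale: str = ""

def review_claim(claim: dict, records: list[str],
                 extract_evidence, apply_coding_rules, draft_rationale) -> ClaimReview:
    """Run the chained steps and return a reviewable work product."""
    review = ClaimReview(claim=claim, records=records)
    # Step 1: pull dated clinical evidence relevant to the billed codes.
    review.evidence = extract_evidence(records, claim["codes"])
    # Step 2: evaluate that evidence against coding policy to find mismatches.
    review.rule_findings = apply_coding_rules(claim, review.evidence)
    # Step 3: turn findings into proposed code changes plus a written "why".
    review.proposed_changes = [finding["change"] for finding in review.rule_findings]
    review.rationale = draft_rationale(review.evidence, review.rule_findings)
    # The human veteran grades this output instead of starting from scratch.
    return review
```

The defining property is the chaining: each step depends on what came before, so an early mistake propagates, which is exactly why the final human review and the trust-building explanations discussed below matter.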
Here are a few questions to ask to determine if you’re dealing with a Level 3 AI system:
- Does the system present you with work output that is human-like?
  - What output would you expect from a smart intern you hire to do that job?
  - Does the system provide the same kind of output, with similarly rich color and texture in its narrative?
- Does the system connect the dots across disparate facts to propose conclusions or logical deductions? It’s one thing to have a Level 2 AI system recite facts. But can the system combine facts sourced from various places to tell a complex story? For example, can it track how a patient’s labs trended over time and correlate that trend with subjective clinical impressions to form a narrative about what’s really going on?
- Is the system able to justify its reasoning in a persuasive way?
At this point of maturity, Level 3 AI systems are only as good as the trust they earn. That trust has to be earned through thoughtful explanations and justifications for every finding. It’s not enough to nail the “what”; the system also has to nail the “why.”
Where Does AI Adoption Stand Today?
Across enterprises at large, Level 1 AI systems have been broadly adopted for at least ten years. These kinds of systems served as the lifeblood of companies such as Google, Meta, and Netflix from their earliest days and have since propagated into mature solutions across nearly every industry—healthcare being no exception.
Level 2 AI systems are of a more recent vintage. At Machinify, we’ve been working on such systems for the past five years and have had successful production implementations for the lion’s share of that time. As foundational LLMs have grown in capability by leaps and bounds over the last two years, the economic case for Level 2 systems has become a no-brainer. Today, these systems have become table stakes across most enterprise use cases involving complex document analysis.
Level 3 AI systems are a work in progress. They require extensive and unique data, and they are complex to build and test for safety and accuracy. The pure version of unfettered AI executing entirely independently is, for the most part, still in the research prototype phase. However, systems with human supervision are now in production across multiple use cases and are starting to deliver meaningful results.
At Machinify, we have been developing and using our Level 3 systems for the past two years. We’ve come to see that it’s critical to deploy these systems early in a testing mode to gather the rich, real-world data necessary to tune their performance and validate their accuracy. On the back of these tests, these systems are now on the cusp of going mainstream in the coming weeks and months, with performance continuing to improve.
AI for Health Plans
AI has already proven itself to be too important and too game-changing a force for “wait-and-watch” to be a viable adoption strategy. Here’s how we see the imperatives and benefits around AI adoption for health plans today.
Level 1 systems are a must-have. Level 1 systems for claim selection should be used by all health plans. The best Level 1 systems provide up to a 2x lift in medical cost savings over the prior state-of-the-art, so they are often the obvious first foray into AI for a payment integrity organization.
Level 2 systems are transformational and need proper deployment. With a few years of proof behind them, Level 2 systems provide compelling productivity gains, anywhere from 1.5x to 3x, which are only achievable through appropriate deployment and change management. Those productivity gains can then be reinvested into program expansion, resulting in significant savings growth or insourcing.
Level 3 system investment will help health plans stay ahead. Level 3 systems are a key investment area for progressive health plans looking for a competitive edge with a mindset of building scalable internal programs for payment integrity. These systems come with the promise of game-changing benefits (10x or more) leading to a phase shift in operations—though it will take careful tuning and experimentation to realize the entirety of those benefits. An early start provides the best opportunity for health plans to get there first.
Health plans should already be taking full advantage of AI applications. Yes, the different levels have different requirements, but every health plan should be considering the levels of AI and how they can positively impact their organization.
If you’re interested in taking control and transforming your payment integrity program supported by AI, schedule a Machinify demo today.