Can AI Match Expert Rheumatologists? Insights from ARTHUR+DIANA’s Evaluation Study
At the 2024 American College of Rheumatology Convergence, researchers presented a study of ARTHUR, an automated ultrasound system, combined with DIANA, AI-based software that analyzes the images ARTHUR captures. The study tested whether the combined system could match an expert rheumatologist in assessing joint inflammation in patients with rheumatoid arthritis (RA).
Specifically, the study aimed to determine whether ARTHUR and DIANA could accurately detect and grade signs of inflammation, such as synovial hypertrophy (SH) and Doppler activity, which are key markers of RA disease activity. For these assessments, researchers used a standard scoring method known as GLOESS (the Global OMERACT-EULAR Synovitis Score).
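To illustrate how per-joint grades can roll up into a global score, here is a minimal Python sketch. It assumes the usual EULAR-OMERACT convention: SH and Doppler are each graded 0 to 3, the combined grade for a joint is the higher of the two (with Doppler counted only where SH is present), and GLOESS is the sum of combined grades across the scanned joints. The function names `combined_grade` and `gloess` are hypothetical and are not part of ARTHUR or DIANA.

```python
from typing import Sequence


def combined_grade(sh: int, doppler: int) -> int:
    """Combined synovitis grade for one joint (hypothetical helper).

    Assumption: SH and Doppler are graded 0-3, and the combined grade is
    the higher of the two, with Doppler only counted inside synovium.
    """
    if not (0 <= sh <= 3 and 0 <= doppler <= 3):
        raise ValueError("grades must be integers from 0 to 3")
    if sh == 0:
        # No synovial hypertrophy: treated as no synovitis (assumption).
        return 0
    return max(sh, doppler)


def gloess(joints: Sequence[tuple[int, int]]) -> int:
    """Global score: sum of combined grades over (SH, Doppler) pairs."""
    return sum(combined_grade(sh, pd) for sh, pd in joints)


print(gloess([(1, 0), (2, 3), (0, 0)]))
```

Here the three joints contribute 1, 3, and 0, so the example prints 4.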
Methodology
The study included 30 patients with RA. In each patient, ARTHUR automatically scanned 22 hand joints and captured ultrasound images, which DIANA then analyzed to assign synovitis scores according to the GLOESS criteria.
In parallel, a rheumatologist experienced in musculoskeletal ultrasound scanned and scored the same joints. To establish a reference standard, a second expert, blinded to the scanning method, reviewed all images and, for each joint, selected the image showing the most severe disease activity as the "ground truth."
To compare how closely ARTHUR+DIANA's scores and the rheumatologist's scores each matched the ground truth, the researchers used weighted Cohen's kappa, percent exact agreement (PEA), percent close agreement (PCA), and agreement on a binary healthy-versus-diseased classification.
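The agreement metrics can be made concrete with a short Python sketch. It is not the study's analysis code. It assumes integer grades from 0 to 3, treats "close" agreement as a difference of at most one grade, and counts a joint as diseased at grade 1 or higher; none of these thresholds is confirmed by this summary. All function names are hypothetical.

```python
from typing import Sequence


def _check(a: Sequence[int], b: Sequence[int]) -> None:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty score lists of equal length")


def percent_exact_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """PEA: share of joints where both scores are identical."""
    _check(a, b)
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def percent_close_agreement(a: Sequence[int], b: Sequence[int], tol: int = 1) -> float:
    """PCA: share of joints whose scores differ by at most `tol` grades (assumed 1)."""
    _check(a, b)
    return 100.0 * sum(abs(x - y) <= tol for x, y in zip(a, b)) / len(a)


def binary_agreement(a: Sequence[int], b: Sequence[int], threshold: int = 1) -> float:
    """Share of joints classified the same way as healthy vs. diseased."""
    _check(a, b)
    same = sum((x >= threshold) == (y >= threshold) for x, y in zip(a, b))
    return 100.0 * same / len(a)


def weighted_kappa(a: Sequence[int], b: Sequence[int], n_cat: int = 4,
                   weights: str = "linear") -> float:
    """Weighted Cohen's kappa with linear or quadratic disagreement weights."""
    _check(a, b)
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    n = len(a)
    # Observed proportion of joints for each (score_a, score_b) pair.
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]

    def w(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for maximal disagreement.
        d = abs(i - j) / (n_cat - 1)
        return d if weights == "linear" else d ** 2

    cells = [(i, j) for i in range(n_cat) for j in range(n_cat)]
    observed = sum(w(i, j) * obs[i][j] for i, j in cells)
    expected = sum(w(i, j) * row[i] * col[j] for i, j in cells)
    if expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is 0")
    return 1.0 - observed / expected


method = [0, 1, 2, 3]
truth = [0, 2, 2, 0]
print(percent_exact_agreement(method, truth))
print(percent_close_agreement(method, truth))
print(binary_agreement(method, truth))
```

For these four toy joints the three printed values are 50.0, 75.0, and 75.0. As a cross-check for kappa, scikit-learn's `sklearn.metrics.cohen_kappa_score` accepts `weights="linear"` or `weights="quadratic"`; which weighting the study used is not stated in this summary.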
Key Findings
ARTHUR successfully captured images of 85.45 percent of the intended joints, with the highest capture rates in the second, third, and fourth metacarpophalangeal and proximal interphalangeal joints. At the individual joint level, ARTHUR+DIANA's agreement with the ground truth was as follows:
- For SH: PEA was 49.01 percent, PCA was 91.23 percent, and binary agreement was 79.97 percent.
- For Doppler activity: PEA was higher, at 62.58 percent; PCA was 94.37 percent; and binary agreement was 88.08 percent.
The rheumatologist's agreement with the ground truth was also high, with slightly better PCA and binary agreement for both SH and Doppler activity. The weighted kappa values, which measure agreement with the ground truth beyond what chance alone would produce, were similar for ARTHUR+DIANA and the rheumatologist, particularly for Doppler activity.
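As a rough arithmetic check on the 85.45 percent capture rate, and assuming its denominator is every intended joint (30 patients × 22 hand joints), the rate corresponds to about 564 of 660 joints. The helper `implied_count` is hypothetical; the abstract summary does not report the raw count.

```python
def implied_count(total: int, rate_pct: float) -> int:
    """Nearest whole count implied by a percentage of `total` (hypothetical helper)."""
    return round(total * rate_pct / 100)


# Assumption: denominator = 30 patients x 22 intended hand joints each.
intended = 30 * 22
captured = implied_count(intended, 85.45)
print(intended, captured, f"{100 * captured / intended:.2f}")
```

This prints `660 564 85.45`, so a count of 564 joints reproduces the reported rate exactly to two decimal places.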
At the patient level, ARTHUR+DIANA's agreement with the ground truth in classifying patients as "healthy" or "diseased" was comparable to, and numerically higher than, that of the rheumatologist:
- For SH: ARTHUR+DIANA achieved an 86.67 percent agreement with the ground truth, while the rheumatologist achieved 53.33 percent.
- For Doppler activity: ARTHUR+DIANA reached an agreement of 83.33 percent, with the rheumatologist at 66.67 percent.
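Because the patient-level comparison involves only 30 patients, each patient shifts agreement by about 3.33 percentage points. The Python sketch below shows one plausible patient-level rule (a patient is "diseased" if any joint reaches grade 1; the study's actual rule is not described in this summary) and converts the reported percentages into whole-patient counts. All names are hypothetical.

```python
from typing import Sequence


def classify_patient(joint_grades: Sequence[int], threshold: int = 1) -> str:
    """Hypothetical rule: 'diseased' if any joint reaches `threshold`."""
    return "diseased" if any(g >= threshold for g in joint_grades) else "healthy"


def patient_agreement(method: Sequence[Sequence[int]],
                      truth: Sequence[Sequence[int]],
                      threshold: int = 1) -> tuple[int, float]:
    """Count and percentage of patients classified the same way as the truth."""
    if len(method) != len(truth) or len(truth) == 0:
        raise ValueError("need equal, non-empty lists of patients")
    matches = sum(
        classify_patient(m, threshold) == classify_patient(t, threshold)
        for m, t in zip(method, truth)
    )
    return matches, 100.0 * matches / len(truth)


# Map the reported patient-level percentages back to counts out of 30.
reported = {
    "ARTHUR+DIANA, SH": 86.67,
    "Rheumatologist, SH": 53.33,
    "ARTHUR+DIANA, Doppler": 83.33,
    "Rheumatologist, Doppler": 66.67,
}
for label, pct in reported.items():
    print(f"{label}: {round(pct * 30 / 100)} of 30 patients")
```

The loop prints 26, 16, 25, and 20 of 30 patients respectively, which shows that each reported percentage is consistent with a whole number of patients out of 30.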
Conclusion
This study highlights ARTHUR+DIANA's potential to match an expert rheumatologist in evaluating hand joint inflammation in patients with RA. Given the rising demand for RA care and the limited number of specialists, this technology may become an essential tool for delivering accurate, timely, and accessible care to those who need it.
Reference:
Aplin Frederiksen B, Ammitzbøll Danielsen M, Berner Hammer H, Schultz Overgaard B, Weber A, Terslev L, Rajeeth Savarimuthu T, Just S. Automated Ultrasound System ARTHUR with AI Analysis DIANA Matches Expert Rheumatologist in Hand Joint Assessment of Rheumatoid Arthritis Patients [abstract]. Arthritis Rheumatol. 2024; 76 (suppl 9). https://acrabstracts.org/abstract/automated-ultrasound-system-arthur-with-ai-analysis-diana-matches-expert-rheumatologist-in-hand-joint-assessment-of-rheumatoid-arthritis-patients/. Accessed November 12, 2024.