Tether's Medical AI Runs on Your Phone and Outperforms Models 16x Its Size

TL;DR
Tether's new medical AI model, QVAC MedPsy, outperforms larger models like Google's MedGemma-4B and MedGemma-27B on clinical benchmarks. The 1.7 billion-parameter model runs on consumer hardware, making it practical for local and mobile use.
Key points
- Tether's QVAC MedPsy has 1.7 billion parameters.
- It outperformed MedGemma-4B and MedGemma-27B on clinical benchmarks.
- The model runs on consumer hardware without cloud infrastructure.
- QVAC MedPsy is designed for smartphones and edge devices.
- The 4 billion-parameter version averages about 909 tokens per response.
In brief
- Tether's 1.7 billion-parameter QVAC MedPsy outperformed Google's MedGemma-4B and beat MedGemma-27B on HealthBench Hard, an OpenAI benchmark testing realistic clinical conversations graded by 262 physicians.
- The 4 billion-parameter model generates responses in ~909 tokens versus ~2,953 for comparable systems, a 3.2x reduction that makes local hospital and mobile deployment practical.
- Models ship in quantized GGUF format (1.2 GB and 2.6 GB) and run entirely on consumer hardware without cloud infrastructure.
Tether, the stablecoin company best known for USDT, just released a medical AI model that fits in your pocket and may outperform rivals more than a dozen times its size. QVAC MedPsy launched today from Tether's AI Research Group as a new class of medical language models designed to run on smartphones, wearables, and edge devices, no cloud required.
The headline number: a tiny 1.7 billion-parameter model capable of beating Google's MedGemma-4B on medical benchmarks despite being less than half its size. On HealthBench Hard, OpenAI's benchmark that evaluates AI on realistic, multi-turn clinical conversations graded by 262 physicians, Tether says its 1.7 billion-parameter model outscores MedGemma-27B, a model nearly sixteen times larger.
Parameters are the numerical values a model learns during training. In theory, the more parameters a model has, the more capable it should be.

Source: Tether
The test suite ranges from MedQA-USMLE, which measures clinical knowledge with US medical licensing exam-style questions scored as percentage accuracy, to AfriMedQA, which tests performance in underserved African healthcare contexts.
Tether CEO Paolo Ardoino credited the gains to efficiency rather than scale. "With QVAC MedPsy, our focus was improving efficiency at the model level, rather than scaling up size," he said in a statement. "Our 4 billion model exceeded results from models nearly seven times its size, while using up to three times fewer tokens per response."
That token efficiency is the other headline. The 4B model averages around 909 tokens per response versus 2,953 for comparable systems, a 3.2x reduction. Fewer tokens mean lower compute costs, faster responses and, crucially, the ability to run locally without a cloud backend.
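The arithmetic behind that claim is straightforward. A rough sketch below uses the two token counts from the announcement; the on-device decoding speed is an illustrative assumption, not a figure from Tether:

```python
# Rough sketch of what a ~3.2x token reduction means in practice.
# Token counts are from Tether's announcement; the decode speed is
# an illustrative assumption, not a measured or quoted figure.

baseline_tokens = 2953   # average response length of comparable systems
medpsy_tokens = 909      # QVAC MedPsy 4B average, per Tether

reduction = baseline_tokens / medpsy_tokens
print(f"Token reduction: {reduction:.1f}x")  # ~3.2x

# At an assumed on-device decoding speed of 20 tokens/second:
decode_tps = 20
print(f"Baseline response time: {baseline_tokens / decode_tps:.0f} s")
print(f"MedPsy response time:   {medpsy_tokens / decode_tps:.0f} s")
```

On battery-powered hardware, where every generated token costs compute and time, a shorter response is the difference between a usable assistant and an impractical one.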
"You can run medical reasoning where the data already exists, inside a hospital system or on a device, without moving sensitive information through the cloud or waiting on external processing," Ardoino said.
The models ship as quantized GGUF files (1.2 GB for the 1.7 billion-parameter model and 2.6 GB for the 4 billion), with compressed versions retaining most benchmark performance while fitting on standard consumer hardware. That means a hospital system, rural clinic, or individual clinician could run the model entirely on-device, keeping patient records out of third-party cloud infrastructure and away from HIPAA exposure.
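As a sketch of what on-device use could look like: GGUF weights of this kind are typically loaded with a local runtime such as llama.cpp. The model filename and prompt below are illustrative assumptions; the actual weights are published at qvac.tether.io/models.

```shell
# Hypothetical example of running a quantized GGUF model locally with
# llama.cpp's llama-cli. The filename is a placeholder, not the
# official release name.
llama-cli -m qvac-medpsy-1.7b-q4.gguf \
  --ctx-size 4096 \
  -p "Patient reports intermittent chest pain after exercise. Differential?"
```

Everything in that pipeline, weights, prompt, and output, stays on the local machine.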
The privacy pitch may be a major plus for some users, but using AI for medical opinions remains far from ideal even by today's standards. An Oxford study published in February found that LLMs routinely give dangerous medical advice, producing wrong answers, confused guidance, and poor handling of nuanced symptoms. The researchers stopped short of dismissing the technology entirely, but argued AI has a role as "secretary, not physician." The compliance problem compounds it: most medical AI today routes patient data through cloud servers, creating HIPAA exposure every time a doctor types a query.
The release fits Tether's pattern over the past year. Last month it shipped the QVAC SDK, an open-source toolkit for building local, offline AI apps across iOS, Android, Windows, and Linux. Before that, it launched QVAC Health, a consumer wellness app that keeps biometric data entirely on-device. MedPsy is the first QVAC model specifically trained for clinical reasoning.
The medical AI market sits at roughly $36 billion today, with projections pointing past $500 billion by 2033, per Tether's own announcement. Models and GGUF weights are available now at qvac.tether.io/models.
Q&A
How does Tether's QVAC MedPsy compare to Google's MedGemma models?
QVAC MedPsy outperformed both Google's MedGemma-4B and MedGemma-27B on the HealthBench Hard benchmark despite having significantly fewer parameters.
What is the significance of QVAC MedPsy's parameter size?
The 1.7 billion parameters allow QVAC MedPsy to deliver high performance while being much smaller than competing models, making it feasible for deployment on mobile devices.
What are the deployment capabilities of Tether's medical AI model?
QVAC MedPsy can run entirely on consumer hardware without the need for cloud infrastructure, enabling use on smartphones and wearables.
What benchmarks did Tether's AI model excel in?
Tether's QVAC MedPsy excelled in the HealthBench Hard benchmark, which evaluates AI on realistic clinical conversations graded by physicians.