Vision APIs that let operators train custom models on their own labeled data.

Six of the 10 leading image-tagging APIs document an API surface for training on operator-supplied labeled data. That means a real training pipeline, not just classification against a fixed taxonomy. The split is sharp: every classical CV provider in v1.0 except Cloudinary supports custom training, while every frontier multimodal LLM is rated "Partial" or "No" for vision-specific fine-tuning. This is one dimension where classical CV vendors still win.

As of: May 26, 2026
Sample: n=10 providers
Source: AI Tagging Index v1.0
Updated: Monthly
Topic: AI Tagging

Custom model training via API · by provider

v1.0 · Snapshot 2026-05-26 · re-verified monthly

Provider	Custom training	Notes
Google Cloud Vision	Yes	AutoML Vision / Vertex AI custom training.
AWS Rekognition	Yes	Amazon Rekognition Custom Labels.
Azure AI Vision	Yes	Azure Custom Vision service.
Clarifai	Yes	Custom training is a core feature, exposed end to end through the API.
Imagga	Yes	Custom Training API documented.
Hive AI	Yes	Hive AutoML for custom moderation/tagging.
OpenAI GPT-4o (vision)	Partial	Fine-tuning is generally available for text; vision fine-tuning exists but is still rolling out.
Google Gemini (vision)	Partial	Vertex AI supports tuning Gemini models, but the vision-specific tuning surface is less developed than the text one.
Anthropic Claude (vision)	No	No public fine-tuning offered for Claude models in v1.0.
Cloudinary AI	No	AI tagging is delivered via swappable third-party models; no first-party custom training surface.

"Yes" requires a documented API or workflow for uploading labeled training data, kicking off training, and serving the resulting custom model. "Partial" means fine-tuning is supported in principle but vision coverage is incomplete or in preview. Cells are re-verified monthly.
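One way to sanity-check the headline counts is to transcribe the table and tally it. The sketch below does that in Python. The "classical"/"frontier" grouping is an assumption drawn from the prose above (GPT-4o, Gemini and Claude as the frontier multimodal LLMs, everything else as classical CV); it is not a column in the index.

```python
from collections import Counter

# Custom-training status per provider, transcribed from the v1.0 table.
# Assumed grouping: the three frontier multimodal LLMs vs. everything else,
# with Cloudinary counted on the classical side, as the prose does.
TABLE = {
    "Google Cloud Vision": ("classical", "Yes"),
    "AWS Rekognition": ("classical", "Yes"),
    "Azure AI Vision": ("classical", "Yes"),
    "Clarifai": ("classical", "Yes"),
    "Imagga": ("classical", "Yes"),
    "Hive AI": ("classical", "Yes"),
    "OpenAI GPT-4o (vision)": ("frontier", "Partial"),
    "Google Gemini (vision)": ("frontier", "Partial"),
    "Anthropic Claude (vision)": ("frontier", "No"),
    "Cloudinary AI": ("classical", "No"),
}


def tally(table):
    """Count providers per custom-training status."""
    return Counter(status for _, status in table.values())


def classical_without_training(table):
    """Classical providers that are not rated "Yes"."""
    return sorted(
        name
        for name, (group, status) in table.items()
        if group == "classical" and status != "Yes"
    )


def frontier_statuses(table):
    """Distinct statuses among the frontier multimodal LLMs."""
    return {status for group, status in table.values() if group == "frontier"}


if __name__ == "__main__":
    counts = tally(TABLE)
    # prints "6 of 10 providers support custom training"
    print(f"{counts['Yes']} of {len(TABLE)} providers support custom training")
    print(classical_without_training(TABLE))  # ['Cloudinary AI']
    print(sorted(frontier_statuses(TABLE)))   # ['No', 'Partial']
```

The three checks map one-to-one onto the claims in the opening paragraph: six "Yes" cells out of ten, Cloudinary as the only classical provider without custom training, and no frontier LLM rated "Yes".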

Why this still matters

For high-volume, narrow-domain tagging (your product catalog, your brand assets, your defect-detection task), a small custom-trained classical CV model still beats a frontier LLM on cost, latency, and predictability, often by an order of magnitude. The frontier multimodal models are catching up, but in v1.0, if your problem is "tag 10 million product photos against my 800-SKU catalog," you are not yet picking Claude or GPT-4o to do it.

What counts

Yes — documented API or workflow for training a custom model on operator-supplied labeled data.
Partial — fine-tuning is offered but vision coverage is limited, in preview, or rolling out.
No — no first-party custom-training capability.
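The rubric can also be read as a small decision function, checked from strictest to loosest. This is a sketch only: the `TrainingSurface` fields are hypothetical names that paraphrase the criteria above and the upload / train / serve requirement in the table note; they do not correspond to any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSurface:
    """Hypothetical summary of a provider's documented training capabilities."""

    upload_labeled_data: bool  # documented way to upload labeled images
    start_training: bool       # documented way to kick off a training job
    serve_custom_model: bool   # the trained model can be called for inference
    vision_in_preview: bool    # vision training is in preview or still rolling out
    fine_tuning_offered: bool  # any first-party fine-tuning, even text-only


def classify(s: TrainingSurface) -> str:
    """Map capabilities onto the Yes / Partial / No rubric."""
    # "Yes" needs the full upload -> train -> serve pipeline for vision data,
    # and that pipeline must not be preview-only.
    full_vision_pipeline = (
        s.upload_labeled_data and s.start_training and s.serve_custom_model
    )
    if full_vision_pipeline and not s.vision_in_preview:
        return "Yes"
    # "Partial": fine-tuning exists in principle, but vision coverage is
    # incomplete, in preview, or rolling out.
    if full_vision_pipeline or s.fine_tuning_offered:
        return "Partial"
    # "No": no first-party custom-training capability at all.
    return "No"


if __name__ == "__main__":
    # A provider that fine-tunes text models but documents no vision pipeline.
    text_only = TrainingSurface(
        upload_labeled_data=False,
        start_training=False,
        serve_custom_model=False,
        vision_in_preview=False,
        fine_tuning_offered=True,
    )
    print(classify(text_only))  # Partial
```

Ordering the checks this way keeps the three categories mutually exclusive: a complete but preview-only vision pipeline falls through to "Partial" rather than "Yes".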

Cite this statistic

DAM LLM Research. "Vision APIs with custom model training, May 2026." damllm.ai, 2026. https://damllm.ai/statistics/vision-apis-with-custom-training/


See also