DAM LLM · Independent research · AI × DAM

Statistic · AI Tagging · From the AI Tagging Provider Index

6 of 10

Vision APIs that let operators train custom models via API.

Six of the 10 leading image-tagging APIs document an API for training custom models on operator-supplied labeled data. That means a full training pipeline, not just classification against a fixed taxonomy. The split is sharp. Every classical CV provider in v1.0 except Cloudinary supports custom training, while every frontier multimodal LLM is rated "Partial" or "No" for vision-specific fine-tuning. This is the dimension that classical CV vendors still win.

As of: May 26, 2026
Sample: n=10 providers
Source: AI Tagging Index v1.0
Updated: Monthly
Topic: AI Tagging

Custom model training via API · by provider

v1.0 · Snapshot 2026-05-26 · re-verified monthly

| Provider | Custom training | Notes |
|---|---|---|
| Google Cloud Vision | Yes | AutoML Vision / Vertex AI custom training. |
| AWS Rekognition | Yes | Amazon Rekognition Custom Labels. |
| Azure AI Vision | Yes | Azure Custom Vision service. |
| Clarifai | Yes | Custom training is a core feature, with an end-to-end API. |
| Imagga | Yes | Custom Training API documented. |
| Hive AI | Yes | Hive AutoML for custom moderation and tagging. |
| OpenAI GPT-4o (vision) | Partial | Fine-tuning is generally available for text; vision fine-tuning is still rolling out. |
| Google Gemini (vision) | Partial | Vertex AI supports tuning Gemini models; the vision-specific tuning surface is less developed than the text one. |
| Anthropic Claude (vision) | No | No public fine-tuning offered for Claude models in v1.0. |
| Cloudinary AI | No | AI tagging is delivered through swappable third-party models; no first-party custom-training surface. |

"Yes" requires a documented API or workflow for uploading labeled training data, kicking off training, and serving the resulting custom model. "Partial" means fine-tuning is supported in principle but vision coverage is incomplete or in preview. Cells are re-verified monthly.
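To make the three-part "Yes" bar concrete, here is a minimal sketch of upload, train, and serve against one provider rated Yes, Amazon Rekognition Custom Labels, through boto3. It assumes a boto3 Rekognition client is passed in, that the labeled data already exists as a SageMaker Ground Truth manifest in S3, and that every project, version, bucket, and key name is a hypothetical placeholder. It shows one documented path (training from a manifest), not the only one, and leaves out error handling and cost controls.

```python
"""Sketch: the upload / train / serve steps behind a "Yes" rating,
using Amazon Rekognition Custom Labels via a boto3 client.

Assumptions: `client` is boto3.client("rekognition"); the labeled data is a
Ground Truth manifest already in S3; all names passed in are hypothetical.
"""
from typing import Any, List, Tuple


def train_and_serve(client: Any, project_name: str, version_name: str,
                    bucket: str, manifest_key: str, output_prefix: str) -> str:
    # Step 0: a project groups the datasets and model versions.
    project_arn = client.create_project(ProjectName=project_name)["ProjectArn"]

    # Step 1 + 2: point training at the labeled manifest and start training.
    # TestingData AutoCreate asks Rekognition to split off a test set itself.
    version_arn = client.create_project_version(
        ProjectArn=project_arn,
        VersionName=version_name,
        OutputConfig={"S3Bucket": bucket, "S3KeyPrefix": output_prefix},
        TrainingData={"Assets": [{"GroundTruthManifest": {
            "S3Object": {"Bucket": bucket, "Name": manifest_key}}}]},
        TestingData={"AutoCreate": True},
    )["ProjectVersionArn"]

    # Training runs asynchronously; block until it completes (raises on failure).
    client.get_waiter("project_version_training_completed").wait(
        ProjectArn=project_arn, VersionNames=[version_name])

    # Step 3: provision inference capacity for the trained model (billed while running).
    client.start_project_version(ProjectVersionArn=version_arn, MinInferenceUnits=1)
    client.get_waiter("project_version_running").wait(
        ProjectArn=project_arn, VersionNames=[version_name])
    return version_arn


def tag_image(client: Any, version_arn: str, bucket: str, image_key: str,
              min_confidence: float = 70.0) -> List[Tuple[str, float]]:
    # Call the custom model on one S3 image and return (label, confidence) pairs.
    resp = client.detect_custom_labels(
        ProjectVersionArn=version_arn,
        Image={"S3Object": {"Bucket": bucket, "Name": image_key}},
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in resp["CustomLabels"]]
```

In practice this would be called as `train_and_serve(boto3.client("rekognition"), "catalog-tagger", "v1", "my-bucket", "manifests/train.manifest", "rekognition-output/")`, followed by `tag_image(...)` per photo. Because a running model is billed for as long as it stays started, a real pipeline would also call `stop_project_version` once a tagging batch finishes.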

Why this still matters

For high-volume, narrow-domain tagging (your product catalog, your brand assets, your defect-detection task), a small custom-trained classical CV model still beats a frontier LLM on cost and latency, often by an order of magnitude, and its output is more predictable. The frontier multimodal models are catching up, but in v1.0, if your problem is "tag 10 million product photos against my 800-SKU catalog," you are not yet picking Claude or GPT-4o to do it.

What counts

  • Yes — documented API or workflow for training a custom model on operator-supplied labeled data.
  • Partial — fine-tuning is offered but vision coverage is limited, in preview, or rolling out.
  • No — no first-party custom-training capability.
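The three-way rubric can be written down as a small decision function, which helps apply it the same way at each monthly re-verification. This is a minimal sketch: the `TrainingSurface` fields and the `rate` function are hypothetical and reflect one reading of the definitions above, not the index's published methodology. The tally at the end recounts the ratings from the v1.0 table and reproduces the headline 6 of 10.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TrainingSurface:
    """Hypothetical per-provider flags an analyst might record (not an official schema)."""
    offers_fine_tuning: bool          # any first-party custom training or fine-tuning at all
    documented_pipeline: bool         # documented upload of labeled data, training, and serving
    vision_generally_available: bool  # vision training is GA, not preview or rolling out


def rate(s: TrainingSurface) -> str:
    # "No": no first-party custom-training capability of any kind.
    if not s.offers_fine_tuning:
        return "No"
    # "Yes": the full documented pipeline exists and covers vision.
    if s.documented_pipeline and s.vision_generally_available:
        return "Yes"
    # "Partial": fine-tuning exists, but vision coverage is limited or in preview.
    return "Partial"


# Ratings as published in the v1.0 table.
V1_RATINGS: Dict[str, str] = {
    "Google Cloud Vision": "Yes",
    "AWS Rekognition": "Yes",
    "Azure AI Vision": "Yes",
    "Clarifai": "Yes",
    "Imagga": "Yes",
    "Hive AI": "Yes",
    "OpenAI GPT-4o (vision)": "Partial",
    "Google Gemini (vision)": "Partial",
    "Anthropic Claude (vision)": "No",
    "Cloudinary AI": "No",
}


def tally(ratings: Dict[str, str]) -> Counter:
    # Count how many providers fall into each rating bucket.
    return Counter(ratings.values())


counts = tally(V1_RATINGS)
print(f"{counts['Yes']} of {len(V1_RATINGS)} support custom training")
```

Run as-is, the script prints `6 of 10 support custom training`. Keeping "Partial" as the fall-through case means a provider only reaches "Yes" when every part of the pipeline is documented for vision, which matches the strictness of the definitions above.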

Cite this statistic

DAM LLM Research. "Vision APIs with custom model training, May 2026." damllm.ai, 2026. https://damllm.ai/statistics/vision-apis-with-custom-training/

See also