Model Benchmark Results

Comparing 27 models across 3 engines · 2026-03-23

- Models tested: 27
- Successful: 20
- Failed / at capacity: 7
- Engines: 3
- Text: 8 · Vision: 6 · Reasoning: 2

Engine Overview

Ollama Cloud Engine Comparison

- Models tested: 12
- Successful: 9
- Failed: 3
- Fastest: 0.9s
- Slowest: 33.7s
- Concurrency: 3 req (Pro)
- API format: Ollama-native
- Rate limit: 3 concurrent
- Vision: /api/generate + images[]
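Ollama Cloud speaks the Ollama-native API: text generation goes through /api/generate, and vision models take base64-encoded images in an `images` array, as noted above. A minimal sketch of building and sending such a request; the base URL and bearer-token header are assumptions, substitute your own endpoint and key:

```python
import base64
import json
import urllib.request

# Assumed endpoint for Ollama Cloud; adjust to your deployment.
OLLAMA_URL = "https://ollama.com/api/generate"


def build_generate_payload(model, prompt, image_bytes=None):
    """Build an Ollama-native /api/generate request body.

    Vision models accept base64-encoded images in an `images` array
    (the "/api/generate + images[]" pattern from the table above).
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(payload, api_key):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same payload shape works for both text models (omit `image_bytes`) and the vision models listed later in this report.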

Featherless Engine Comparison

- Models tested: 10
- Successful: 4
- Failed: 6
- Fastest: 14.2s
- Slowest: 21.4s
- Concurrency: 1 req
- API format: OpenAI-compatible
- Rate limit: 8 model switches/min

Arli AI Engine Comparison

- Models tested: 5
- Successful: 5
- Failed: 0
- Fastest: 6.7s
- Slowest: 25.6s
- Concurrency: 6 req
- API format: OpenAI-compatible
- Rate limit: 6 parallel total
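Featherless and Arli AI both expose the OpenAI-compatible chat-completions shape; they differ mainly in how many parallel requests they tolerate (1 vs 6). A sketch that enforces the cap with a semaphore; the base URL is an assumption, point it at whichever engine you use:

```python
import json
import threading
import urllib.request

BASE_URL = "https://api.arliai.com/v1"  # assumed endpoint; adjust per engine
MAX_PARALLEL = 6                        # Arli AI's limit above; use 1 for Featherless

_slots = threading.Semaphore(MAX_PARALLEL)


def chat_payload(model, prompt):
    """OpenAI-compatible chat-completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(model, prompt, api_key):
    """Send one request, never exceeding the engine's parallel-request cap."""
    with _slots:  # blocks while MAX_PARALLEL requests are already in flight
        req = urllib.request.Request(
            f"{BASE_URL}/chat/completions",
            data=json.dumps(chat_payload(model, prompt)).encode(),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
```

With the semaphore in place, worker threads can fire requests freely and the client stays inside the engine's documented limit.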

3-Engine Comparison

| | Ollama Cloud | Featherless | Arli AI |
|---|---|---|---|
| Models available | 34 cloud | 5,700+ | 117 text + 80 image |
| Best text | gemma3:4b (0.9s) | Mistral-7B-Instruct (14.2s) | Llama-3.3-70B (6.7s) |
| Best vision | devstral-small-2:24b (0.9s) | - | Qwen3.5-27B (19.5s) |
| Best reasoning | deepseek-v3.2 (5.9s) | - | - |
| Image gen | - | - | FLUX, SDXL, Qwen-Image |
| Best throughput | gpt-oss:20b (130 t/s) | Qwen3-8B (12.4 t/s) | Llama-3.3-70B (14.8 t/s) |
| Capacity | Reliable | Often busy | Reliable |

Recommendations by use case (with engine pick)

- Fastest text: gemma3:4b (Ollama Cloud) · 0.9s · 56 tokens
- Best quality text: gpt-oss:120b (Ollama Cloud) · 2.2s · 246 tokens · 112 t/s
- Best vision / screenshot QA: devstral-small-2:24b (Ollama Cloud) · 0.9s · via /api/generate
- Best reasoning: deepseek-v3.2 (Ollama Cloud) · 5.9s · 688B params
- Best throughput: gpt-oss:20b (Ollama Cloud) · 130 tokens/s · 310 tokens total
- Reliable fallback chain: 1st Ollama Cloud gemma3:4b (0.9s) → 2nd Arli AI Llama-3.3-70B (6.7s) → 3rd Featherless Mistral-7B (14.2s)
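The fallback chain above is a loop that tries each engine in order and falls through on any error. A minimal sketch; `call` stands in for whatever per-engine request function you use (a hypothetical signature taking engine, model, and prompt):

```python
# The fallback order from this report, fastest reliable option first.
FALLBACK_CHAIN = [
    ("ollama", "gemma3:4b"),
    ("arli", "Llama-3.3-70B-Instruct"),
    ("featherless", "mistralai/Mistral-7B-Instruct-v0.3"),
]


def with_fallback(call, prompt):
    """Try each (engine, model) pair in order.

    Returns (engine, answer) from the first success; raises only if
    every engine in the chain fails.
    """
    last_err = None
    for engine, model in FALLBACK_CHAIN:
        try:
            return engine, call(engine, model, prompt)
        except Exception as err:  # at capacity, not found, timeout, ...
            last_err = err
    raise RuntimeError("all engines in the fallback chain failed") from last_err
```

Ordering by observed latency means the common case pays 0.9s, and the slower engines are touched only when the faster ones fail.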

- Image generation: FLUX.2-klein-4B (Arli AI) · 16.6s · txt2img
- Vision multi-model: mix Ollama + Featherless + Arli · up to 3 models per request

All Text Models · sorted by speed · Ollama + Featherless + Arli AI

| # | Model | Engine | Family | Size | Time | Tokens | t/s | Preview |
|---|---|---|---|---|---|---|---|---|
| 1 | gemma3:4b | Ollama Cloud | Gemma 3 | 4B | 0.9s | 56 | 62.2 | Quantum computing utilizes the principles of quantum... |
| 2 | ministral-3:8b | Ollama Cloud | Ministral 3 | 8B | 1.2s | 66 | 55.0 | Quantum computing leverages the principles of **qu... |
| 3 | gemma3:12b | Ollama Cloud | Gemma 3 | 12B | 1.4s | 63 | 45.0 | Quantum computing harnesses the principles of quan... |
| 4 | ministral-3:14b | Ollama Cloud | Ministral 3 | 14B | 1.3s | 99 | 76.2 | Quantum computing leverages the principles of **qu... |
| 5 | gemma3:27b | Ollama Cloud | Gemma 3 | 27B | 1.7s | 82 | 48.2 | Quantum computing harnesses the bizarre principles... |
| 6 | gpt-oss:20b | Ollama Cloud | GPT-OSS | 20B | 1.8s | 234 | 130.0 | Quantum computing uses qubits that can exist in su... |
| 7 | gpt-oss:120b | Ollama Cloud | GPT-OSS | 120B | 2.2s | 246 | 111.8 | Quantum computing leverages qubits, which can exis... |
| 8 | Llama-3.3-70B-Instruct | Arli AI | Llama 3.3 | 70B | 6.7s | 144 | 14.8 | Quantum computing is a revolutionary technology th... |
| 9 | Llama-3.3-70B-ArliAI-RPMax-v3 | Arli AI | Llama 3.3 | 70B | 11.2s | 111 | 5.9 | Quantum computing is a type of computing that uses... |
| 10 | mistralai/Mistral-7B-Instruct-v0.3 | Featherless | Mistral | 7B | 14.2s | 53 | 2.8 | Quantum computing is a revolutionary technology t... |
| 11 | Qwen/Qwen3-8B | Featherless | Qwen 3 | 8B | 16.1s | 218 | 12.4 | Quantum computing leverages qubits, which can exis... |
| 12 | Qwen/Qwen2.5-72B-Instruct | Featherless | Qwen 2.5 | 72B | 18.1s | 112 | 4.0 | Quantum computing harnesses the principles of quan... |
| 13 | Qwen/Qwen2.5-7B-Instruct-1M | Featherless | Qwen 2.5 | 7B | 21.4s | 71 | 2.0 | Quantum computing utilizes qubits that can exist i... |
| 14 | devstral-small-2:24b | Ollama Cloud | Devstral | 24B | 24.6s | 75 | 3.0 | Quantum computing uses quantum bits (qubits) that... |
| 15 | Llama-3.3-70B-Instruct-Abliterated | Arli AI | Llama 3.3 | 70B | 25.6s | 138 | 3.6 | Quantum computing is a revolutionary technology th... |
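The t/s column is generated tokens divided by generation time. Ollama-native responses report these directly as `eval_count` (tokens generated) and `eval_duration` (nanoseconds), so the figure can be computed per response rather than measured with a wall clock; a small sketch:

```python
def tokens_per_second(resp):
    """Throughput from an Ollama-native response.

    `eval_count` tokens were generated over `eval_duration` nanoseconds;
    multiply by 1e9 to convert the ratio to tokens per second.
    """
    return resp["eval_count"] / resp["eval_duration"] * 1e9
```

For example, 56 tokens generated over 0.9 seconds of eval time works out to roughly 62 t/s, in line with gemma3:4b's row above.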

Reasoning Models · thinking/reasoning capabilities

| # | Model | Engine | Family | Size | Time | Tokens | Thinking |
|---|---|---|---|---|---|---|---|
| 1 | deepseek-v3.2 | Ollama Cloud | DeepSeek V3.2 | 688B | 5.9s | 145 | Yes |
| 2 | kimi-k2-thinking | Ollama Cloud | Kimi K2 | 1T | 33.7s | 411 | Yes |

Vision Models · image understanding · screenshot QA

| # | Model | Engine | Family | Size | Time | Tokens | QA Output |
|---|---|---|---|---|---|---|---|
| 1 | devstral-small-2:24b | Ollama Cloud | Devstral | 24B | 0.9s | - | Vision via /api/generate + images[] |
| 2 | qwen3.5:397b | Ollama Cloud | Qwen 3.5 | 397B | ~5s | - | Vision via /api/generate + raw base64 |
| 3 | qwen3-vl:235b | Ollama Cloud | Qwen3-VL | 235B | ~5s | - | Vision via /api/generate + raw base64 |
| 4 | gemma3:27b | Ollama Cloud | Gemma 3 | 27B | ~2s | - | Vision via /api/generate + raw base64 |
| 5 | kimi-k2.5 | Ollama Cloud | Kimi K2.5 | 1T | 10s | 397 | Vision via /api/generate. Needs high max_tokens. |
| 6 | Qwen3.5-27B-Derestricted | Arli AI | Qwen 3.5 | 27B | 19.5s | 1544 | The user wants a QA review of a web page screenshot. 1. **Analyze the visual co... |
| 7 | Qwen3.5-27B-Vivid-Durian | Arli AI | Qwen 3.5 | 27B | 22.7s | 1544 | The user wants a QA review of the provided web page image. **1. Analyze the Ima... |

Failed / At Capacity · retry later or use an alternative engine

| Model | Engine | Time | Error |
|---|---|---|---|
| qwen3.5:27b | Ollama Cloud | 0.2s | model not found |
| qwen3-vl:8b | Ollama Cloud | 0.2s | cannot unmarshal array into Go struct (wrong format) |
| qwen3-vl:32b | Ollama Cloud | 0.2s | cannot unmarshal array into Go struct (wrong format) |
| Qwen/Qwen2.5-32B-Instruct | Featherless | 15.7s | temporarily at capacity |
| Qwen/Qwen2.5-7B-Instruct | Featherless | 9.6s | temporarily at capacity |
| Qwen/Qwen3-32B | Featherless | 11.7s | temporarily at capacity |
| Qwen/QwQ-32B | Featherless | 12s | temporarily at capacity |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | Featherless | 13.1s | temporarily at capacity |
| mistralai/Magistral-Small-2506 | Featherless | 13.7s | temporarily at capacity |
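Most of these failures are transient capacity errors, which are worth retrying; "model not found" and malformed-request errors are permanent and should fail fast. A retry sketch with exponential backoff; `send` is any zero-argument function that performs the request and raises `RuntimeError` with the engine's error message on failure (a hypothetical convention):

```python
import time


def call_with_retry(send, attempts=4, base_delay=2.0):
    """Retry transient at-capacity failures with exponential backoff
    (base_delay, 2x, 4x, ...); re-raise permanent errors immediately.
    """
    for attempt in range(attempts):
        try:
            return send()
        except RuntimeError as err:
            # Only capacity errors are transient; anything else
            # (e.g. "model not found") will not fix itself.
            if "at capacity" not in str(err) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Pairing this with the fallback chain above covers both cases: retry the same engine briefly, then move down the chain if it stays busy.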