Model Benchmark Results

Comparing 27 models across 3 engines · 2026-03-23

- Models tested: 27
- Successful: 20
- Failed / at capacity: 7
- Engines: 3
- Text: 8 · Vision: 6 · Reasoning: 2

Engine Overview

Ollama Cloud Engine Comparison

- Models tested: 12
- Successful: 9
- Failed: 3
- Fastest: 0.9s
- Slowest: 33.7s
- Concurrency: 3 req (Pro)
- API format: Ollama-native
- Rate limit: 3 concurrent
- Vision: /api/generate + images[]
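Ollama Cloud speaks the Ollama-native API: text generation goes through /api/generate, and vision models take base64-encoded images in an `images` array, as noted above. A minimal sketch of building and sending such a request; the base URL and bearer-token header are assumptions, substitute your own endpoint and key:

```python
import base64
import json
import urllib.request

# Assumed endpoint for Ollama Cloud; adjust to your deployment.
OLLAMA_URL = "https://ollama.com/api/generate"


def build_generate_payload(model, prompt, image_bytes=None):
    """Build an Ollama-native /api/generate request body.

    Vision models accept base64-encoded images in an `images` array
    (the "/api/generate + images[]" pattern from the table above).
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_bytes is not None:
        payload["images"] = [base64.b64encode(image_bytes).decode("ascii")]
    return payload


def generate(payload, api_key):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same payload shape works for both text models (omit `image_bytes`) and the vision models listed later in this report.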

Featherless Engine Comparison

- Models tested: 10
- Successful: 4
- Failed: 6
- Fastest: 14.2s
- Slowest: 21.4s
- Concurrency: 1 req
- API format: OpenAI-compatible
- Rate limit: 8 model switches/min

Arli AI Engine Comparison

- Models tested: 5
- Successful: 5
- Failed: 0
- Fastest: 6.7s
- Slowest: 25.6s
- Concurrency: 6 req
- API format: OpenAI-compatible
- Rate limit: 6 parallel total
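Featherless and Arli AI both expose the OpenAI-compatible chat-completions shape; they differ mainly in how many parallel requests they tolerate (1 vs 6). A sketch that enforces the cap with a semaphore; the base URL is an assumption, point it at whichever engine you use:

```python
import json
import threading
import urllib.request

BASE_URL = "https://api.arliai.com/v1"  # assumed endpoint; adjust per engine
MAX_PARALLEL = 6                        # Arli AI's limit above; use 1 for Featherless

_slots = threading.Semaphore(MAX_PARALLEL)


def chat_payload(model, prompt):
    """OpenAI-compatible chat-completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(model, prompt, api_key):
    """Send one request, never exceeding the engine's parallel-request cap."""
    with _slots:  # blocks while MAX_PARALLEL requests are already in flight
        req = urllib.request.Request(
            f"{BASE_URL}/chat/completions",
            data=json.dumps(chat_payload(model, prompt)).encode(),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
```

With the semaphore in place, worker threads can fire requests freely and the client stays inside the engine's documented limit.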

3-Engine Comparison

| | Ollama Cloud | Featherless | Arli AI |
|---|---|---|---|
| Models available | 34 cloud | 5,700+ | 117 text + 80 image |
| Best text | gemma3:4b (0.9s) | Mistral-7B-Instruct (14.2s) | Llama-3.3-70B (6.7s) |
| Best vision | devstral-small-2:24b (0.9s) | - | Qwen3.5-27B (19.5s) |
| Best reasoning | deepseek-v3.2 (5.9s) | - | - |
| Image gen | - | - | FLUX, SDXL, Qwen-Image |
| Best throughput | gpt-oss:20b (130 t/s) | Qwen3-8B (12.4 t/s) | Llama-3.3-70B (14.8 t/s) |
| Capacity | Reliable | Often busy | Reliable |

Recommendations by use case (with engine pick)

- Fastest text: gemma3:4b (Ollama Cloud) · 0.9s · 56 tokens
- Best quality text: gpt-oss:120b (Ollama Cloud) · 2.2s · 246 tokens · 112 t/s
- Best vision / screenshot QA: devstral-small-2:24b (Ollama Cloud) · 0.9s · via /api/generate
- Best reasoning: deepseek-v3.2 (Ollama Cloud) · 5.9s · 688B params
- Best throughput: gpt-oss:20b (Ollama Cloud) · 130 tokens/s · 310 tokens total
- Reliable fallback chain: 1st Ollama Cloud gemma3:4b (0.9s) → 2nd Arli AI Llama-3.3-70B (6.7s) → 3rd Featherless Mistral-7B (14.2s)
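The fallback chain above is a loop that tries each engine in order and falls through on any error. A minimal sketch; `call` stands in for whatever per-engine request function you use (a hypothetical signature taking engine, model, and prompt):

```python
# The fallback order from this report, fastest reliable option first.
FALLBACK_CHAIN = [
    ("ollama", "gemma3:4b"),
    ("arli", "Llama-3.3-70B-Instruct"),
    ("featherless", "mistralai/Mistral-7B-Instruct-v0.3"),
]


def with_fallback(call, prompt):
    """Try each (engine, model) pair in order.

    Returns (engine, answer) from the first success; raises only if
    every engine in the chain fails.
    """
    last_err = None
    for engine, model in FALLBACK_CHAIN:
        try:
            return engine, call(engine, model, prompt)
        except Exception as err:  # at capacity, not found, timeout, ...
            last_err = err
    raise RuntimeError("all engines in the fallback chain failed") from last_err
```

Ordering by observed latency means the common case pays 0.9s, and the slower engines are touched only when the faster ones fail.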

- Image generation: FLUX.2-klein-4B (Arli AI) · 16.6s · txt2img
- Vision multi-model: mix Ollama + Featherless + Arli · up to 3 models per request

All Text Models · sorted by speed · Ollama + Featherless + Arli AI

| # | Model | Engine | Family | Size | Time | Tokens | t/s | Preview |
|---|---|---|---|---|---|---|---|---|
| 1 | gemma3:4b | Ollama Cloud | Gemma 3 | 4B | 0.9s | 56 | 62.2 | Quantum computing utilizes the principles of quantum... |
| 2 | ministral-3:8b | Ollama Cloud | Ministral 3 | 8B | 1.2s | 66 | 55.0 | Quantum computing leverages the principles of **qu... |
| 3 | gemma3:12b | Ollama Cloud | Gemma 3 | 12B | 1.4s | 63 | 45.0 | Quantum computing harnesses the principles of quan... |
| 4 | ministral-3:14b | Ollama Cloud | Ministral 3 | 14B | 1.3s | 99 | 76.2 | Quantum computing leverages the principles of **qu... |
| 5 | gemma3:27b | Ollama Cloud | Gemma 3 | 27B | 1.7s | 82 | 48.2 | Quantum computing harnesses the bizarre principles... |
| 6 | gpt-oss:20b | Ollama Cloud | GPT-OSS | 20B | 1.8s | 234 | 130.0 | Quantum computing uses qubits that can exist in su... |
| 7 | gpt-oss:120b | Ollama Cloud | GPT-OSS | 120B | 2.2s | 246 | 111.8 | Quantum computing leverages qubits, which can exis... |
| 8 | Llama-3.3-70B-Instruct | Arli AI | Llama 3.3 | 70B | 6.7s | 144 | 14.8 | Quantum computing is a revolutionary technology th... |
| 9 | Llama-3.3-70B-ArliAI-RPMax-v3 | Arli AI | Llama 3.3 | 70B | 11.2s | 111 | 5.9 | Quantum computing is a type of computing that uses... |
| 10 | mistralai/Mistral-7B-Instruct-v0.3 | Featherless | Mistral | 7B | 14.2s | 53 | 2.8 | Quantum computing is a revolutionary technology t... |
| 11 | Qwen/Qwen3-8B | Featherless | Qwen 3 | 8B | 16.1s | 218 | 12.4 | Quantum computing leverages qubits, which can exis... |
| 12 | Qwen/Qwen2.5-72B-Instruct | Featherless | Qwen 2.5 | 72B | 18.1s | 112 | 4.0 | Quantum computing harnesses the principles of quan... |
| 13 | Qwen/Qwen2.5-7B-Instruct-1M | Featherless | Qwen 2.5 | 7B | 21.4s | 71 | 2.0 | Quantum computing utilizes qubits that can exist i... |
| 14 | devstral-small-2:24b | Ollama Cloud | Devstral | 24B | 24.6s | 75 | 3.0 | Quantum computing uses quantum bits (qubits) that... |
| 15 | Llama-3.3-70B-Instruct-Abliterated | Arli AI | Llama 3.3 | 70B | 25.6s | 138 | 3.6 | Quantum computing is a revolutionary technology th... |
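The t/s column is generated tokens divided by generation time. Ollama-native responses report these directly as `eval_count` (tokens generated) and `eval_duration` (nanoseconds), so the figure can be computed per response rather than measured with a wall clock; a small sketch:

```python
def tokens_per_second(resp):
    """Throughput from an Ollama-native response.

    `eval_count` tokens were generated over `eval_duration` nanoseconds;
    multiply by 1e9 to convert the ratio to tokens per second.
    """
    return resp["eval_count"] / resp["eval_duration"] * 1e9
```

For example, 56 tokens generated over 0.9 seconds of eval time works out to roughly 62 t/s, in line with gemma3:4b's row above.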

Reasoning Models · thinking/reasoning capabilities

| # | Model | Engine | Family | Size | Time | Tokens | Thinking |
|---|---|---|---|---|---|---|---|
| 1 | deepseek-v3.2 | Ollama Cloud | DeepSeek V3.2 | 688B | 5.9s | 145 | Yes |
| 2 | kimi-k2-thinking | Ollama Cloud | Kimi K2 | 1T | 33.7s | 411 | Yes |

Vision Models · image understanding · screenshot QA

| # | Model | Engine | Family | Size | Time | Tokens | QA Output |
|---|---|---|---|---|---|---|---|
| 1 | devstral-small-2:24b | Ollama Cloud | Devstral | 24B | 0.9s | - | Vision via /api/generate + images[] |
| 2 | qwen3.5:397b | Ollama Cloud | Qwen 3.5 | 397B | ~5s | - | Vision via /api/generate + raw base64 |
| 3 | qwen3-vl:235b | Ollama Cloud | Qwen3-VL | 235B | ~5s | - | Vision via /api/generate + raw base64 |
| 4 | gemma3:27b | Ollama Cloud | Gemma 3 | 27B | ~2s | - | Vision via /api/generate + raw base64 |
| 5 | kimi-k2.5 | Ollama Cloud | Kimi K2.5 | 1T | 10s | 397 | Vision via /api/generate. Needs high max_tokens. |
| 6 | Qwen3.5-27B-Derestricted | Arli AI | Qwen 3.5 | 27B | 19.5s | 1544 | The user wants a QA review of a web page screenshot. 1. **Analyze the visual co... |
| 7 | Qwen3.5-27B-Vivid-Durian | Arli AI | Qwen 3.5 | 27B | 22.7s | 1544 | The user wants a QA review of the provided web page image. **1. Analyze the Ima... |

Failed / At Capacity · retry later or use an alternative engine

| Model | Engine | Time | Error |
|---|---|---|---|
| qwen3.5:27b | Ollama Cloud | 0.2s | model not found |
| qwen3-vl:8b | Ollama Cloud | 0.2s | cannot unmarshal array into Go struct (wrong format) |
| qwen3-vl:32b | Ollama Cloud | 0.2s | cannot unmarshal array into Go struct (wrong format) |
| Qwen/Qwen2.5-32B-Instruct | Featherless | 15.7s | temporarily at capacity |
| Qwen/Qwen2.5-7B-Instruct | Featherless | 9.6s | temporarily at capacity |
| Qwen/Qwen3-32B | Featherless | 11.7s | temporarily at capacity |
| Qwen/QwQ-32B | Featherless | 12s | temporarily at capacity |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | Featherless | 13.1s | temporarily at capacity |
| mistralai/Magistral-Small-2506 | Featherless | 13.7s | temporarily at capacity |
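Most of these failures are transient capacity errors, which are worth retrying; "model not found" and malformed-request errors are permanent and should fail fast. A retry sketch with exponential backoff; `send` is any zero-argument function that performs the request and raises `RuntimeError` with the engine's error message on failure (a hypothetical convention):

```python
import time


def call_with_retry(send, attempts=4, base_delay=2.0):
    """Retry transient at-capacity failures with exponential backoff
    (base_delay, 2x, 4x, ...); re-raise permanent errors immediately.
    """
    for attempt in range(attempts):
        try:
            return send()
        except RuntimeError as err:
            # Only capacity errors are transient; anything else
            # (e.g. "model not found") will not fix itself.
            if "at capacity" not in str(err) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

Pairing this with the fallback chain above covers both cases: retry the same engine briefly, then move down the chain if it stays busy.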