🚀 Available Models

Explore our collection of cutting-edge AI models available on ModelHarbor. From ultra-fast lightweight models to advanced reasoning engines — all accessible via our OpenAI-compatible API.

💚 Qwen/Qwen3.6-35B-A3B

Budget GOAT 👑

Looking for max value? This is the ULTIMATE budget king! 💸 Qwen3.6-35B-A3B is a Mixture-of-Experts model with only 3B active parameters out of 35B total — meaning it's crazy efficient while still delivering solid performance. It supports vision, function calling, AND computer use at the lowest price point on the platform. No cap, this is the move for high-volume tasks that need to stay wallet-friendly!

  • Insanely Affordable: Starting from just $0.16/M input tokens — the cheapest model on ModelHarbor!
  • MoE Efficiency: 35B total params with only ~3B active per token = fast + cheap
  • Vision + Function Calling: Understands images and supports tool use despite the low price
  • 262K Context: Massive context window for long documents and codebases

🎨 google/gemini-2.5-flash-image

Image Gen Flash ⚡

Need AI-generated images? Gemini 2.5 Flash Image is your go-to! 🎨 This model specializes in creating and understanding images with Google's latest Gemini technology. It supports vision, computer use, and web search — all at an incredibly affordable price point. Perfect for creative projects, visual analysis, and multimodal applications!

  • Image Generation: Create images from text descriptions
  • Vision + Computer Use: Understand images and interact with screens
  • Web Search: Access real-time information from the internet
  • Super Affordable: Just $0.30/M input tokens for image-capable AI

⚡ gemini-3.1-flash-lite-preview

Speed Demon 🏎️

Need blazing-fast responses without breaking the bank? Gemini 3.1 Flash Lite is the speed champion! 🏎️ This lightweight model delivers Gemini-quality responses at lightning speed with a massive 1M token context window. It supports vision, computer use, and web search — making it incredibly versatile for its price. The ultimate choice for high-throughput applications!

  • 1M Token Context: One of the largest context windows available anywhere
  • Ultra Fast: Lite architecture means rapid responses
  • Full Multimodal: Vision, Computer Use, and Web Search all included
  • Great Value: Only $0.40/M input tokens with 1M context

🧠 glm (GLM-5)

Reasoning Value 🧩

GLM-5 is the value champion for reasoning tasks! 🧩 This model from Zhipu AI delivers impressive thinking capabilities with built-in reasoning support and prompt caching for even better efficiency. With a massive 202K context window and function calling support, it's perfect for complex analytical tasks that don't require vision. The prompt caching feature means repeated queries cost even less!

  • Built-in Reasoning: Supports extended thinking for complex problems
  • Prompt Caching: Cache reads cost near-zero — great for repeated queries
  • 202K Context: Large context window for detailed analysis
  • Function Calling: Supports parallel function calling for agentic workflows

🖌️ google/gemini-3.1-flash-image-preview

Next-Gen Image ✨

The next generation of image generation is here! ✨ Gemini 3.1 Flash Image Preview brings Google's latest multimodal capabilities with a 65K token context and 65K max output. It supports vision and web search, making it perfect for creative workflows that need both image understanding and generation. This preview model gives you early access to cutting-edge image AI technology!

  • Next-Gen Image AI: Latest Gemini 3.1 technology for image generation
  • 65K Input + 65K Output: Generous context for complex image tasks
  • Vision + Web Search: Understand images and search the web
  • Preview Access: Be the first to try Google's newest image model

🔥 glm-max (GLM-5.1)

Max Performance 💎

When you need maximum performance from the GLM family, GLM-5.1 (glm-max) delivers! 💎 This is Zhipu AI's most capable model with 202K context, prompt caching, and parallel function calling. It's designed for demanding text-based tasks that require deep understanding and precise outputs. The perfect choice when you need top-tier reasoning without multimodal overhead.

  • Maximum GLM Performance: The most powerful model in the GLM family
  • 202K Context: Massive context for complex document analysis
  • Prompt Caching: Efficient repeated queries with caching support
  • Parallel Function Calling: Execute multiple tools simultaneously

🚀 glm-turbo (GLM-5-Turbo)

Turbo Mode ⚡

Need GLM power at turbo speed? GLM-5-Turbo is your answer! ⚡ Same 202K context and function calling capabilities as glm-max, but optimized for faster responses. Perfect for production workloads that need both quality and speed. When you want GLM-level intelligence with minimal latency, this is the one!

  • Turbo Speed: Optimized for fast response times
  • 202K Context: Same massive context window as glm-max
  • Prompt Caching: Efficient for repeated queries
  • Parallel Function Calling: Multi-tool execution support

⚡ deepseek-v4-flash

Flash Value King 👑

DeepSeek V4 Flash is the NEW budget champion! ⚡ Same price as Qwen3.6 but with a jaw-dropping 3.84M token max output — that's not a typo! This model can generate massively long responses, making it perfect for code generation, long-form content, and complex multi-step tasks. Plus it supports vision, computer use, and function calling at the cheapest price on the platform!

  • 3.84M Max Output: Generate up to 3.84 million tokens in a single response — unprecedented!
  • Rock Bottom Price: Just $0.16/M input, $1.00/M output — tied for cheapest
  • Vision + Computer Use: Full multimodal support at budget pricing
  • Function Calling: Tool use for building AI agents
  • 100K Context: Solid context window for most use cases

🌟 gemini-3-flash-preview

Balanced Pro ⚖️

Gemini 3 Flash Preview is the balanced powerhouse in Google's lineup! ⚖️ With a massive 1M token context, vision support, PDF processing, web search, and computer use — it's got everything you need for serious multimodal work. The premium pricing reflects its top-tier capabilities across all modalities. Perfect when you need the full Gemini experience with deep document understanding!

  • 1M Token Context: Massive context for processing huge documents
  • PDF Support: Native PDF input processing built-in
  • Full Multimodal: Vision, Computer Use, and Web Search
  • Tiered Pricing: Different rates above 200K tokens for flexibility

🔥 deepseek-v4-pro

Pro Powerhouse 🔋

DeepSeek V4 Pro is the premium DeepSeek model with the same jaw-dropping 3.84M token max output! 🔋 This is DeepSeek's most capable model — combining advanced reasoning with vision, computer use, and function calling. When you need DeepSeek's best quality output with massive generation length, this is the one. A serious contender in the pro-tier model space!

  • 3.84M Max Output: Generate up to 3.84 million tokens — unprecedented for a pro model
  • Pro Quality: DeepSeek's most capable model for demanding tasks
  • Vision + Computer Use: Full multimodal understanding and screen interaction
  • Function Calling: Tool use for sophisticated AI agents
  • 100K Context: Solid context window with massive output capacity

👑 gemini-3.1-pro-preview

Ultimate Pro 👑

The king of the hill has arrived! 👑 Gemini 3.1 Pro Preview is Google's most powerful AI model, delivering state-of-the-art performance across all benchmarks. With a 1M token context window, vision, computer use, and web search — this is the ultimate model for the most demanding tasks. When you need the absolute best, no compromises, Gemini 3.1 Pro is the answer!

  • Maximum Performance: Google's most capable Gemini model
  • 1M Token Context: Process entire codebases and massive documents
  • Full Multimodal: Vision, Computer Use, and Web Search all at pro level
  • State-of-the-Art: Top-tier results across all AI benchmarks