---
title: Docker Model Runner
emoji: 🐳
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
pinned: false
---

# Docker Model Runner

Docker Model Runner exposes an **Anthropic-compatible API** with **interleaved thinking** support.

## Hardware
- **CPU Basic**: 2 vCPU · 16 GB RAM

## Quick Start

```bash
pip install anthropic
export ANTHROPIC_BASE_URL=https://likhonsheikhdev-docker-model-runner.hf.space
export ANTHROPIC_API_KEY=any-key
```

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
```

## Interleaved Thinking

Pass the `thinking` parameter to receive reasoning (`thinking`) blocks interleaved with the `text` blocks of the response:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space",
    api_key="any-key",  # placeholder key, same value as in Quick Start
)

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 200
    },
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Response contains interleaved thinking and text blocks
for block in message.content:
    if block.type == "thinking":
        print(f"💭 Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"📝 Response: {block.text}")
```

## Streaming with Thinking

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space",
    api_key="any-key",  # placeholder key, same value as in Quick Start
)

with client.messages.stream(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == 'content_block_start':
                print(f"\n[{event.content_block.type}]", end=" ")
            elif event.type == 'content_block_delta':
                if hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")
```
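
When the full reasoning and the full answer are needed as separate strings, the streamed deltas can be accumulated instead of printed. The sketch below is a minimal illustration, assuming events have the same shape as in the loop above (`event.type`, `event.delta.thinking`, `event.delta.text`). `collect_stream` and the fake events are hypothetical names used only for this example; a real run would pass the `stream` object.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate thinking and text deltas into two separate strings."""
    thinking_parts, text_parts = [], []
    for event in events:
        # Only delta events carry incremental content.
        if getattr(event, "type", None) != "content_block_delta":
            continue
        if hasattr(event.delta, "thinking"):
            thinking_parts.append(event.delta.thinking)
        elif hasattr(event.delta, "text"):
            text_parts.append(event.delta.text)
    return "".join(thinking_parts), "".join(text_parts)


# Hypothetical stand-in events; no network call is made here.
fake_events = [
    SimpleNamespace(type="content_block_start",
                    content_block=SimpleNamespace(type="thinking")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(thinking="User says hello. ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(thinking="Greet back.")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="Hello! ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="How can I help?")),
]

thinking, text = collect_stream(fake_events)
print(f"Thinking: {thinking}")
print(f"Text: {text}")
```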

## Multi-Turn with Thinking History

**Important**: In multi-turn conversations, append the complete model response, thinking blocks included, to the message history so the model keeps its reasoning chain across turns.

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space",
    api_key="any-key",  # placeholder key, same value as in Quick Start
)

messages = [{"role": "user", "content": "What is 2+2?"}]

# First turn
response = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)

# Append full response (including thinking) to history
messages.append({
    "role": "assistant",
    "content": response.content  # Includes both thinking and text blocks
})

# Second turn
messages.append({"role": "user", "content": "Now multiply that by 3"})

response2 = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
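)

# Print the final answer from the second turn
for block in response2.content:
    if block.type == "text":
        print(block.text)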
```
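
The history-building step can be isolated in a small helper so thinking blocks are never dropped by accident. This is a minimal sketch: `append_assistant_turn` is a hypothetical helper, and plain dicts stand in for the SDK's content block objects so the example runs without a network call.

```python
from typing import Any


def append_assistant_turn(
    history: list[dict[str, Any]], content_blocks: list[Any]
) -> list[dict[str, Any]]:
    """Return a new history with the full assistant turn appended.

    All blocks are kept in their original order, thinking blocks included.
    """
    return history + [{"role": "assistant", "content": list(content_blocks)}]


# Hypothetical stand-in for `response.content` from the first turn.
fake_response_content = [
    {"type": "thinking", "thinking": "2 plus 2 equals 4."},
    {"type": "text", "text": "4"},
]

history = [{"role": "user", "content": "What is 2+2?"}]
history = append_assistant_turn(history, fake_response_content)
history.append({"role": "user", "content": "Now multiply that by 3"})

block_types = [block["type"] for block in history[1]["content"]]
print(block_types)  # ['thinking', 'text']
```

Returning a new list rather than mutating the input keeps earlier snapshots of the conversation intact, which can help when retrying a turn.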

## Supported Models

| Model | Description |
|-------|-------------|
| MiniMax-M2 | Agentic capabilities, advanced reasoning |
| MiniMax-M2-Stable | High concurrency and commercial use |

## API Compatibility

### Parameters

| Parameter | Status |
|-----------|--------|
| model | ✅ Fully supported |
| messages | ⚠️ Partial (text and tool calls only) |
| max_tokens | ✅ Fully supported |
| stream | ✅ Fully supported |
| system | ✅ Fully supported |
| temperature | ✅ Supported, range (0.0, 1.0] |
| thinking | ✅ Fully supported |
| thinking.budget_tokens | ✅ Fully supported |
| tools | ✅ Fully supported |
| tool_choice | ✅ Fully supported |
| top_p | ✅ Fully supported |
| metadata | ✅ Fully supported |
| top_k | ⚪ Ignored |
| stop_sequences | ⚪ Ignored |

### Message Types

| Type | Status |
|------|--------|
| text | ✅ Supported |
| thinking | ✅ Supported |
| tool_use | ✅ Supported |
| tool_result | ✅ Supported |
| image | ❌ Not supported |
| document | ❌ Not supported |

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/messages` | POST | Anthropic Messages API |
| `/v1/chat/completions` | POST | OpenAI Chat API |
| `/v1/models` | GET | List models |
| `/health` | GET | Health check |
| `/info` | GET | API info |
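
The endpoints above can also be reached without the Anthropic SDK. The sketch below uses only the Python standard library and builds the requests without sending them, so it runs offline. `build_request` is a hypothetical helper, and the chat completions body assumes the endpoint accepts the usual OpenAI `model` and `messages` fields.

```python
import json
import urllib.request

BASE_URL = "https://likhonsheikhdev-docker-model-runner.hf.space"


def build_request(path, payload=None):
    """Build (but do not send) a GET request, or a JSON POST if payload is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json", "x-api-key": "any-key"}
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


health_request = build_request("/health")
chat_request = build_request(
    "/v1/chat/completions",
    {"model": "MiniMax-M2", "messages": [{"role": "user", "content": "Hello!"}]},
)
print(health_request.get_method(), health_request.full_url)
print(chat_request.get_method(), chat_request.full_url)

# To actually send a request (needs network access and a running Space):
# with urllib.request.urlopen(health_request, timeout=30) as response:
#     print(response.status, response.read().decode("utf-8"))
```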

## cURL Example

```bash
curl -X POST https://likhonsheikhdev-docker-model-runner.hf.space/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -d '{
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 100},
    "messages": [
      {"role": "user", "content": "Explain AI briefly"}
    ]
  }'
```