
Streaming Responses

Use streaming responses (Server-Sent Events) to return generated content in real time and improve the user experience.

Enabling Streaming

Set the stream: true parameter:

javascript
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'user', content: 'Write a poem about spring' }
  ],
  stream: true  // enable streaming
});

Response Format

Streaming responses use the Server-Sent Events (SSE) format:

data: {"choices":[{"delta":{"role":"assistant"}}]}

data: {"choices":[{"delta":{"content":"Spring"}}]}

data: {"choices":[{"delta":{"content":" is"}}]}

data: {"choices":[{"delta":{"content":" in"}}]}

data: {"choices":[{"delta":{"content":" the air"}}]}

data: [DONE]
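A client reconstructs the full reply by concatenating the content deltas. A minimal parser sketch, assuming the whole event stream is already available as text (parseSSE is an illustrative helper, not part of any SDK):

```javascript
// Join the content deltas from a complete SSE event stream.
// Non-"data:" lines are ignored, and "[DONE]" terminates the stream.
function parseSSE(raw) {
  const parts = [];
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') break;
    const delta = JSON.parse(payload).choices[0]?.delta;
    if (delta?.content) parts.push(delta.content);
  }
  return parts.join('');
}
```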

Usage Examples

Python

python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.myopenhub.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    # The final chunk may carry usage info with an empty choices list
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)

Node.js

javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://api.myopenhub.com/v1'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}

cURL

bash
curl https://api.myopenhub.com/v1/llm/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'

Frontend Integration

React Example

typescript
import { useState } from 'react';

function ChatComponent() {
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  async function sendMessage(message: string) {
    setLoading(true);
    setResponse('');

    const res = await fetch('https://api.myopenhub.com/v1/llm/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'auto',
        messages: [{ role: 'user', content: message }],
        stream: true
      })
    });

    const reader = res.body?.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader!.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n').filter(line => line.trim());

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') continue;

          try {
            const json = JSON.parse(data);
            const content = json.choices[0]?.delta?.content || '';
            setResponse(prev => prev + content);
          } catch (e) {
            console.error('Parse error:', e);
          }
        }
      }
    }

    setLoading(false);
  }

  return (
    <div>
      <button onClick={() => sendMessage('Write a poem')}>
        Send Message
      </button>
      <div>{response}</div>
    </div>
  );
}
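One caveat with the reader loop above: fetch delivers bytes in arbitrary chunks, so a single data: line can be split across two reads and fail to parse. A sketch of a line splitter that carries the incomplete tail between reads (makeLineSplitter is an illustrative helper):

```javascript
// Returns a feeder function: call it with each decoded network chunk and it
// invokes processLine once per complete line, buffering any partial tail.
function makeLineSplitter(processLine) {
  let tail = '';
  return (chunkText) => {
    const lines = (tail + chunkText).split('\n');
    tail = lines.pop(); // the last element may be an incomplete line
    for (const line of lines) {
      if (line.trim()) processLine(line);
    }
  };
}
```

In the loop above you would create the feeder once, then call it with `decoder.decode(value, { stream: true })` so multi-byte characters split across chunks also decode correctly.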

Vue Example

vue
<template>
  <div>
    <button @click="sendMessage('Write a poem')">Send Message</button>
    <div>{{ response }}</div>
  </div>
</template>

<script setup>
import { ref } from 'vue';

const response = ref('');
const loading = ref(false);

async function sendMessage(message) {
  loading.value = true;
  response.value = '';

  const res = await fetch('https://api.myopenhub.com/v1/llm/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'auto',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n').filter(line => line.trim());

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') continue;

        try {
          const json = JSON.parse(data);
          const content = json.choices[0]?.delta?.content || '';
          response.value += content;
        } catch (e) {
          console.error('Parse error:', e);
        }
      }
    }
  }

  loading.value = false;
}
</script>

Response Data Structure

First chunk

json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}

Intermediate chunks

json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "Spring"
      },
      "finish_reason": null
    }
  ]
}

Final chunk

json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 100,
    "total_tokens": 120
  }
}
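When the client needs the assembled message (for example, to store a chat history), the chunks above can be folded into a single object. A sketch based on the chunk shape shown above (accumulateChunks is an illustrative helper):

```javascript
// Fold a sequence of chat.completion.chunk objects into the final
// assistant message plus the finish reason.
function accumulateChunks(chunks) {
  const message = { role: '', content: '' };
  let finishReason = null;
  for (const chunk of chunks) {
    const choice = chunk.choices[0];
    if (!choice) continue; // e.g. a trailing usage-only chunk
    if (choice.delta.role) message.role = choice.delta.role;
    if (choice.delta.content) message.content += choice.delta.content;
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { message, finishReason };
}
```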

Error Handling

javascript
try {
  const stream = await client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Write a poem' }],
    stream: true
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
} catch (error) {
  if (error.code === 'ECONNRESET') {
    console.error('Connection interrupted, please retry');
  } else {
    console.error('Error:', error.message);
  }
}

Best Practices

1. Show a loading state

javascript
// ✅ Show a loading state
setLoading(true);
// ... stream the response
setLoading(false);

2. Handle connection interruptions

javascript
// ✅ Retry mechanism (createStream and sleep stand in for your streaming
// call and a promise-based delay helper)
async function streamWithRetry(maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await createStream();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(1000 * (i + 1));  // linear backoff: 1s, 2s, 3s
    }
  }
}

3. Buffer output

javascript
// ✅ Buffer output to avoid overly frequent UI updates
let buffer = '';
let lastUpdate = Date.now();

for await (const chunk of stream) {
  buffer += chunk.choices[0]?.delta?.content || '';
  
  // Flush the buffer to the UI at most every 100 ms
  if (Date.now() - lastUpdate > 100) {
    setResponse(prev => prev + buffer);
    buffer = '';
    lastUpdate = Date.now();
  }
}

// Flush any remaining buffered content
if (buffer) {
  setResponse(prev => prev + buffer);
}

4. Cancel requests

javascript
// ✅ Support cancelling in-flight requests
const controller = new AbortController();

fetch(url, {
  signal: controller.signal,
  // ...
});

// Cancel the request
controller.abort();
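If you consume the stream through the SDK's async iterator instead of fetch, you can stop reading as soon as an AbortSignal fires. A sketch under that assumption (consumeUntilAborted is an illustrative helper; the stream is any async iterable of chunks):

```javascript
// Read chunks until the stream ends or the signal is aborted.
async function consumeUntilAborted(stream, signal, onContent) {
  for await (const chunk of stream) {
    if (signal.aborted) break; // stop reading promptly after abort()
    onContent(chunk.choices[0]?.delta?.content || '');
  }
}
```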

Performance Optimization

1. Use a fast model

javascript
// Fast models are recommended for streaming responses
model: 'auto-fast'  // 或 gpt-3.5-turbo, qwen-turbo

2. Limit output length

javascript
// Avoid excessively long streamed output
max_tokens: 500

3. Compress the transfer

javascript
// Enable gzip compression
headers: {
  'Accept-Encoding': 'gzip'
}

FAQ

Q: Does streaming increase cost?

A: No. Streaming and non-streaming requests are billed the same way, by token count.

Q: Do all models support streaming?

A: Yes. Every model available through OpenHub supports streaming responses.

Q: Can a streaming response be cancelled midway?

A: Yes. Closing the connection cancels it, and you are only billed for the tokens generated so far.

Q: What is the latency of streaming responses?

A: The first token typically arrives within 1-2 seconds; subsequent tokens are returned in real time.
