# Streaming Responses

Use streaming responses (Server-Sent Events) to return generated content in real time and improve the user experience.

## Enabling Streaming

Set the `stream: true` parameter:
```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'user', content: '写一首关于春天的诗' }
  ],
  stream: true // enable streaming
});
```

## Response Format
Streaming responses use the Server-Sent Events (SSE) format:

```
data: {"choices":[{"delta":{"role":"assistant"}}]}
data: {"choices":[{"delta":{"content":"春"}}]}
data: {"choices":[{"delta":{"content":"天"}}]}
data: {"choices":[{"delta":{"content":"来"}}]}
data: {"choices":[{"delta":{"content":"了"}}]}
data: [DONE]
```

## Usage Examples
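The SDK examples below handle SSE parsing for you. Without an SDK, the raw `data:` lines shown above can be assembled by hand; a minimal Node.js sketch (the sample lines mirror the format above, and `assembleReply` is an illustrative helper, not part of any SDK):

```javascript
// Sample SSE lines in the format shown above
const lines = [
  'data: {"choices":[{"delta":{"role":"assistant"}}]}',
  'data: {"choices":[{"delta":{"content":"春"}}]}',
  'data: {"choices":[{"delta":{"content":"天"}}]}',
  'data: [DONE]',
];

// Concatenate the delta.content of each chunk to rebuild the full reply
function assembleReply(sseLines) {
  let reply = '';
  for (const line of sseLines) {
    if (!line.startsWith('data: ')) continue; // skip blank/comment lines
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break;          // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    reply += chunk.choices[0]?.delta?.content || '';
  }
  return reply;
}

console.log(assembleReply(lines)); // "春天"
```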
### Python

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.myopenhub.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "写一首诗"}],
    stream=True
)

for chunk in stream:
    # guard against chunks with an empty choices list or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
```

### Node.js
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://api.myopenhub.com/v1'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: '写一首诗' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
```

### cURL
```bash
curl https://api.myopenhub.com/v1/llm/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "写一首诗"}],
    "stream": true
  }'
```

## Frontend Integration
### React Example

```typescript
import { useState } from 'react';

function ChatComponent() {
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  async function sendMessage(message: string) {
    setLoading(true);
    setResponse('');

    const res = await fetch('https://api.myopenhub.com/v1/llm/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'auto',
        messages: [{ role: 'user', content: message }],
        stream: true
      })
    });

    const reader = res.body?.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader!.read();
      if (done) break;

      // stream: true keeps multi-byte characters that are split
      // across network chunks from being decoded incorrectly
      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n').filter(line => line.trim());

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') continue;
          try {
            const json = JSON.parse(data);
            const content = json.choices[0]?.delta?.content || '';
            setResponse(prev => prev + content);
          } catch (e) {
            console.error('Parse error:', e);
          }
        }
      }
    }

    setLoading(false);
  }

  return (
    <div>
      <button onClick={() => sendMessage('写一首诗')}>
        发送消息
      </button>
      <div>{response}</div>
    </div>
  );
}
```

### Vue Example
```vue
<template>
  <div>
    <button @click="sendMessage('写一首诗')">发送消息</button>
    <div>{{ response }}</div>
  </div>
</template>

<script setup>
import { ref } from 'vue';

const response = ref('');
const loading = ref(false);

async function sendMessage(message) {
  loading.value = true;
  response.value = '';

  const res = await fetch('https://api.myopenhub.com/v1/llm/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'auto',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters that are split
    // across network chunks from being decoded incorrectly
    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n').filter(line => line.trim());

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') continue;
        try {
          const json = JSON.parse(data);
          const content = json.choices[0]?.delta?.content || '';
          response.value += content;
        } catch (e) {
          console.error('Parse error:', e);
        }
      }
    }
  }

  loading.value = false;
}
</script>
```

## Response Data Structure
### First Chunk

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}
```

### Intermediate Chunks
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "春天"
      },
      "finish_reason": null
    }
  ]
}
```

### Final Chunk
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 100,
    "total_tokens": 120
  }
}
```

## Error Handling
```javascript
try {
  const stream = await client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: '写一首诗' }],
    stream: true
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
} catch (error) {
  if (error.code === 'ECONNRESET') {
    console.error('Connection interrupted, please retry');
  } else {
    console.error('Error:', error.message);
  }
}
```

## Best Practices
### 1. Show a Loading State

```javascript
// ✅ Show a loading state while streaming
setLoading(true);
// ... stream the response
setLoading(false);
```

### 2. Handle Connection Interruptions
```javascript
// ✅ Retry with backoff; createStream() stands in for your
// streaming request, and sleep(ms) is a promise-based delay
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function streamWithRetry(maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await createStream();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(1000 * (i + 1)); // back off between attempts
    }
  }
}
```

### 3. Buffer Output
```javascript
// ✅ Buffer output to avoid updating the UI on every token
let buffer = '';
let lastUpdate = Date.now();

for await (const chunk of stream) {
  buffer += chunk.choices[0]?.delta?.content || '';
  // flush at most every 100 ms
  if (Date.now() - lastUpdate > 100) {
    setResponse(prev => prev + buffer);
    buffer = '';
    lastUpdate = Date.now();
  }
}

// flush any remaining content
if (buffer) {
  setResponse(prev => prev + buffer);
}
```

### 4. Cancel Requests
```javascript
// ✅ Support cancelling the request
const controller = new AbortController();

fetch(url, {
  signal: controller.signal,
  // ...
});

// cancel the request
controller.abort();
```

## Performance Optimization
### 1. Use a Fast Model

```javascript
// Fast models are recommended for streaming
model: 'auto-fast' // or gpt-3.5-turbo, qwen-turbo
```

### 2. Limit Output Length
```javascript
// Avoid excessively long streamed output
max_tokens: 500
```

### 3. Compress the Transfer
```javascript
// Enable gzip compression
headers: {
  'Accept-Encoding': 'gzip'
}
```

## FAQ
**Q: Does streaming cost more?**

A: No. Streaming and non-streaming requests are billed the same way, by token count.

**Q: Do all models support streaming?**

A: Yes. Every model supported by OpenHub supports streaming responses.

**Q: Can a streaming response be cancelled midway?**

A: Yes. Closing the connection cancels it, and you are only billed for the tokens generated so far.

**Q: What latency should I expect?**

A: First-token latency is typically 1-2 seconds; subsequent tokens arrive in real time.
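First-token latency can be measured directly in your own code. A minimal sketch, where `mockStream` is a stand-in for a real stream such as the result of `client.chat.completions.create({ ..., stream: true })`:

```javascript
// Mock stream: simulates model latency before the first token arrives
async function* mockStream() {
  await new Promise((resolve) => setTimeout(resolve, 50));
  yield { choices: [{ delta: { content: '春' } }] };
  yield { choices: [{ delta: { content: '天' } }] };
}

// Record the time from request start until the first chunk arrives
async function measureTTFT(stream) {
  const start = Date.now();
  let ttft = null;
  let reply = '';
  for await (const chunk of stream) {
    if (ttft === null) ttft = Date.now() - start; // first token arrived
    reply += chunk.choices[0]?.delta?.content || '';
  }
  return { ttft, reply };
}

measureTTFT(mockStream()).then(({ ttft, reply }) => {
  console.log(`first token after ${ttft}ms, reply: ${reply}`);
});
```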