# Streaming Responses

Use streaming responses (Server-Sent Events) to return generated content in real time and improve the user experience.

## Enabling Streaming

Set the `stream: true` parameter:
```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'user', content: '写一首关于春天的诗' }
  ],
  stream: true // enable streaming
});
```

## Response Format
Streaming responses use the Server-Sent Events (SSE) format:

```
data: {"choices":[{"delta":{"role":"assistant"}}]}
data: {"choices":[{"delta":{"content":"春"}}]}
data: {"choices":[{"delta":{"content":"天"}}]}
data: {"choices":[{"delta":{"content":"来"}}]}
data: {"choices":[{"delta":{"content":"了"}}]}
data: [DONE]
```

## Usage Examples
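The SDK examples below handle SSE parsing for you. Without an SDK, the raw `data:` lines shown above can be assembled by hand; a minimal Node.js sketch (the sample lines mirror the format above, and `assembleReply` is an illustrative helper, not part of any SDK):

```javascript
// Sample SSE lines in the format shown above
const lines = [
  'data: {"choices":[{"delta":{"role":"assistant"}}]}',
  'data: {"choices":[{"delta":{"content":"春"}}]}',
  'data: {"choices":[{"delta":{"content":"天"}}]}',
  'data: [DONE]',
];

// Concatenate the delta.content of each chunk to rebuild the full reply
function assembleReply(sseLines) {
  let reply = '';
  for (const line of sseLines) {
    if (!line.startsWith('data: ')) continue; // skip blank/comment lines
    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') break;          // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    reply += chunk.choices[0]?.delta?.content || '';
  }
  return reply;
}

console.log(assembleReply(lines)); // "春天"
```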
### Python

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.myopenhub.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "写一首诗"}],
    stream=True
)

for chunk in stream:
    # guard against chunks with an empty choices list or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='')
```

### Node.js
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://api.myopenhub.com/v1'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: '写一首诗' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || '';
  process.stdout.write(content);
}
```

### cURL
```bash
curl https://api.myopenhub.com/v1/llm/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "写一首诗"}],
    "stream": true
  }'
```

## Frontend Integration
### React Example

```typescript
import { useState } from 'react';

function ChatComponent() {
  const [response, setResponse] = useState('');
  const [loading, setLoading] = useState(false);

  async function sendMessage(message: string) {
    setLoading(true);
    setResponse('');

    const res = await fetch('https://api.myopenhub.com/v1/llm/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'auto',
        messages: [{ role: 'user', content: message }],
        stream: true
      })
    });

    const reader = res.body?.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader!.read();
      if (done) break;

      // stream: true keeps multi-byte characters that are split
      // across network chunks from being decoded incorrectly
      const chunk = decoder.decode(value, { stream: true });
      const lines = chunk.split('\n').filter(line => line.trim());

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          const data = line.slice(6);
          if (data === '[DONE]') continue;
          try {
            const json = JSON.parse(data);
            const content = json.choices[0]?.delta?.content || '';
            setResponse(prev => prev + content);
          } catch (e) {
            console.error('Parse error:', e);
          }
        }
      }
    }

    setLoading(false);
  }

  return (
    <div>
      <button onClick={() => sendMessage('写一首诗')}>
        发送消息
      </button>
      <div>{response}</div>
    </div>
  );
}
```

### Vue Example
```vue
<template>
  <div>
    <button @click="sendMessage('写一首诗')">发送消息</button>
    <div>{{ response }}</div>
  </div>
</template>

<script setup>
import { ref } from 'vue';

const response = ref('');
const loading = ref(false);

async function sendMessage(message) {
  loading.value = true;
  response.value = '';

  const res = await fetch('https://api.myopenhub.com/v1/llm/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'auto',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters that are split
    // across network chunks from being decoded incorrectly
    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n').filter(line => line.trim());

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') continue;
        try {
          const json = JSON.parse(data);
          const content = json.choices[0]?.delta?.content || '';
          response.value += content;
        } catch (e) {
          console.error('Parse error:', e);
        }
      }
    }
  }

  loading.value = false;
}
</script>
```

## Response Data Structure
### First Chunk

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}
```

### Intermediate Chunks
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "春天"
      },
      "finish_reason": null
    }
  ]
}
```

### Final Chunk
```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 100,
    "total_tokens": 120
  }
}
```

## Error Handling
```javascript
try {
  const stream = await client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: '写一首诗' }],
    stream: true
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
} catch (error) {
  if (error.code === 'ECONNRESET') {
    console.error('Connection interrupted, please retry');
  } else {
    console.error('Error:', error.message);
  }
}
```

## Best Practices
### 1. Show a Loading State

```javascript
// ✅ Show a loading state while streaming
setLoading(true);
// ... stream the response
setLoading(false);
```

### 2. Handle Connection Interruptions
```javascript
// ✅ Retry with backoff; createStream() stands in for your
// streaming request, and sleep(ms) is a promise-based delay
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function streamWithRetry(maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await createStream();
    } catch (error) {
      if (i === maxRetries - 1) throw error;
      await sleep(1000 * (i + 1)); // back off between attempts
    }
  }
}
```

### 3. Buffer Output
```javascript
// ✅ Buffer output to avoid updating the UI on every token
let buffer = '';
let lastUpdate = Date.now();

for await (const chunk of stream) {
  buffer += chunk.choices[0]?.delta?.content || '';
  // flush at most every 100 ms
  if (Date.now() - lastUpdate > 100) {
    setResponse(prev => prev + buffer);
    buffer = '';
    lastUpdate = Date.now();
  }
}

// flush any remaining content
if (buffer) {
  setResponse(prev => prev + buffer);
}
```

### 4. Cancel Requests
```javascript
// ✅ Support cancelling the request
const controller = new AbortController();

fetch(url, {
  signal: controller.signal,
  // ...
});

// cancel the request
controller.abort();
```

## Performance Optimization
### 1. Use a Fast Model

```javascript
// Fast models are recommended for streaming
model: 'auto-fast' // or gpt-3.5-turbo, qwen-turbo
```

### 2. Limit Output Length
```javascript
// Avoid excessively long streamed output
max_tokens: 500
```

### 3. Compress the Transfer
```javascript
// Enable gzip compression
headers: {
  'Accept-Encoding': 'gzip'
}
```

## FAQ
**Q: Does streaming cost more?**

A: No. Streaming and non-streaming requests are billed the same way, by token count.

**Q: Do all models support streaming?**

A: Yes. Every model supported by OpenHub supports streaming responses.

**Q: Can a streaming response be cancelled midway?**

A: Yes. Closing the connection cancels it, and you are only billed for the tokens generated so far.

**Q: What latency should I expect?**

A: First-token latency is typically 1-2 seconds; subsequent tokens arrive in real time.
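First-token latency can be measured directly in your own code. A minimal sketch, where `mockStream` is a stand-in for a real stream such as the result of `client.chat.completions.create({ ..., stream: true })`:

```javascript
// Mock stream: simulates model latency before the first token arrives
async function* mockStream() {
  await new Promise((resolve) => setTimeout(resolve, 50));
  yield { choices: [{ delta: { content: '春' } }] };
  yield { choices: [{ delta: { content: '天' } }] };
}

// Record the time from request start until the first chunk arrives
async function measureTTFT(stream) {
  const start = Date.now();
  let ttft = null;
  let reply = '';
  for await (const chunk of stream) {
    if (ttft === null) ttft = Date.now() - start; // first token arrived
    reply += chunk.choices[0]?.delta?.content || '';
  }
  return { ttft, reply };
}

measureTTFT(mockStream()).then(({ ttft, reply }) => {
  console.log(`first token after ${ttft}ms, reply: ${reply}`);
});
```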