Short-lived serverless functions are the most common cause of missing traces. The SDK batches events and flushes them asynchronously on a timer, but if the process exits or is frozen before that timer fires, buffered events are silently lost. This guide shows the correct pattern for each major serverless platform.
Never skip calling flush() in serverless environments. The atexit/beforeExit handlers registered by the SDK are not reliably called when a Lambda or Vercel Function freezes.
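To see why, here is a minimal Python model of a client that buffers events and sends them from a background timer. BatchingClient, capture(), and the 60-second interval are hypothetical. They illustrate only the general mechanism, not the zespan SDK's internals.

```python
import threading
from typing import Callable, List, Optional


class BatchingClient:
    """Hypothetical client: buffers events and sends them from a background timer."""

    def __init__(self, send: Callable[[List[dict]], None], flush_interval_s: float = 5.0):
        self._send = send                      # transport, e.g. one HTTP POST per batch
        self._interval = flush_interval_s
        self._buffer: List[dict] = []
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def capture(self, event: dict) -> None:
        with self._lock:
            self._buffer.append(event)
            # Arm the background timer when the first event is buffered
            if self._timer is None:
                self._timer = threading.Timer(self._interval, self.flush)
                self._timer.daemon = True      # a daemon timer dies with the process
                self._timer.start()

    def flush(self) -> None:
        # Take whatever is buffered and cancel the pending timer
        with self._lock:
            batch, self._buffer = self._buffer, []
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
        if batch:
            self._send(batch)


sent: List[List[dict]] = []
client = BatchingClient(sent.append, flush_interval_s=60.0)
client.capture({"name": "chat.completions.create", "n": 1})
client.capture({"name": "chat.completions.create", "n": 2})
buffered_before_flush = len(sent)   # 0: both events are still in memory
client.flush()                      # sends one batch containing both events
```

If the handler returned right after the two capture() calls and the runtime froze or exited, nothing would be sent: the timer is 60 seconds away and, as a daemon thread, it does not keep the process alive. The explicit flush() is what moves the buffer onto the wire.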
The pattern
In every serverless handler, call flush() as the last operation before returning: after all LLM calls have completed and your response is ready.
```typescript
import { zespan } from "@zespan/sdk";
import OpenAI from "openai";

// Initialize once, at module level (outside the handler)
zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export async function handler(event: any) {
  const result = await openai.chat.completions.create({ ... });

  // Always flush before returning
  await zespan.getClient().flush();

  return { statusCode: 200, body: result.choices[0].message.content };
}
```
```python
import zespan
import openai

# Initialize once, at module level (outside the handler)
zespan.init(api_key="zsp_your_api_key_here")
zespan.patch_openai()
client = openai.OpenAI()


def handler(event, context):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": event["prompt"]}],
    )
    result = response.choices[0].message.content

    # Always flush before returning
    zespan.flush()
    return {"statusCode": 200, "body": result}
```
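Flushing only on the success path means failed invocations, often the ones you most want to inspect, lose their traces. One way to guarantee the flush runs is a try/finally wrapper. The flush_after decorator below is a hypothetical helper, not part of the SDK; in a real handler you would pass zespan.flush as the callable instead of the test stand-in used here.

```python
import functools
from typing import Any, Callable, List


def flush_after(flush: Callable[[], None]) -> Callable:
    """Hypothetical decorator: run `flush` after the handler, even if it raises."""
    def decorator(handler: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(handler)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return handler(*args, **kwargs)
            finally:
                # Runs on both the success path and the exception path
                flush()
        return wrapper
    return decorator


flushes: List[int] = []


@flush_after(lambda: flushes.append(1))   # real code: @flush_after(zespan.flush)
def handler(event, context):
    if event.get("fail"):
        raise ValueError("LLM call failed")
    return {"statusCode": 200, "body": "ok"}
```

One design caveat: if flush() itself raises inside finally, that error replaces the handler's original exception, so you may want to catch and log flush errors.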
Initialize the SDK once at module level, not inside the handler function. Module-level initialization persists across warm invocations of the same container, so you avoid the overhead of re-initializing on every request.
AWS Lambda
```typescript
import { zespan } from "@zespan/sdk";
import OpenAI from "openai";
import type { APIGatewayProxyEvent } from "aws-lambda";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export const handler = async (event: APIGatewayProxyEvent) => {
  const body = JSON.parse(event.body ?? "{}");

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: body.message }],
  });
  const answer = response.choices[0].message.content;

  // Flush before Lambda freezes the container
  await zespan.getClient().flush();

  return {
    statusCode: 200,
    body: JSON.stringify({ answer }),
  };
};
```
Lambda-specific notes:
- Set a Lambda timeout of at least init timeout + max LLM latency + 2 seconds to give the flush time to complete
- The SDK flush is a single HTTP request — it typically completes in under 500ms
- On cold starts, the SDK initializes at module load time. This adds ~10ms and happens only once per container lifecycle
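The timeout rule of thumb above is simple arithmetic, and in Python the Lambda context object can also report at runtime how much time is left before you flush. The helper names and the 500 ms budget (taken from the typical flush time quoted above) are assumptions for illustration; get_remaining_time_in_millis() is a method of the context object the Lambda Python runtime passes to handlers.

```python
import math


def min_lambda_timeout_s(init_timeout_s: float, max_llm_latency_s: float,
                         flush_margin_s: float = 2.0) -> int:
    """init timeout + max LLM latency + 2 s, rounded up to whole seconds
    (Lambda timeouts are configured in whole seconds)."""
    return math.ceil(init_timeout_s + max_llm_latency_s + flush_margin_s)


def enough_time_to_flush(context, flush_budget_ms: int = 500) -> bool:
    """True if the invocation still has at least `flush_budget_ms` left.

    `context` is the object Lambda passes to a Python handler.
    """
    return context.get_remaining_time_in_millis() >= flush_budget_ms


class FakeContext:
    """Stand-in for the Lambda context, for local testing only."""

    def __init__(self, remaining_ms: int):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self._remaining_ms


# Example: 1 s SDK init budget and a 25 s worst-case LLM call
print(min_lambda_timeout_s(1.0, 25.0))  # 28
```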
Vercel Functions (App Router)
```typescript
// app/api/chat/route.ts
import { zespan, withLumiqtraceContext } from "@zespan/sdk";
import OpenAI from "openai";

// Node.js runtime, for AsyncLocalStorage-based trace context
export const runtime = "nodejs";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export async function POST(req: Request) {
  const { message, userId } = await req.json();

  let answer = "";
  await withLumiqtraceContext({ userId }, async () => {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: message }],
    });
    answer = response.choices[0].message.content ?? "";
  });

  // Flush before Vercel freezes the function
  await zespan.getClient().flush();

  return Response.json({ answer });
}
```
Vercel-specific notes:
- Set maxDuration in your vercel.json or route config to account for flush time
- Vercel Edge Runtime does not support AsyncLocalStorage; use the Node.js runtime (export const runtime = "nodejs") for full trace context propagation
- For streaming responses, flush after the stream is complete — the SDK captures TTFT and full token counts at stream close
Vercel Functions — streaming responses
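When streaming, flush only after the LLM stream has been fully consumed, and before closing the response stream, so the function is still running while the flush completes:
```typescript
import { zespan } from "@zespan/sdk";
import OpenAI from "openai";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export async function POST(req: Request) {
  const { message } = await req.json();

  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: message }],
    stream: true,
  });

  const encoder = new TextEncoder();
  const readableStream = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const text = chunk.choices[0]?.delta?.content ?? "";
          controller.enqueue(encoder.encode(text));
        }
        // Flush AFTER the LLM stream is fully consumed, but before closing
        // the response, so the function is still alive during the flush
        await zespan.getClient().flush();
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });

  return new Response(readableStream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```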
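The same ordering applies to any streaming wrapper: record per-stream metrics and flush only once the last chunk has been consumed. This Python generator is a hypothetical illustration; it counts chunks rather than tokens and measures time to first chunk, whereas the SDK reports TTFT and token counts from its own instrumentation.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional


def traced_stream(chunks: Iterable[str], flush: Callable[[], None],
                  record: Callable[[dict], None]) -> Iterator[str]:
    """Hypothetical wrapper: pass chunks through, then record metrics and flush."""
    start = time.monotonic()
    ttft: Optional[float] = None
    count = 0
    try:
        for chunk in chunks:
            if ttft is None:
                ttft = time.monotonic() - start   # time to first chunk
            count += 1
            yield chunk
    finally:
        # Runs once the stream is exhausted, or when the consumer closes it early
        record({"ttft_s": ttft, "chunks": count})
        flush()


events: List[dict] = []
flushes: List[int] = []
text = "".join(traced_stream(["Hel", "lo", "!"], lambda: flushes.append(1), events.append))
```

Because the finally block also runs when the generator is closed early, a partially consumed stream is still recorded and flushed.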
Netlify Functions
```typescript
import type { Handler } from "@netlify/functions";
import { zespan } from "@zespan/sdk";
import OpenAI from "openai";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export const handler: Handler = async (event) => {
  const { message } = JSON.parse(event.body ?? "{}");

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: message }],
  });

  // Flush before the function returns
  await zespan.getClient().flush();

  return {
    statusCode: 200,
    body: JSON.stringify({ answer: response.choices[0].message.content }),
  };
};
```
Google Cloud Run
Cloud Run containers can be reused across requests, making it safe to initialize at module level and flush per-request.
Node.js (Express)
```typescript
import express from "express";
import { zespan } from "@zespan/sdk";
import OpenAI from "openai";

// Module level: persists across requests on the same instance
zespan.init({ apiKey: process.env.ZESPAN_API_KEY!, environment: "production" });
const openai = zespan.wrapOpenAI(new OpenAI());

const app = express();
app.use(express.json());

app.post("/chat", async (req, res) => {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: req.body.message }],
  });
  const answer = response.choices[0].message.content;

  // Flush before Cloud Run scales the instance down
  await zespan.getClient().flush();

  res.json({ answer });
});

app.listen(process.env.PORT ?? 8080);
```
Python (Flask)
```python
import zespan
import openai
from flask import Flask, request, jsonify

# Module level: persists across requests on the same instance
zespan.init(api_key="zsp_your_api_key_here", environment="production")
zespan.patch_openai()

app = Flask(__name__)
client = openai.OpenAI()


@app.post("/chat")
def chat():
    body = request.get_json()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": body["message"]}],
    )
    answer = response.choices[0].message.content

    # Flush before Cloud Run scales the instance down
    zespan.flush()
    return jsonify({"answer": answer})
```
Python (FastAPI)
Use the FastAPI middleware, which flushes automatically on every request so individual handlers don’t need to call flush().
Cloud Run notes:
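- Cloud Run sends SIGTERM before scaling an instance down, but call flush() per request; don’t rely on shutdown hooks alone.
- By default, Cloud Run allocates CPU only while a request is being processed, so a timer-based background flush may not get to run once the response has been sent.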
- Initialize at module level so the SDK persists across warm requests on the same instance.
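If you prefer not to call flush() in every Flask route, a small WSGI middleware can do it once per request. This is a minimal sketch assuming only a zero-argument flush callable such as zespan.flush; FlushMiddleware is a hypothetical name, not an SDK export.

```python
from typing import Callable, Iterable, List


class FlushMiddleware:
    """Hypothetical WSGI middleware: call `flush` once per request,
    after the wrapped app has produced its response."""

    def __init__(self, app: Callable, flush: Callable[[], None]):
        self.app = app
        self.flush = flush

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        try:
            # For Flask, the view (and its LLM calls) runs inside this call
            return self.app(environ, start_response)
        finally:
            # Flush before the response goes back to the server,
            # even if the app raised
            self.flush()


def demo_app(environ: dict, start_response: Callable) -> List[bytes]:
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


flushes: List[int] = []
statuses: List[str] = []
wrapped = FlushMiddleware(demo_app, lambda: flushes.append(1))
body = wrapped({}, lambda status, headers: statuses.append(status))
```

With Flask, the usual way to install WSGI middleware is app.wsgi_app = FlushMiddleware(app.wsgi_app, zespan.flush). For streamed (generator) responses the body is produced after this call returns, so the flush would run before those LLM calls finish.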
Reducing flush latency
If the flush adds too much latency to your handler response, consider these options:
Reduce batch size so the flush completes faster (fewer events per HTTP request):
```typescript
zespan.init({ apiKey: "zsp_...", batchSize: 10 });
```
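One way to read the batch-size option: if the client sends a batch as soon as its buffer reaches batchSize (an assumption about the SDK's behavior, not confirmed here), a smaller batch leaves fewer events for the final flush() request. A hypothetical Python sketch:

```python
from typing import Callable, List


class SizeBatcher:
    """Hypothetical buffer that sends a batch as soon as it holds `batch_size` events."""

    def __init__(self, send: Callable[[List[dict]], None], batch_size: int = 10):
        self._send = send
        self._batch_size = batch_size
        self._buffer: List[dict] = []

    def capture(self, event: dict) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self._send_buffer()

    def flush(self) -> None:
        # Sends only what is still buffered
        if self._buffer:
            self._send_buffer()

    def _send_buffer(self) -> None:
        batch, self._buffer = self._buffer, []
        self._send(batch)


batches: List[List[dict]] = []
batcher = SizeBatcher(batches.append, batch_size=10)
for i in range(25):
    batcher.capture({"i": i})
sent_before_flush = len(batches)   # two full batches already sent during the invocation
batcher.flush()                    # the final request carries only the remaining 5
```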
Use a lower sample rate so fewer events are buffered:
```typescript
zespan.init({ apiKey: "zsp_...", sampleRate: 0.5 }); // trace 50% of calls
```
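With sampleRate: 0.5, roughly half of calls are traced. The sketch below shows per-call head-based sampling, assuming an independent random decision for each call; the SDK's actual sampling strategy is not described here, and should_trace is a hypothetical name.

```python
import random
from typing import Optional


def should_trace(sample_rate: float, rng: Optional[random.Random] = None) -> bool:
    """Keep each call with probability `sample_rate` (0.0 = none, 1.0 = all)."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    r = (rng or random).random()   # uniform in [0.0, 1.0)
    return r < sample_rate


rng = random.Random(42)            # seeded so the example is reproducible
kept = sum(should_trace(0.5, rng) for _ in range(10_000))
```

Unsampled calls are never buffered, so they add nothing to the final flush.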
Background flush with waitUntil (Vercel / Cloudflare Workers only):
```typescript
// Vercel: flush in the background instead of blocking the response
import { waitUntil } from "@vercel/functions";
import { zespan } from "@zespan/sdk";

export async function POST(req: Request) {
  const answer = await callLLM(req); // your own LLM call
  const response = Response.json({ answer });

  // Keep the function alive until the flush settles,
  // without delaying the response
  waitUntil(zespan.getClient().flush());

  return response;
}
```
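What waitUntil provides is a contract with the platform: the response goes out immediately, but the invocation is kept alive until the registered work settles. The Python model below illustrates that contract; Invocation, wait_until, and finish are hypothetical names, and on the real platforms the primitive is waitUntil (Vercel) or ctx.waitUntil (Cloudflare Workers).

```python
import asyncio
from typing import Awaitable, List, Set


class Invocation:
    """Hypothetical model of an invocation that supports waitUntil."""

    def __init__(self) -> None:
        self._pending: Set[asyncio.Future] = set()

    def wait_until(self, aw: Awaitable) -> None:
        # Schedule the work without making the handler wait for it
        self._pending.add(asyncio.ensure_future(aw))

    async def finish(self) -> None:
        # What the platform does after sending the response
        await asyncio.gather(*self._pending)


log: List[str] = []


async def flush() -> None:
    await asyncio.sleep(0.05)          # stand-in for the flush HTTP request
    log.append("flushed")


async def handler(inv: Invocation) -> str:
    inv.wait_until(flush())            # scheduled, not awaited
    log.append("response sent")
    return "answer"


async def main() -> None:
    inv = Invocation()
    body = await handler(inv)
    assert log == ["response sent"]    # the response did not wait for the flush
    await inv.finish()
    assert body == "answer"


asyncio.run(main())
```

Without such a platform hook, a fire-and-forget flush has the same problem as the SDK's background timer: nothing keeps the runtime alive long enough for it to finish.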