Serverless deployment guide

Short-lived serverless functions are the most common cause of missing traces. The SDK batches events and flushes them asynchronously on a timer — but if the process exits before that timer fires, buffered events are silently discarded. This guide shows the correct pattern for each major serverless platform.

Never skip calling flush() in serverless environments. The atexit/beforeExit handlers registered by the SDK are not reliably called when a Lambda or Vercel Function freezes.
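To make the failure mode concrete, here is a minimal toy model of a batching client. It is not the SDK's implementation: `ToyBatchingClient`, `capture`, and the in-memory `delivered` list are hypothetical names used only for illustration. The point it sketches is that recording an event only appends to a buffer, so anything still buffered when the process freezes is never sent.

```python
class ToyBatchingClient:
    """Toy stand-in for a batching telemetry client (hypothetical, not the real SDK)."""

    def __init__(self):
        self.buffer = []     # events waiting to be sent
        self.delivered = []  # events that reached the "backend"

    def capture(self, event):
        # Recording an event only appends to the in-memory buffer.
        self.buffer.append(event)

    def flush(self):
        # Deliver everything buffered so far, then empty the buffer.
        self.delivered.extend(self.buffer)
        self.buffer.clear()


def handler_without_flush(client):
    client.capture({"name": "llm_call"})
    return "ok"  # the process may freeze here with the event still buffered


def handler_with_flush(client):
    client.capture({"name": "llm_call"})
    client.flush()  # the event is delivered before the handler returns
    return "ok"


lossy = ToyBatchingClient()
handler_without_flush(lossy)

safe = ToyBatchingClient()
handler_with_flush(safe)

print(len(lossy.delivered), len(safe.delivered))  # prints: 0 1
```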

The pattern

In every serverless handler, call flush() as the last operation before returning — after all LLM calls are complete, after your response is ready, before you return.

TypeScript

import { zespan } from "@zespan/sdk";
import OpenAI from "openai";

// Initialize once, outside the handler, at module level
zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export async function handler(event: any) {
  const result = await openai.chat.completions.create({ ... });

  // Always flush before returning
  await zespan.getClient().flush();

  return { statusCode: 200, body: result.choices[0].message.content };
}

Python

import zespan
import os
import openai

# Initialize once at module level, outside the handler.
# Read the key from the environment instead of hard-coding it.
zespan.init(api_key=os.environ["ZESPAN_API_KEY"])
zespan.patch_openai()

client = openai.OpenAI()

def handler(event, context):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": event["prompt"]}],
    )
    result = response.choices[0].message.content

    # Always flush before returning
    zespan.get_client().flush()

    return {"statusCode": 200, "body": result}

Initialize the SDK once at module level, not inside the handler function. Module-level initialization persists across warm invocations of the same container, so you avoid the overhead of re-initializing on every request.
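The handlers above flush on the success path only. If the LLM call raises, the return statement and the flush before it are skipped, and any buffered events are lost with the failed invocation. One way to cover both paths is a small decorator that flushes in a `finally` block. This is a sketch, not part of the SDK: `flush_after` and `fake_flush` are hypothetical names, and `fake_flush` stands in for `zespan.get_client().flush` so the example runs on its own.

```python
import functools


def flush_after(flush):
    """Wrap a handler so `flush` runs after it, even if the handler raises."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            try:
                return handler(*args, **kwargs)
            finally:
                # Runs on both the success path and the exception path.
                flush()

        return wrapper

    return decorator


# Stand-in for zespan.get_client().flush, so the sketch runs without the SDK.
flush_calls = []


def fake_flush():
    flush_calls.append("flushed")


@flush_after(fake_flush)
def handler(event, context):
    if "prompt" not in event:
        raise ValueError("missing prompt")
    return {"statusCode": 200, "body": event["prompt"]}
```

In a real handler you would pass something like `lambda: zespan.get_client().flush()` in place of `fake_flush`. Note that an exception raised by the flush itself would replace the handler's original exception, so you may want to catch and log it inside the wrapper.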

AWS Lambda

import { zespan } from "@zespan/sdk";
import OpenAI from "openai";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export const handler = async (event: AWSLambda.APIGatewayEvent) => {
  const body = JSON.parse(event.body ?? "{}");

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: body.message }],
  });

  const answer = response.choices[0].message.content;

  // Flush before Lambda freezes the container
  await zespan.getClient().flush();

  return {
    statusCode: 200,
    body: JSON.stringify({ answer }),
  };
};

Lambda-specific notes:

Set a Lambda timeout of at least init timeout + max LLM latency + 2 seconds to give the flush time to complete
The SDK flush is a single HTTP request — it typically completes in under 500ms
On cold starts, the SDK initializes at module load time. This adds ~10ms and happens only once per container lifecycle
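The timeout rule in the notes above is simple arithmetic, sketched here as a helper. It reads "init timeout" as the time you budget for initialization, which is an assumption, and the function name and sample numbers are hypothetical. Lambda timeouts are configured in whole seconds, so the result is rounded up.

```python
import math


def recommended_lambda_timeout(init_seconds, max_llm_seconds, flush_margin_seconds=2.0):
    """Return a timeout in whole seconds: init + max LLM latency + flush margin."""
    total = init_seconds + max_llm_seconds + flush_margin_seconds
    # Lambda timeouts are set in whole seconds, so round up.
    return math.ceil(total)


# Hypothetical numbers for illustration only.
timeout = recommended_lambda_timeout(init_seconds=0.5, max_llm_seconds=30.0)
print(timeout)  # prints: 33
```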

Vercel Functions (App Router)

// app/api/chat/route.ts
import { zespan, withZespanContext } from "@zespan/sdk";
import OpenAI from "openai";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export async function POST(req: Request) {
  const { message, userId } = await req.json();

  let answer: string;

  await withZespanContext({ userId }, async () => {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: message }],
    });
    answer = response.choices[0].message.content ?? "";
  });

  // Flush before Vercel freezes the function
  await zespan.getClient().flush();

  return Response.json({ answer: answer! });
}

Vercel-specific notes:

Set maxDuration in your vercel.json or route config to account for flush time
Vercel Edge Runtime does not support AsyncLocalStorage — use the Node.js runtime (export const runtime = "nodejs") for full trace context propagation
For streaming responses, flush after the stream is complete — the SDK captures TTFT and full token counts at stream close

Vercel Functions — streaming responses

When streaming, the flush must happen after the stream fully closes. The snippet below reuses the module-level zespan and openai setup from the previous example:

export async function POST(req: Request) {
  const { message } = await req.json();

  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: message }],
    stream: true,
  });

  const encoder = new TextEncoder();

  const readableStream = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content ?? "";
        controller.enqueue(encoder.encode(text));
      }
      controller.close();

      // Flush AFTER stream is fully consumed
      await zespan.getClient().flush();
    },
  });

  return new Response(readableStream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
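The same ordering can be sketched in Python with a generator: chunks are passed through to the caller, and the flush runs only once the stream has ended. This is an illustration under assumptions, not SDK code. `stream_then_flush` is a hypothetical helper, and the chunk list and the flush callback are stand-ins for a real LLM stream and for `zespan.get_client().flush`.

```python
def stream_then_flush(chunks, flush):
    """Yield text chunks to the caller, then flush once the stream has ended."""
    try:
        for chunk in chunks:
            yield chunk
    finally:
        # Runs when the stream is fully consumed, and also if the source
        # raises or the consumer closes the generator early.
        flush()


flushed = []
fake_chunks = iter(["Hel", "lo", "!"])  # stand-in for an LLM token stream

gen = stream_then_flush(fake_chunks, lambda: flushed.append(True))
first = next(gen)
flushed_mid_stream = list(flushed)  # still empty: the stream has not finished
rest = "".join(gen)
```

Putting the flush in `finally` also covers a client that disconnects part-way through, which a flush placed after the loop would miss.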

Netlify Functions

import { Handler } from "@netlify/functions";
import { zespan } from "@zespan/sdk";
import OpenAI from "openai";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

export const handler: Handler = async (event) => {
  const { message } = JSON.parse(event.body ?? "{}");

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: message }],
  });

  await zespan.getClient().flush();

  return {
    statusCode: 200,
    body: JSON.stringify({ answer: response.choices[0].message.content }),
  };
};

Google Cloud Run

Cloud Run containers can be reused across requests, making it safe to initialize at module level and flush per-request.

Node.js (Express)

import express from "express";
import { zespan } from "@zespan/sdk";
import OpenAI from "openai";

// Module-level: persists across requests on the same instance
zespan.init({ apiKey: process.env.ZESPAN_API_KEY!, environment: "production" });
const openai = zespan.wrapOpenAI(new OpenAI());

const app = express();
app.use(express.json());

app.post("/chat", async (req, res) => {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: req.body.message }],
  });

  const answer = response.choices[0].message.content;

  // Flush before Cloud Run scales the instance down
  await zespan.getClient().flush();

  res.json({ answer });
});

app.listen(process.env.PORT ?? 8080);

Python (Flask)

import zespan
import os
import openai
from flask import Flask, request, jsonify

# Module-level: persists across requests on the same instance.
# Read the key from the environment instead of hard-coding it.
zespan.init(api_key=os.environ["ZESPAN_API_KEY"], environment="production")
zespan.patch_openai()

app = Flask(__name__)
client = openai.OpenAI()

@app.post("/chat")
def chat():
    body = request.get_json()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": body["message"]}],
    )
    answer = response.choices[0].message.content

    # Flush before Cloud Run scales the instance down
    zespan.get_client().flush()

    return jsonify({"answer": answer})

Cloud Run notes:

Cloud Run sends SIGTERM before scaling down. Call flush() per-request — don’t rely on shutdown hooks alone.
Initialize at module level so the SDK persists across warm requests on the same instance.
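If you also want a shutdown hook as a backstop to the per-request flush, a SIGTERM handler is one option. The sketch below uses only Python's standard `signal` module. `install_sigterm_flush` is a hypothetical helper, and `fake_flush` stands in for `zespan.get_client().flush`. Application servers such as gunicorn install their own signal handling, so treat this as an illustration of the idea rather than something to drop into any server unchanged.

```python
import signal


def install_sigterm_flush(flush):
    """Register a SIGTERM handler that flushes once, as a backstop only."""

    def on_sigterm(signum, frame):
        flush()
        # Exit afterwards so the process still shuts down as requested.
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
    return on_sigterm


pending = ["event-1"]  # stand-in for events still sitting in the buffer


def fake_flush():
    pending.clear()


sigterm_handler = install_sigterm_flush(fake_flush)
```

The per-request flush remains the primary mechanism. The hook only catches events buffered between the last request and shutdown.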

Reducing flush latency

If the flush adds too much latency to your handler response, consider these options.

Reduce batch size so the flush completes faster (fewer events per HTTP request):

zespan.init({ apiKey: "zsp_...", batchSize: 10 });

Use a lower sample rate so fewer events are buffered:

zespan.init({ apiKey: "zsp_...", sampleRate: 0.5 }); // trace 50% of calls

Background flush with waitUntil (Vercel / Cloudflare Workers only):

// Vercel: flush in the background without blocking the response
// waitUntil is exported by the @vercel/functions package
import { waitUntil } from "@vercel/functions";

export async function POST(req: Request) {
  // callLLM is a placeholder for your own LLM call
  const answer = await callLLM(req);

  // Keep the function alive until the flush settles, after the response is sent
  waitUntil(zespan.getClient().flush());

  return Response.json({ answer });
}
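For the batch size and sample rate options above, the following toy sketch shows the two mechanics in isolation. How the SDK actually samples and batches is not specified in this guide, so `should_trace` and `split_into_batches` are hypothetical helpers and the numbers are illustrative only.

```python
import random


def should_trace(sample_rate, rng):
    """Keep a call with probability `sample_rate` (between 0.0 and 1.0)."""
    return rng.random() < sample_rate


def split_into_batches(events, batch_size):
    """Split buffered events into lists of at most `batch_size` items."""
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]


rng = random.Random(0)  # seeded so the sketch is repeatable

# With a 0.5 sample rate, roughly half of 1000 calls end up buffered.
kept = [n for n in range(1000) if should_trace(0.5, rng)]

# With a batch size of 10, each request carries at most 10 events.
batches = split_into_batches(kept, 10)
```

A lower sample rate shrinks the buffer itself, while a smaller batch size only caps how many events travel in each request.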
