Manual spans — tracing custom operations

The automatic wrappers handle LLM providers, but many real-world workflows involve operations that are not direct LLM calls — document retrieval, embedding pipelines, custom model APIs, and post-generation evaluation. startSpan (TypeScript) / start_span (Python) let you create a custom span for any of these, with the same trace context propagation as wrapped LLM calls.

When to use manual spans

Use startSpan/start_span when you want to trace:

RAG pipelines — measure retrieval latency and track retrieved context separately from the LLM call
Custom model wrappers — a self-hosted model or a provider not covered by a built-in wrapper
Evaluation harnesses — attach faithfulness, relevance, or custom metric scores to a span
Multi-step workflows — break a complex pipeline into named segments for easier debugging

Basic usage

TypeScript: startSpan returns an object with a span handle and a run helper. Call await span.end() on every exit path, including the catch block, so the span is always closed.

Python: start_span returns a ManualSpan object directly, so there is no separate span/run pair to destructure. Call span.end() (synchronous, no await) on the success path and again in your except block.

import { zespan, startSpan } from "@zespan/sdk";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });

async function runRagPipeline(query: string): Promise<string> {
  const { span } = startSpan({
    name: "rag-pipeline",
    provider: "custom",
  });

  try {
    const docs = await retrieveDocuments(query);
    const answer = await generateAnswer(query, docs);

    // Attach an evaluation score before closing the span
    span.setEvalScore("faithfulness", 0.92);
    span.setEvalScore("relevance", 0.85);

    await span.end({
      status: "success",
      input_tokens: 350,
      output_tokens: 120,
      cost_usd: 0.0042,
    });

    return answer;
  } catch (err) {
    await span.end({
      status: "error",
      error_message: String(err),
    });
    throw err; // Always re-throw — never swallow errors
  }
}

import os
import zespan
from zespan import start_span

zespan.init(api_key=os.environ["ZESPAN_API_KEY"])

def run_rag_pipeline(query: str) -> str:
    span = start_span(name="rag-pipeline", provider="custom")

    try:
        docs = retrieve_documents(query)
        answer = generate_answer(query, docs)

        # Attach an evaluation score before closing the span
        span.set_eval_score("faithfulness", 0.92)
        span.set_eval_score("relevance", 0.85)

        span.end(
            status="success",
            input_tokens=350,
            output_tokens=120,
            cost_usd=0.0042,
        )

        return answer
    except Exception as err:
        span.end(status="error", error_message=str(err))
        raise  # Always re-raise — never swallow errors

Always re-throw (TypeScript) or re-raise (Python) errors after calling span.end({ status: "error" }) / span.end(status="error"). Swallowing errors hides failures from your application and from the Zespan error dashboard.
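The try/except pattern above can be factored into a small context manager so that a call site cannot forget to close the span or to re-raise. This is a minimal sketch, not part of the Zespan SDK: traced is a hypothetical helper, and RecordingSpan is a stand-in that mimics only the end(status=..., error_message=...) call shown above so the example runs without the SDK installed. With the real SDK you would pass the object returned by start_span instead.

```python
from contextlib import contextmanager
from typing import Any, Iterator


class RecordingSpan:
    """Stand-in for a manual span (hypothetical); records arguments passed to end()."""

    def __init__(self) -> None:
        self.end_calls: list[dict[str, Any]] = []

    def end(self, **kwargs: Any) -> None:
        self.end_calls.append(kwargs)


@contextmanager
def traced(span: Any) -> Iterator[Any]:
    """Close `span` exactly once, then let any exception propagate."""
    try:
        yield span
    except Exception as err:
        span.end(status="error", error_message=str(err))
        raise  # re-raise so the caller still sees the failure
    else:
        span.end(status="success")


# Success path: the span is closed with status="success".
ok_span = RecordingSpan()
with traced(ok_span):
    pass

# Error path: the span is closed with status="error" and the error still propagates.
failed_span = RecordingSpan()
try:
    with traced(failed_span):
        raise ValueError("retriever unavailable")
except ValueError:
    pass
```

This helper does not forward token counts or cost on the success path; if you need those, calling span.end() explicitly as in the examples above is the simpler choice.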

`startSpan` / `start_span` options

name (string, required)

Name of the operation. Appears as the span label in the trace flame graph. Positional/keyword argument in both SDKs.

model (string, default: "custom")

Model identifier, if applicable. Use the exact model string (e.g. "text-embedding-3-small") so cost estimates work correctly.

provider (string, default: "custom")

Provider name, e.g. "openai", "anthropic", "custom". Used for grouping in the provider breakdown view.

span_kind (string, default: "general")

Span kind hint, e.g. "llm", "tool", "agent", "retriever", "general", among other values defined by Zespan's span-kind model.

Both SDKs always write span_kind onto the emitted event (default "general"), whether or not there’s an active agent context. Pass span_kind: "retriever" on a retrieval span so it groups correctly and lights up the RAG surfaces.

In Python, start_span(name, *, model=None, provider=None, span_kind="general") — model, provider, and span_kind are keyword-only arguments after name.

`span` methods

`span.setEvalScore(name, value)` / `span.set_eval_score(name, value)`

Attaches a named numeric score to the span. Scores appear in the evaluations tab and can be trended over time.

span.setEvalScore("faithfulness", 0.92);
span.setEvalScore("groundedness", 0.78);

span.set_eval_score("faithfulness", 0.92)
span.set_eval_score("groundedness", 0.78)

Call setEvalScore/set_eval_score any number of times before the span closes. All scores are sent together when the span ends.

`span.end(options)` / `span.end(...)`

Closes the span and enqueues the event. Must be called exactly once.

In TypeScript, span.end() is async — always await it so the enqueue completes before your function returns. In Python, span.end() is synchronous.

status (string, required)

Outcome of the operation. Accepted values: "success", "error", "timeout", "rate_limited", "cancelled". Required in TypeScript. In Python, status defaults to "success" if omitted, but pass it explicitly on error paths, since nothing infers failure for you.

input_tokens (number)

Number of input/prompt tokens consumed. Used for cost calculation. Defaults to 0 if omitted, in both SDKs.

output_tokens (number)

Number of output/completion tokens generated. Defaults to 0 if omitted, in both SDKs.

cost_usd (number)

Actual cost in USD, if known. Overrides any computed cost estimate.

error_message (string)

Error description when status is "error". Truncated to 500 characters in both SDKs.

documents (array | object | string)

Retrieved documents for a retrieval/RAG span. Attaches the chunks to the span so the trace-detail Retrieval panel and the RAG evaluators can read them. See Recording retrieved documents below. TypeScript: documents on end(); Python: documents= on end().

object

Arbitrary custom attributes stored on the span as-is.
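Since nothing infers failure for you, it can help to choose the status value in one place. The sketch below maps an exception to one of the accepted status strings listed above and estimates cost_usd from token counts. The exception-to-status mapping, the RateLimitError class, and the per-million-token prices are illustrative assumptions, not SDK behaviour or real pricing.

```python
import asyncio


class RateLimitError(Exception):
    """Hypothetical exception your own model client might raise on HTTP 429."""


def status_for(err: BaseException) -> str:
    """Pick one of the accepted status values for a failed operation."""
    if isinstance(err, RateLimitError):
        return "rate_limited"
    # asyncio.TimeoutError is a separate class before Python 3.11, so check both.
    if isinstance(err, (TimeoutError, asyncio.TimeoutError)):
        return "timeout"
    # CancelledError derives from BaseException, so `except Exception` will not catch it.
    if isinstance(err, asyncio.CancelledError):
        return "cancelled"
    return "error"


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_in: float,
    usd_per_million_out: float,
) -> float:
    """Token counts multiplied by per-million-token prices (prices are assumptions)."""
    total = input_tokens * usd_per_million_in + output_tokens * usd_per_million_out
    return total / 1_000_000


# Example with made-up prices of $2.50 / $10.00 per million tokens.
example_cost = estimate_cost_usd(350, 120, 2.5, 10.0)
```

In an except block this would be used as span.end(status=status_for(err), error_message=str(err)).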

Recording retrieved documents

For a retrieval step, attach the retrieved chunks and they show up in the trace-detail Retrieval panel — and the RAG evaluators can score the retrieval itself (faithfulness, context relevance, retrieval hit rate), not just the final answer. documents accepts plain strings, plain objects ({ content, document_id?, chunk_id?, source?, score? }), or your framework’s node objects directly — LangChain Document (its pageContent is read) and LlamaIndex NodeWithScore are both understood, so you can pass a retriever’s output straight through. Chunk text is redacted with the same rules as prompts and dropped entirely when storePrompts: false — ids, source, and score are always kept.
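As a sketch of the plain-object shape described above, the snippet below builds chunk dictionaries from a list of search hits and includes only the optional keys that are actually set. SearchHit and to_chunk are illustrative names, not SDK types; the keys (content, document_id, chunk_id, source, score) are the ones listed above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SearchHit:
    """Hypothetical result type returned by your own vector store client."""

    text: str
    doc_id: Optional[str] = None
    chunk_id: Optional[str] = None
    source: Optional[str] = None
    score: Optional[float] = None


def to_chunk(hit: SearchHit) -> dict[str, Any]:
    """Build a plain chunk dict; `content` is always present, the rest only if set."""
    chunk: dict[str, Any] = {"content": hit.text}
    optional = {
        "document_id": hit.doc_id,
        "chunk_id": hit.chunk_id,
        "source": hit.source,
        "score": hit.score,
    }
    chunk.update({key: value for key, value in optional.items() if value is not None})
    return chunk


hits = [
    SearchHit("Refunds are processed within 5 days.", doc_id="faq-12", score=0.87),
    SearchHit("Contact support via the help page."),
]
documents = [to_chunk(hit) for hit in hits]
```

The resulting list is the kind of value that can be passed as documents= to span.end() or handed to record_retrieval.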

`recordRetrieval` / `record_retrieval` — one shot

The simplest form: you’ve already run a vector search or reranker and just want the chunks on the trace. This emits a retriever span on its own.

import { zespan, recordRetrieval } from "@zespan/sdk";

const docs = await vectorStore.search(query, 5); // strings, objects, or LangChain/LlamaIndex nodes
await recordRetrieval(docs, { query });

const answer = await generate(query, docs);

import zespan
from zespan import record_retrieval

docs = vector_store.search(query, k=5)  # strings, dicts, or LangChain/LlamaIndex nodes
record_retrieval(docs, query=query)

answer = generate(query, docs)

Attaching documents to a span you’re timing

When you want the retrieval latency measured live, create a retriever span, run the search inside it, then attach the results — either with recordDocuments/record_documents before end(), or by passing documents straight to end().

const { span } = startSpan({ name: "vector-search", span_kind: "retriever" });
try {
  const docs = await vectorStore.search(query, 5);
  await span.end({ status: "success", documents: docs });
} catch (err) {
  await span.end({ status: "error", error_message: String(err) });
  throw err;
}

span = start_span(name="vector-search", span_kind="retriever")
try:
    docs = vector_store.search(query, k=5)
    span.end(status="success", documents=docs)
except Exception as err:
    span.end(status="error", error_message=str(err))
    raise

Using LangChain or LlamaIndex through their Zespan integrations? Retrieval documents are captured automatically — you don’t need recordRetrieval. This helper is for manual RAG built on raw provider calls.

`recordVectorSearch` / `record_vector_search` — pgvector and other raw-SQL vector stores

Pinecone, Chroma, Weaviate, and Qdrant are auto-traced by zespan.autopatch() — no manual call needed for those. pgvector has no equivalent wrapper: it’s a Postgres extension queried through a generic driver (psycopg2/asyncpg in Python, pg in Node), and the only hook point (cursor.execute()) fires for every SQL query your app makes, not just vector ones — patching the whole driver to catch a subset of calls isn’t worth the fragility. recordVectorSearch/record_vector_search is the manual equivalent for this case — a sibling to recordRetrieval/record_retrieval — call it right after running your query.

import { zespan, recordVectorSearch } from "@zespan/sdk";

const sql = "SELECT id, content, source, embedding <-> $1 AS score FROM docs ORDER BY score LIMIT 5";
const { rows } = await pool.query(sql, [queryEmbedding]);
await recordVectorSearch(rows, { query: sql });

const answer = await generate(query, rows);

import zespan
from zespan import record_vector_search

sql = "SELECT id, content, source, embedding <-> %s AS score FROM docs ORDER BY score LIMIT 5"
cursor.execute(sql, (query_embedding,))
rows = cursor.fetchall()
record_vector_search(rows, query=sql)

answer = generate(query, rows)

rows accepts the same shapes as recordRetrieval/record_retrieval (plain strings, objects/dicts, or framework nodes) and lands in metadata.rag_contexts in the same shape the auto-traced vector-DB wrappers produce, so the trace-detail Retrieval panel and the RAG evaluators work identically regardless of which one produced the data. Row text follows storePrompts/store_prompts and is redacted like prompt text; ids/source/score are always kept. This emits a retriever span with provider defaulting to "pgvector" — pass provider to override for other manual, non-client-library vector stores.
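With a default psycopg2 cursor, fetchall() returns tuples rather than dicts, and tuples are not among the shapes listed above. One cautious option is to convert rows to dicts keyed by column name before recording them. The helper below is a sketch under that assumption: rows_to_documents is a hypothetical name, and renaming the table's id column to document_id is a guess at the most useful mapping, not documented SDK behaviour.

```python
from typing import Any, Sequence


def rows_to_documents(
    columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> list[dict[str, Any]]:
    """Turn positional SQL rows into dicts keyed by column name."""
    rename = {"id": "document_id"}  # assumption: expose the primary key as document_id
    keys = [rename.get(column, column) for column in columns]
    return [dict(zip(keys, row)) for row in rows]


# Column order matches the SELECT above: id, content, source, score.
columns = ["id", "content", "source", "score"]
rows = [(7, "pgvector stores embeddings in Postgres.", "docs/pgvector.md", 0.12)]
documents = rows_to_documents(columns, rows)
```

With a live cursor, the column names can be read from the DB-API cursor.description attribute, e.g. [col[0] for col in cursor.description]. Note also that pgvector's <-> operator returns an L2 distance, so a lower score means a closer match.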

Propagating context to nested calls

TypeScript: use the run helper returned alongside span from startSpan to run a function inside the span's context. run accepts any function (sync or async) and returns its result.

Python: ManualSpan has no separate run helper; call span.run() directly. It is a synchronous context manager (@contextmanager, not @asynccontextmanager), so use a plain with block (with span.run():) rather than async with.

In both SDKs, wrapped LLM calls made inside the block automatically link to the enclosing span as their parent.

const { span, run } = startSpan({ name: "rag-pipeline" });

try {
  const response = await run(async () => {
    // This OpenAI call is linked as a child of the rag-pipeline span
    return openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: query }],
    });
  });

  await span.end({ status: "success" });
  // Return the generated text rather than the raw response object
  return response.choices[0].message.content ?? "";
} catch (err) {
  await span.end({ status: "error", error_message: String(err) });
  throw err;
}

span = start_span(name="rag-pipeline")

try:
    with span.run():
        # This OpenAI call is linked as a child of the rag-pipeline span
        # (requires zespan.patch_openai() to have been called beforehand)
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": query}],
        )

    span.end(status="success")
    answer = response.choices[0].message.content
except Exception as err:
    span.end(status="error", error_message=str(err))
    raise
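To make the parent-linking behaviour concrete, here is a toy model of context propagation built on Python's contextvars module. It is not Zespan's implementation: ToySpan and its fields are hypothetical and only illustrate the idea that run() makes a span the current one, and that anything created inside the block reads it as its parent.

```python
import contextvars
import itertools
from contextlib import contextmanager
from typing import Iterator

# Holds the id of the span that is "current" in this execution context.
_current_span_id = contextvars.ContextVar("current_span_id", default=None)
_ids = itertools.count(1)


class ToySpan:
    """Toy span (hypothetical); records which span was current when it was created."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.span_id = next(_ids)
        self.parent_id = _current_span_id.get()

    @contextmanager
    def run(self) -> Iterator[None]:
        token = _current_span_id.set(self.span_id)
        try:
            yield
        finally:
            # Restore whichever span was current before the block.
            _current_span_id.reset(token)


pipeline = ToySpan("rag-pipeline")
with pipeline.run():
    inside = ToySpan("llm-call")   # created inside the block: parent is the pipeline
outside = ToySpan("llm-call")      # created after the block: no parent
```

Because the current span is reset when the block exits, a call made after the with block is no longer linked to the pipeline span.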

Complete RAG pipeline example

This example shows a full retrieval-augmented generation pipeline with per-phase spans and evaluation scores.

The same with_zespan_context()/withZespanContext() requirement shown below applies to the auto-traced vector-DB wrappers too — Pinecone, Chroma, Weaviate, and Qdrant. Each wrapper’s trace_id comes from whatever context is active when the call runs, exactly like a manual span. A retrieval call and a subsequent LLM call made as two independent, unwrapped top-level calls land on two different traces — RAG evaluators and the Retrieval panel then see nothing linking them, even though retrieval genuinely happened moments earlier. Wrap both in the same context block, as this example does.

import { zespan, startSpan, withZespanContext } from "@zespan/sdk";
import OpenAI from "openai";

zespan.init({ apiKey: process.env.ZESPAN_API_KEY! });
const openai = zespan.wrapOpenAI(new OpenAI());

async function handleQuery(userId: string, query: string): Promise<string> {
  return withZespanContext({ userId }, async () => {
    // Outer span for the full pipeline
    const { span: pipelineSpan, run } = startSpan({
      name: "rag-pipeline",
      provider: "custom",
    });

    try {
      // Inner span for retrieval only
      const { span: retrievalSpan } = startSpan({ name: "document-retrieval" });
      let docs: string[];
      try {
        docs = await retrieveDocuments(query);
        await retrievalSpan.end({ status: "success" });
      } catch (err) {
        await retrievalSpan.end({ status: "error", error_message: String(err) });
        throw err;
      }

      // LLM call inside the outer pipeline context
      const response = await run(() =>
        openai.chat.completions.create({
          model: "gpt-4o",
          messages: [
            { role: "system", content: `Context:\n${docs.join("\n\n")}` },
            { role: "user", content: query },
          ],
        })
      );

      const answer = response.choices[0].message.content ?? "";

      // Evaluate and score the answer
      const faithfulness = await evaluateFaithfulness(answer, docs);
      pipelineSpan.setEvalScore("faithfulness", faithfulness);

      await pipelineSpan.end({
        status: "success",
        input_tokens: response.usage?.prompt_tokens,
        output_tokens: response.usage?.completion_tokens,
      });

      return answer;
    } catch (err) {
      await pipelineSpan.end({
        status: "error",
        error_message: String(err),
      });
      throw err;
    }
  });
}

import os
import zespan
from zespan import start_span, with_zespan_context

zespan.init(api_key=os.environ["ZESPAN_API_KEY"])
zespan.patch_openai()

import openai  # import after patching
client = openai.OpenAI()

def handle_query(user_id: str, query: str) -> str:
    with with_zespan_context(user_id=user_id):
        # Outer span for the full pipeline
        pipeline_span = start_span(name="rag-pipeline", provider="custom")

        try:
            # Inner span for retrieval only
            retrieval_span = start_span(name="document-retrieval")
            try:
                docs = retrieve_documents(query)
                retrieval_span.end(status="success")
            except Exception as err:
                retrieval_span.end(status="error", error_message=str(err))
                raise

            # LLM call inside the outer pipeline context
            with pipeline_span.run():
                context_text = "\n\n".join(docs)
                response = client.chat.completions.create(
                    model="gpt-4o",
                    messages=[
                        {"role": "system", "content": f"Context:\n{context_text}"},
                        {"role": "user", "content": query},
                    ],
                )

            answer = response.choices[0].message.content or ""

            # Evaluate and score the answer
            faithfulness = evaluate_faithfulness(answer, docs)
            pipeline_span.set_eval_score("faithfulness", faithfulness)

            pipeline_span.end(
                status="success",
                input_tokens=response.usage.prompt_tokens,
                output_tokens=response.usage.completion_tokens,
            )

            return answer
        except Exception as err:
            pipeline_span.end(status="error", error_message=str(err))
            raise
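The example above calls evaluate_faithfulness without defining it. In practice this would usually be an LLM-as-judge or a dedicated evaluator. Purely so the Python example can run end to end, here is a crude stand-in that scores the fraction of distinct answer words that also appear in the retrieved documents. The word-overlap heuristic is an assumption made for illustration; it is not a Zespan evaluator and not a reliable faithfulness measure.

```python
import re


def evaluate_faithfulness(answer: str, docs: list[str]) -> float:
    """Fraction of distinct answer words that occur in the documents (0.0 to 1.0)."""
    answer_words = set(re.findall(r"[a-z0-9]+", answer.lower()))
    if not answer_words:
        return 0.0
    doc_words = set(re.findall(r"[a-z0-9]+", " ".join(docs).lower()))
    return len(answer_words & doc_words) / len(answer_words)


# "refunds", "five", and "days" appear in the document; "take" does not.
score = evaluate_faithfulness(
    "Refunds take five days",
    ["Refunds are processed within five business days."],
)
```

The returned value is a plain float, so it can be passed straight to pipeline_span.set_eval_score("faithfulness", ...).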
