Overview
In long-running voice AI conversations, context grows with every exchange. This increases token usage, raises costs, and can eventually hit context window limits. Pipecat includes built-in context summarization that automatically compresses older conversation history while preserving recent messages and important context.

How It Works
Context summarization automatically triggers when either condition is met:

- **Token limit reached**: Context size exceeds `max_context_tokens` (estimated using ~4 characters per token)
- **Message count reached**: Number of new messages exceeds `max_unsummarized_messages`
When either threshold is crossed, summarization proceeds as follows:

- The aggregator sends an `LLMContextSummaryRequestFrame` to the LLM service
- The LLM generates a concise summary of older messages
- The context is reconstructed as `[system_message] + [summary] + [recent_messages]` (see the sketch below)
- Incomplete function call sequences and recent messages are preserved
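For illustration, the reconstruction looks roughly like this. This is a schematic sketch of the message list, not Pipecat's actual internal representation, and the role used for the injected summary message is an assumption:

```python
# Before summarization: system message plus a long conversation history.
context_before = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    # ... many older user/assistant exchanges eligible for summarization ...
    {"role": "user", "content": "So what time does the store open tomorrow?"},
    {"role": "assistant", "content": "It opens at 9 AM."},
]

# After summarization: [system_message] + [summary] + [recent_messages].
# The summary is injected using summary_message_template (default shown).
context_after = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": "Conversation summary: The caller asked about ..."},  # role is assumed
    {"role": "user", "content": "So what time does the store open tomorrow?"},
    {"role": "assistant", "content": "It opens at 9 AM."},
]
```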
Context summarization is asynchronous and happens in the background without
blocking the pipeline. The system uses request IDs to match summary requests
with results and handles interruptions gracefully.
Enabling Context Summarization
Enable summarization by setting `enable_context_summarization=True` in `LLMAssistantAggregatorParams`:
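A minimal sketch, assuming `llm` is your pipeline's LLM service and `context` is its LLM context; the import path and keyword arguments may differ across Pipecat versions:

```python
from pipecat.processors.aggregators.llm_response import LLMAssistantAggregatorParams

# Enable summarization on the assistant-side aggregator, which owns the
# conversation history that gets compressed.
context_aggregator = llm.create_context_aggregator(
    context,
    assistant_params=LLMAssistantAggregatorParams(
        enable_context_summarization=True,
    ),
)
```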
Customizing Behavior
Use `LLMContextSummarizationConfig` to control when and how summarization occurs:
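For example, a sketch that tightens the trigger thresholds. The import paths and the field used to attach the config to `LLMAssistantAggregatorParams` (shown here as `context_summarization_config`) are assumptions, so check your Pipecat version:

```python
# Import paths are assumptions — adjust for your Pipecat version.
from pipecat.processors.aggregators.llm_response import (
    LLMAssistantAggregatorParams,
    LLMContextSummarizationConfig,
)

summarization_config = LLMContextSummarizationConfig(
    max_context_tokens=6000,        # summarize once the estimated context exceeds 6000 tokens
    target_context_tokens=4000,     # target token count for the generated summary
    max_unsummarized_messages=15,   # or after 15 new messages, whichever comes first
    min_messages_after_summary=6,   # always keep the 6 most recent messages uncompressed
)

context_aggregator = llm.create_context_aggregator(
    context,
    assistant_params=LLMAssistantAggregatorParams(
        enable_context_summarization=True,
        # Hypothetical field name for attaching the config — verify in your version.
        context_summarization_config=summarization_config,
    ),
)
```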
| Parameter | Default | Description |
|---|---|---|
| `max_context_tokens` | 8000 | Maximum context size (in estimated tokens) before triggering summarization |
| `target_context_tokens` | 6000 | Target token count for the generated summary |
| `max_unsummarized_messages` | 20 | Maximum new messages before triggering summarization |
| `min_messages_after_summary` | 4 | Number of recent messages to preserve uncompressed |
| `summarization_prompt` | None | Custom prompt for summary generation (uses built-in default if None) |
| `summary_message_template` | "Conversation summary: {summary}" | Template for formatting the summary when injected into context |
| `llm` | None | Optional separate LLM service for generating summaries (uses pipeline LLM if None) |
| `summarization_timeout` | 120.0 | Maximum time in seconds to wait for summary generation |
What Gets Preserved
Context summarization intelligently preserves:

- **System messages**: The first system message (defining assistant behavior) is always kept
- **Recent messages**: The last N messages stay uncompressed (configured by `min_messages_after_summary`)
- **Function call sequences**: Incomplete function call/result pairs are not split during summarization
Custom Summarization Prompts
You can override the default summarization prompt to control how the LLM generates summaries:
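For example (a sketch; the prompt text is illustrative, and the resulting config is attached to the aggregator params as shown earlier):

```python
from pipecat.processors.aggregators.llm_response import LLMContextSummarizationConfig  # path is an assumption

summarization_config = LLMContextSummarizationConfig(
    summarization_prompt=(
        "Summarize the conversation so far for a voice assistant. "
        "Preserve key facts the caller has shared (names, dates, preferences, "
        "open requests) and any unresolved questions. Omit small talk. "
        "Keep the summary brief and in the third person."
    ),
)
```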
Using a Dedicated LLM for Summarization

For cost optimization, you can route summarization requests to a separate, cheaper/faster LLM while keeping your primary model for conversation:
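A sketch using a smaller OpenAI model for summaries; the service class, model name, and import paths are examples, and any Pipecat LLM service should work:

```python
import os

from pipecat.processors.aggregators.llm_response import LLMContextSummarizationConfig  # path is an assumption
from pipecat.services.openai.llm import OpenAILLMService  # path may differ by version

# A cheaper/faster model used only for generating summaries.
summary_llm = OpenAILLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-mini",
)

summarization_config = LLMContextSummarizationConfig(
    llm=summary_llm,
)
```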
Customizing Summary Format

Use `summary_message_template` to control how summaries are formatted when injected into context. This is useful for wrapping summaries in custom delimiters (e.g., XML tags) so system prompts can distinguish them from live conversation:
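For example, wrapping the summary in XML-style tags (a sketch):

```python
from pipecat.processors.aggregators.llm_response import LLMContextSummarizationConfig  # path is an assumption

summarization_config = LLMContextSummarizationConfig(
    summary_message_template=(
        "<conversation_summary>{summary}</conversation_summary>"
    ),
)
```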
Use `{summary}` as a placeholder for the generated summary text.
Monitoring Summarization
Use the `on_summary_applied` event handler to track summarization activity and observe compression metrics:
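A sketch of registering the handler on the assistant-side aggregator; the exact registration point and handler signature are assumptions, so adjust to your Pipecat version:

```python
from loguru import logger


@context_aggregator.assistant().event_handler("on_summary_applied")
async def on_summary_applied(aggregator, event):
    # Log the compression metrics described below.
    logger.info(
        f"Summary applied: {event.original_message_count} -> {event.new_message_count} messages "
        f"({event.summarized_message_count} summarized, {event.preserved_message_count} preserved)"
    )
```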
The event provides these fields:

- `original_message_count`: Total messages before summarization
- `new_message_count`: Total messages after summarization
- `summarized_message_count`: Number of messages compressed into the summary
- `preserved_message_count`: Number of recent messages preserved uncompressed