Lambda Response Streaming Grows Up: 200 MB Payloads and What That Means for Serverless APIs
If downloading a whole album before hearing the first note feels outdated, buffering an entire HTTP response before sending a single byte does too. That’s why response streaming in AWS Lambda has quietly become one of the most useful patterns in serverless: you can start delivering bytes to a client as soon as they’re ready—no more waiting for the full payload to be built. The big recent change? As of July 31, 2025, Lambda’s response streaming now supports up to 200 MB per response, a 10x increase over the prior limit. That upgrade expands what you can feasibly serve straight from a function without detouring through S3 or another store. (aws.amazon.com)
What changed, exactly?
- Bigger responses: Stream up to 200 MB back to clients when you use response streaming. The old “soft limit” for streaming had been 20 MB; the updated docs and announcement now state 200 MB and emphasize that this is for the streaming path, not the classic buffered 6 MB response cap. (docs.aws.amazon.com)
- How you invoke: Response streaming works via Lambda Function URLs with invoke mode set to RESPONSE_STREAM, or by calling the InvokeWithResponseStream API directly from an SDK. Buffered mode (the default) remains capped at 6 MB. (docs.aws.amazon.com)
- Performance profile: TTFB improves because you can send partial chunks immediately. After an initial uncapped burst for the first 6 MB, Lambda limits the streaming rate to a maximum of 2 MB/s. That’s usually plenty for chatty APIs and progressive rendering, but it’s worth knowing for very large payloads. (docs.aws.amazon.com)
- Runtime support: Managed Node.js runtimes are supported out of the box; other languages can participate via custom runtimes or the Runtime API. (docs.aws.amazon.com)
- Where you can (and can’t) use it: You can stream via a Function URL, either directly or behind CloudFront with the Function URL as the origin. API Gateway and Application Load Balancer don’t support progressive streaming; they’ll still buffer. If you want both custom domains and streaming, point CloudFront at your Function URL. (aws.amazon.com)
Why this matters for modern serverless
- AI and chat UIs: Token-by-token responses improve perceived speed and let you interleave UI updates while the model works. The bigger 200 MB ceiling also opens room for richer context or larger output chunks without bolting on S3. (aws.amazon.com)
- Data exports and reports: You can stream CSV/JSON rows as they arrive from a database, keeping memory usage low and getting the first bytes to the user quickly. The bandwidth cap means a 150 MB export won’t be “instant,” but it will be progressive and memory-friendly. (docs.aws.amazon.com)
- Media transforms: Think image compression or PDF assembly on the fly—stream processed bytes out as soon as a chunk is ready. (aws.amazon.com)
The upshot: more use cases can live entirely in Lambda again, reducing architectural glue and cutting trips to intermediary storage services. (aws.amazon.com)
A minimal Node.js streaming handler
Lambda exposes a handy wrapper, streamifyResponse, in Node.js runtimes. It gives your handler a writable stream you can push bytes to. The safest pattern is to use pipeline so you don’t overwhelm downstream consumers.
// index.mjs (Node.js 18+)
import { pipeline } from 'node:stream/promises';
import { Readable } from 'node:stream';

/* global awslambda */
export const handler = awslambda.streamifyResponse(async (event, responseStream, _context) => {
  // Turn something into a readable stream; here, we just echo the event:
  const input = Readable.from(Buffer.from(JSON.stringify(event)));
  // Pipe it straight to the client; pipeline() handles backpressure for you.
  await pipeline(input, responseStream);
});
This mirrors AWS’s recommended approach: the responseStream is a standard Node writable stream, and pipeline handles backpressure correctly. (docs.aws.amazon.com)
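Headers matter when you stream: the client needs the Content-Type before the first byte to render progressively. The Node runtime also exposes awslambda.HttpResponseStream.from, which wraps the stream with response metadata. Here’s a sketch of an SSE-style token stream; the tokens() generator is a hypothetical stand-in for your real model or data source:

// index.mjs: emit Server-Sent Events as tokens become available.
/* global awslambda */

async function* tokens() {
  // Hypothetical token source; replace with your model or upstream stream.
  for (const t of ['Hello', ' from', ' Lambda', ' streaming']) {
    await new Promise((resolve) => setTimeout(resolve, 100)); // simulate generation delay
    yield t;
  }
}

export const handler = awslambda.streamifyResponse(async (event, responseStream, _context) => {
  // Attach status code and headers before writing any body bytes.
  const http = awslambda.HttpResponseStream.from(responseStream, {
    statusCode: 200,
    headers: { 'Content-Type': 'text/event-stream' },
  });
  for await (const t of tokens()) {
    http.write(`data: ${JSON.stringify({ token: t })}\n\n`);
  }
  http.end();
});

Each write() flushes a complete SSE event, so a browser can update the UI token by token instead of waiting for the full answer.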
Enabling streaming on a Function URL
To stream over plain HTTPS, attach a Function URL and set its invoke mode to RESPONSE_STREAM. Here’s the relevant CloudFormation/SAM snippet:
Resources:
  StreamingFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: nodejs18.x
      Handler: index.handler
      Code: ./dist
      MemorySize: 1024
      Timeout: 30
      # Execution Role and other required properties omitted for brevity
  StreamingFunctionUrl:
    Type: AWS::Lambda::Url
    Properties:
      TargetFunctionArn: !Ref StreamingFunction
      AuthType: AWS_IAM # Prefer IAM or CloudFront protection
      InvokeMode: RESPONSE_STREAM
RESPONSE_STREAM configures the Function URL to call InvokeWithResponseStream under the hood, enabling progressive delivery and the 200 MB streaming limit. (docs.aws.amazon.com)
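If you manage infrastructure from code rather than templates, the same setting is available through the SDK. A minimal sketch with @aws-sdk/client-lambda v3 (the function name is a placeholder):

// create-url.mjs: attach a streaming Function URL programmatically.
import { LambdaClient, CreateFunctionUrlConfigCommand } from '@aws-sdk/client-lambda';

const client = new LambdaClient({});
const { FunctionUrl } = await client.send(new CreateFunctionUrlConfigCommand({
  FunctionName: 'StreamingFunction', // placeholder: your function's name or ARN
  AuthType: 'AWS_IAM',               // prefer IAM or CloudFront protection
  InvokeMode: 'RESPONSE_STREAM',     // switch from the default BUFFERED mode
}));
console.log('Streaming URL:', FunctionUrl);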
A simple browser client to consume the stream
Most modern HTTP clients can read streamed bodies incrementally. In the browser, use the Web Streams API:
async function readStream(url, awsSigv4Headers) {
  const res = await fetch(url, { headers: awsSigv4Headers });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    // For Server-Sent Events, you could parse 'data:' lines here and update the UI.
    console.log('Received chunk; total characters so far:', buffered.length);
  }
  return buffered;
}
If your HTTP client buffers until the connection closes, you won’t see the benefits; pick a client that surfaces data incrementally. (aws.amazon.com)
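If you’re streaming SSE-formatted output (like the token handler above), you’ll want to parse events as they complete rather than logging raw chunks. A minimal sketch, assuming each event is a single data: line terminated by a blank line:

// Feed decoded chunk text into the returned function; it invokes onEvent for
// every complete 'data:' event and keeps any trailing partial event buffered.
function makeSseParser(onEvent) {
  let buffer = '';
  return (chunkText) => {
    buffer += chunkText;
    const events = buffer.split('\n\n');
    buffer = events.pop(); // the last piece may be an incomplete event
    for (const evt of events) {
      for (const line of evt.split('\n')) {
        if (line.startsWith('data: ')) onEvent(line.slice('data: '.length));
      }
    }
  };
}

// Usage inside the read loop above:
//   const parse = makeSseParser((data) => console.log('event:', data));
//   parse(decoder.decode(value, { stream: true }));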
Production patterns that work well
- CloudFront in front of a Function URL
  - Why: custom domains, caching, WAF, and TLS termination. CloudFront happily talks to a Function URL origin and still preserves streaming semantics, which is vital if you want progressive rendering with a vanity domain. (aws.amazon.com)
- VPC environments
  - Gotcha: Function URLs don’t support response streaming when your function is placed inside a VPC. If you must stay in a VPC, use the AWS SDK from your client or a proxy service and call InvokeWithResponseStream via a VPC endpoint; see the sketch after this list. (docs.aws.amazon.com)
- Auth and safety
  - Prefer IAM auth on Function URLs, or put CloudFront in front and restrict origin access to CloudFront only. This keeps your URL from being a publicly invokable endpoint and lets you add WAF rules or token checks. (aws.amazon.com)
- Observability
  - You still get logs in CloudWatch; streaming changes how the response is delivered, not how your code emits logs or metrics. Streaming also has its own pricing dimension; check Lambda pricing for the streaming component. (docs.aws.amazon.com)
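For the VPC case, the sketch below shows the SDK path with @aws-sdk/client-lambda v3. The function name is a placeholder, and the EventStream consumption follows the shape documented for InvokeWithResponseStream:

// stream-invoke.mjs: consume a streamed response via the Lambda API directly,
// useful when a Function URL isn't an option (e.g., the function sits in a VPC).
import { LambdaClient, InvokeWithResponseStreamCommand } from '@aws-sdk/client-lambda';

// Point the client at your interface VPC endpoint if calling from inside the VPC.
const client = new LambdaClient({});

const { EventStream } = await client.send(new InvokeWithResponseStreamCommand({
  FunctionName: 'StreamingFunction', // placeholder: your function's name or ARN
  Payload: JSON.stringify({ hello: 'world' }),
}));

const decoder = new TextDecoder();
for await (const event of EventStream) {
  if (event.PayloadChunk?.Payload) {
    process.stdout.write(decoder.decode(event.PayloadChunk.Payload)); // partial bytes
  } else if (event.InvokeComplete) {
    break; // the service signals the end of the stream
  }
}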
Performance notes (and a reality check)
- Time to first byte: You’ll see the first bytes arrive as soon as your function writes to the stream—no need to wait for full results. For UX, that’s gold. (docs.aws.amazon.com)
- Throughput: Expect an initial uncapped burst up to 6 MB and then up to 2 MB/s afterward. If you’re rendering a 120 MB export, you’re still talking about roughly a minute of streaming time (see the quick estimate below); that’s okay for downloads and long-running AI outputs, but set expectations in the UI. (docs.aws.amazon.com)
- Memory profile: Because you don’t buffer the whole payload in memory, you can run with less configured RAM and avoid out-of-memory errors for large results. (docs.aws.amazon.com)
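To set those UI expectations, you can estimate time-to-last-byte from the documented model. A rough sketch; the 6 MB burst and 2 MB/s cap are the figures from the docs, but verify them for your region before relying on this:

const MB = 1024 * 1024;

// Seconds of streaming after the initial burst, per the 6 MB + 2 MB/s model.
function estimateStreamSeconds(totalBytes) {
  return Math.max(0, totalBytes - 6 * MB) / (2 * MB);
}

console.log(estimateStreamSeconds(120 * MB)); // 57, i.e., roughly a minute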
Where streaming fits—and where it doesn’t
Great fits:
- AI chat and RAG-style answers where you want tokens as soon as they’re generated.
- Progressive UIs: HTML or JSON streams that let you render above-the-fold content quickly, then hydrate the rest.
- Long reports, data exports, or server-side compression pipelines (e.g., gzip on the fly); see the sketch after these lists. (aws.amazon.com)
Not ideal:
- APIs that require API Gateway-specific features (authorizers, usage plans) when you don’t want CloudFront in front. API Gateway does not support progressive chunked transfer for Lambda responses; you can still put it in front of a Function URL for custom domains, but you’ll lose the streaming benefit. (aws.amazon.com)
- Ultra-large, high-throughput media where 2 MB/s after the burst is a bottleneck. In those cases, consider presigned S3 downloads or a specialized media pipeline. (docs.aws.amazon.com)
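To make the export case concrete, here’s a sketch that streams a large CSV, gzip-compressed on the fly. fetchRows() is a hypothetical stand-in for paginated database reads; the streaming pieces are standard Node:

// export.mjs: stream a CSV export without buffering the whole result in memory.
import { pipeline } from 'node:stream/promises';
import { Readable } from 'node:stream';
import { createGzip } from 'node:zlib';
/* global awslambda */

async function* fetchRows() {
  // Hypothetical: replace with paginated reads from your database.
  yield 'id,name\n';
  for (let i = 0; i < 1_000_000; i++) yield `${i},row-${i}\n`;
}

export const handler = awslambda.streamifyResponse(async (event, responseStream) => {
  const http = awslambda.HttpResponseStream.from(responseStream, {
    statusCode: 200,
    headers: { 'Content-Type': 'text/csv', 'Content-Encoding': 'gzip' },
  });
  // Readable.from pulls rows lazily, createGzip compresses chunk by chunk, and
  // pipeline propagates backpressure from the client all the way to the generator.
  await pipeline(Readable.from(fetchRows()), createGzip(), http);
});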
Putting it together: a pragmatic recipe
1) Start with a Node.js streaming handler using streamifyResponse and pipeline. Keep each chunk meaningful (e.g., SSE lines, JSON Lines). (docs.aws.amazon.com)
2) Attach a Function URL and set InvokeMode: RESPONSE_STREAM. Test with curl --no-buffer or a browser to confirm you see incremental chunks. (docs.aws.amazon.com)
3) Put CloudFront in front for custom domains, caching, WAF, and to keep streaming behavior intact. Use Origin Access Control or other restrictions to limit direct access to the Function URL. (aws.amazon.com)
4) If your Lambda must be in a VPC, call it via InvokeWithResponseStream using the AWS SDK through a VPC endpoint (see the SDK sketch above); Function URLs won’t stream from inside the VPC. (docs.aws.amazon.com)
5) Monitor costs and user-perceived latency. Streaming is about faster “feels fast” more than faster total bytes delivered; make sure your UI shows progress as chunks arrive. (docs.aws.amazon.com)
The bottom line
Lambda’s bump to 200 MB for response streaming meaningfully stretches the surface area of what you can serve straight from a function. It keeps your architecture simple for AI chats, progressive rendering, and biggish downloads, without detouring through extra storage and orchestration. Know your limits (2 MB/s after 6 MB), pick the right front door (Function URL, often behind CloudFront), and lean on Node’s stream patterns to keep backpressure in check. With those pieces in place, your serverless app can “hit play” sooner—and feel snappier—without overcomplicating the stack. (aws.amazon.com)
References:
- AWS Lambda response streaming 200 MB announcement, published July 31, 2025. (aws.amazon.com)
- Response streaming docs: limits, bandwidth, and SDK invocation. (docs.aws.amazon.com)
- Function URL invoke modes and RESPONSE_STREAM. (docs.aws.amazon.com)
- Compute Blog: response streaming concepts, CloudFront origin pattern, and API Gateway/ALB caveats. (aws.amazon.com)
- Writing streaming handlers in Node.js with streamifyResponse/pipeline. (docs.aws.amazon.com)
- VPC compatibility caveats and the SDK-based workaround. (docs.aws.amazon.com)