You should not need an app-store release to ship a new feature
Here is the short answer, because that is what most teams actually want to know: with a server-driven SDK, you ship new widgets, flows, and capabilities from your backend, and they appear in apps that are already installed — no new build, no app-store review, no waiting for users to update. The SDK on the device is a thin rendering shell. The backend decides what to show at runtime by sending a small JSON description of the answer plus the UI to render. Change the backend, and every installed app changes with it.
This is the model Voqal is built on. Our assistant emits a render spec on every turn: a compact JSON object describing the spoken answer, the widgets to display, and the actions a user can take. The SDK renders whatever it is told. Nothing about a new feature is baked into the binary — which means the binary almost never has to change.
If you have watched Airbnb, Spotify, Netflix, or Uber ship UI changes the same day they think of them, you have already seen the pattern. It is called server-driven UI (SDUI), and it is one of the most important architectural shifts in mobile of the last decade. Voqal takes that idea and applies it to the hardest surface to keep current: an assistant that has to render an open-ended set of answers, widgets, and confirmations.
What is a server-driven render spec?
In a traditional SDK, the UI lives in the app. Every screen, card, button, and flow is compiled into the binary. To add a new card type, you write Swift or Kotlin, submit a build, wait for review, and then wait — often weeks — for users to update. The capability of the app is frozen at the moment you shipped it.
A server-driven render spec inverts that. The server sends both the data and a description of how to present it, and the client renders it natively at runtime. As the SDUI community puts it, the client "displays it agnostic of the data it contains." The app becomes a rendering engine; the product lives on the server.
For Voqal specifically, the assistant produces a spoken answer, then a delimiter, then a JSON array of widgets, all in one model call (we call it the "render tail"). The SDK parses it and renders the result. A new answer or a new combination of widgets is a backend change, not an app release, as long as it is built from widget types the installed SDK already knows how to render.
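To make the "spoken answer, delimiter, widget array" shape concrete, here is a minimal sketch of how a client might split one response. It is written in Python for brevity, not in the SDK's own language. The delimiter string and the name `parse_turn` are assumptions made for illustration; this article does not specify the actual delimiter or the SDK's API.

```python
import json

# Hypothetical delimiter: the real one is not specified in this article.
DELIMITER = "\n<<<WIDGETS>>>\n"


def parse_turn(raw: str) -> tuple[str, list[dict]]:
    """Split one model response into (spoken answer, widget list)."""
    spoken, sep, tail = raw.partition(DELIMITER)
    if not sep:
        # No render tail at all: treat the whole response as speech.
        return raw.strip(), []
    try:
        widgets = json.loads(tail)
    except json.JSONDecodeError:
        # A malformed tail should not cost the user the spoken answer.
        return spoken.strip(), []
    if not isinstance(widgets, list):
        return spoken.strip(), []
    # Keep only JSON objects; anything else cannot be a widget.
    return spoken.strip(), [w for w in widgets if isinstance(w, dict)]


raw = (
    "Your available balance is 42,180 pounds."
    + DELIMITER
    + '[{"type": "stat", "label": "Available balance", "value": "42,180"}]'
)
spoken, widgets = parse_turn(raw)
print(spoken)
print(widgets)
```

The design choice worth noting is that a broken or missing tail degrades to speech only, so a bad widget payload never blocks the answer itself.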
A small example
Here is a trimmed render spec for a single assistant turn — a balance answer with a transaction list and a confirm action:
{
  "speak": "Your available balance is 42,180 pounds. Want me to send a payment link?",
  "widgets": [
    {
      "type": "stat",
      "label": "Available balance",
      "value": "42,180",
      "currency": "EGP"
    },
    {
      "type": "list",
      "title": "Recent transactions",
      "items": [
        { "label": "Settlement", "amount": "+12,000", "date": "2026-06-12" },
        { "label": "Payment link", "amount": "+3,400", "date": "2026-06-11" }
      ]
    },
    {
      "type": "confirm",
      "title": "Create a payment link for 500 EGP?",
      "action": "create_payment_link",
      "requiresBiometric": false,
      "params": { "amount": 500, "currency": "EGP" }
    }
  ],
  "follow": ["Show me today's revenue", "Settle my balance now"]
}
The SDK does not know what a "stat" or a "confirm" means in business terms. It knows how to render those shapes beautifully and natively, how to read the spoken text aloud, and how to route the confirm action back through a secure execute call. Everything domain-specific (the wording, the widgets chosen, the action attached) is decided server-side, per request, at runtime.
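A rough sketch of what "knows the shapes, not the business" could look like on the client: a registry that maps each widget type to a renderer and dispatches on the `type` field. This is illustrative Python, not Voqal's SDK code. The registry, the decorator, and the function names are hypothetical, and plain strings stand in for native views.

```python
from typing import Callable

Renderer = Callable[[dict], str]

# Hypothetical registry: widget type -> function that builds a "view".
RENDERERS: dict[str, Renderer] = {}


def renderer(widget_type: str):
    """Register a renderer for one widget type."""
    def register(fn: Renderer) -> Renderer:
        RENDERERS[widget_type] = fn
        return fn
    return register


@renderer("stat")
def render_stat(w: dict) -> str:
    unit = f" {w['currency']}" if "currency" in w else ""
    return f"{w['label']}: {w['value']}{unit}"


@renderer("list")
def render_list(w: dict) -> str:
    rows = [f"  {i['date']}  {i['label']}  {i['amount']}" for i in w["items"]]
    return "\n".join([w["title"], *rows])


@renderer("confirm")
def render_confirm(w: dict) -> str:
    return f"{w['title']} [Confirm] [Cancel]"


def render_turn(spec: dict) -> list[str]:
    """Render each widget the client has a renderer for, in order."""
    return [
        RENDERERS[w["type"]](w)
        for w in spec["widgets"]
        if w.get("type") in RENDERERS
    ]
```

Nothing in the renderers mentions balances, settlements, or payment links. Those words arrive in the spec, which is the point.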
Why this matters for the business, not just engineering
Server-driven UI is usually framed as an engineering convenience. It is much more than that. It changes the economics of how a product evolves.
Business benefits
- Ship features without app-store releases. New widgets, flows, and capabilities go live from the backend. Users on a build from months ago get the new behavior with zero action on their part. The SDUI playbook describes shipping "features to users running app versions from years ago" — same idea, applied to an assistant.
- The release cycle stops being a bottleneck. Deployment time drops from weeks to hours. A copy fix, a new arrangement of cards, or a smarter flow does not wait on Apple or Google review.
- A/B test the experience instantly. Serve a different render spec to different user segments and measure which converts. This is exactly how large SDUI shops run hundreds of experiments at once without cutting new app versions.
- One integration that evolves forever. Your team integrates the Voqal SDK once. Every future capability arrives over the wire. The integration you ship today keeps getting more valuable without your engineers touching it again.
- Cross-platform consistency for free. Because the product logic lives server-side, iOS, Android, and web render the same spec the same way. No more "the new flow shipped on iOS first."
- Personalization at runtime. The backend tailors the spoken answer and the widgets to the user, their country, their data, and their context — per request — instead of branching logic frozen in an old binary.
- Faster incident response. If a widget or flow misbehaves, you fix it server-side and the fix reaches every installed app on its next request. You are not waiting on an emergency app release.
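The A/B testing and personalization points above come down to one thing: the server picks the spec per request. Here is a minimal sketch, assuming a hash-based bucketing scheme. The experiment name, the `variant` field, and both function names are hypothetical, and this is not a description of how Voqal assigns segments.

```python
import hashlib


def bucket(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically assign a user to one variant of an experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def build_spec(user_id: str, balance: str) -> dict:
    """Build the render spec for a balance answer, varying it by bucket."""
    variant = bucket(user_id, "balance_card_cta", ["control", "with_cta"])
    widgets = [{"type": "stat", "label": "Available balance", "value": balance}]
    if variant == "with_cta":
        # Only this segment sees the payment-link prompt.
        widgets.append({
            "type": "confirm",
            "title": "Create a payment link?",
            "action": "create_payment_link",
        })
    return {
        "speak": f"Your available balance is {balance}.",
        "widgets": widgets,
        "variant": variant,  # hypothetical field, kept for analytics
    }
```

Because the bucket is derived from the user ID, the same user sees the same variant on every request and every device, and ending the experiment is a server-side change.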
There is a strategic angle too. When your assistant's capabilities are defined by a render spec rather than compiled UI, the same SDK serves every tenant and every use case. A payments merchant, a logistics app, and a healthcare portal can all drop in the identical shell and get a completely different assistant, because the difference lives entirely on the server. That is what lets Voqal behave like infrastructure — "Stripe for assistant UI" — rather than a bespoke build per customer.
How Voqal applies SDUI to a voice-first assistant
Most SDUI systems render static screens: a home feed, a product page, a promo banner. An assistant is harder, because the set of possible answers is effectively unbounded. You cannot pre-build a screen for every question a user might ask. That is precisely why the render-spec model fits voice so well.
When a user speaks, the request travels to the Voqal engine, which runs the agent against your connected tools and data, then emits the spoken answer plus the widget array. The SDK speaks the answer, renders the widgets, and — for actions like a payment or a settlement — routes a confirmation back through a secure execute call gated by biometrics where required. The device never needs to know in advance what kinds of answers exist. It only needs to know how to render the handful of widget shapes well.
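The confirm step is the part of this flow with real consequences, so it is worth sketching. The code below assumes the host app injects two callables: `verify_biometric`, which prompts the user, and `execute`, which makes the secure server call. Both names, the status values, and the fail-closed default are assumptions for illustration, not the SDK's documented contract.

```python
from typing import Callable


def handle_confirm(
    widget: dict,
    user_accepted: bool,
    verify_biometric: Callable[[], bool],
    execute: Callable[[str, dict], dict],
) -> dict:
    """Route a confirm widget to the execute call, gated by biometrics."""
    if not user_accepted:
        return {"status": "cancelled"}
    # Fail closed: a missing flag is treated as requiring biometrics.
    # That default is a choice made for this sketch.
    if widget.get("requiresBiometric", True) and not verify_biometric():
        return {"status": "biometric_failed"}
    # Send the action name and params exactly as the server supplied them.
    return execute(widget["action"], widget.get("params", {}))
```

In this sketch the device only forwards the action name and parameters it was given. It holds no logic about what `create_payment_link` does, which keeps the business rules on the server.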
This is also why a voice-first assistant is a natural fit for SDUI rather than a forced one. Voice is inherently open-ended, and the interface has to be assembled on the fly. If you want the deeper background on why voice is becoming a primary interface, see why voice-first is the next platform shift, and for the category itself, what a voice-to-actions SDK actually is.
The honest tradeoffs
SDUI is not free. The community is clear-eyed about this, and so are we:
- The client carries rendering cost. Parsing JSON and building native views at runtime uses CPU and memory. On low-end devices a naive implementation can feel janky. The fix is a tight, well-optimized renderer and a small, deliberate widget vocabulary — not an arbitrary UI tree.
- A constrained spec beats an infinite one. Spotify's early generic SDUI framework grew so flexible that engineers described debugging it as "archeology," and it was eventually deprecated. The lesson: keep the spec small and opinionated. Voqal exposes a curated set of widget types, not a general-purpose layout language.
- App-store compliance is about code, not data. Sending JSON that renders native UI is fully within the rules — the stores block unreviewed executable code, not server-supplied content. SDUI is how the largest apps in the world ship daily, compliantly.
Getting these tradeoffs right is most of the work, and it is the part you do not want to build yourself. If you are weighing whether to assemble this in-house, the build-vs-buy cost breakdown for an in-house voice assistant lays out where the hidden time goes.
What this means for your integration
The practical payoff is simple. You integrate the SDK once. From that point on, your assistant's vocabulary of answers, widgets, and actions grows on the server, and every addition reaches every installed app on its next request, without a release. Your roadmap stops being gated by app-store timelines, and your single integration keeps compounding in value.
That is the whole promise of server-driven UI, applied to the surface that needs it most. The companies that adopted SDUI early did not just ship faster — they changed what "shipping" means. Voqal brings that same shift to voice and assistant experiences, behind one thin SDK.
If Arabic and right-to-left rendering are part of your market, the render-spec model handles that server-side too — see the complete guide to an Arabic voice SDK. When you are ready to wire it up, the integration docs walk through the SDK and the contract, or join the waitlist to get started.
Frequently asked questions
Does server-driven UI violate App Store or Play Store rules?
No. The stores prohibit downloading and executing unreviewed code, not server-supplied data. A render spec is JSON that the native SDK interprets to build native UI — the same mechanism Airbnb, Spotify, Netflix, and Uber use to ship updates daily. It is fully compliant.
How is a render spec different from just calling a normal API?
A normal API returns data and leaves the client to decide how to present it, which means presentation logic is compiled into the app. A render spec returns the data and a description of how to present it, so the presentation decision moves to the server. New ways of presenting things ship without an app update.
What happens to users on old versions of the app?
They get the new behavior automatically, as long as the new widgets use shapes their installed SDK already knows how to render. Because the SDK is a thin renderer over a small, stable widget vocabulary, the server can deliver new flows and content to builds that are months or years old.
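One way a backend could honor that "as long as" condition is to trim each spec to what the requesting client can render. The sketch below assumes each request carries the set of widget types the installed SDK supports, and that a widget may carry an optional `fallback`. Both are hypothetical, since this article does not describe how Voqal negotiates capabilities.

```python
def fit_to_client(spec: dict, supported: set[str]) -> dict:
    """Return a copy of the spec limited to widget types the client renders."""
    kept = []
    for widget in spec["widgets"]:
        if widget["type"] in supported:
            kept.append(widget)
        elif widget.get("fallback", {}).get("type") in supported:
            # Swap in the simpler shape that an older SDK understands.
            kept.append(widget["fallback"])
        # Otherwise drop the widget; the spoken answer still reaches the user.
    return {**spec, "widgets": kept}


spec = {
    "speak": "Revenue today is 9,100 EGP.",
    "widgets": [
        {
            "type": "chart",  # hypothetical newer widget type
            "fallback": {"type": "stat", "label": "Revenue", "value": "9,100"},
        },
        {"type": "list", "title": "Recent transactions", "items": []},
    ],
}
print(fit_to_client(spec, {"stat", "list"})["widgets"])
```

With this approach an old build never receives a shape it cannot draw, and a newer build receives the richer widget from the same backend code path.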
Is dynamic rendering slow on lower-end phones?
It can be if the spec is an unbounded UI tree. Voqal avoids that by keeping the widget vocabulary small and the renderer tightly optimized, so views are built quickly and natively. The goal is a constrained, opinionated spec, not a general-purpose layout tree that has to be parsed and laid out from scratch every turn.
Can I A/B test or personalize through the render spec?
Yes. Because the server decides the spec per request, you can serve different answers, widgets, or flows to different user segments and measure the results — without releasing a new app version. The same mechanism powers per-user, per-country, and per-tenant personalization at runtime.