Most software development doesn’t need to touch C++.
A majority of products can be written in Swift, Kotlin, JavaScript, or any number of modern languages that are easier to work with. Entire categories of apps—social networks, marketplaces, streaming platforms—run perfectly well without ever touching low-level code.
Most apps are built around discrete actions—tap a button, get a response. But real-time media and voice apps don’t work that way. Audio is constantly flowing, multiple streams are active at once, and the system has to react instantly—mixing, prioritizing, and transforming signals as they come in.
That’s where things start to break if the underlying system can’t keep up. And that’s typically when C++ becomes part of the solution.
Where the Performance Difference Actually Comes From
At a high level, C++ gives you more direct control over how your program runs. Code compiles down very close to machine instructions, memory is managed explicitly, and there’s no background system stepping in unpredictably to clean things up or move data around.
Languages like Swift and Kotlin are also compiled, and fast in general. But they’re designed with different priorities: safety, developer productivity, and automatic memory management. That trade-off is usually well worth it, but it also means you don’t have precise control over exactly when things happen.
For most applications, that distinction doesn’t matter. But in systems where work has to happen on a strict schedule—every few milliseconds, without interruption—it starts to matter a lot.
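To make that distinction concrete, here is a minimal C++ sketch of a real-time-safe processing step: the callback touches only preallocated memory and does a bounded amount of work, so its timing is predictable. The buffer size and `GainProcessor` type are illustrative, not any particular framework’s API.

```cpp
#include <array>
#include <cstddef>

// Illustrative buffer size: 256 frames is about 5.3 ms at 48 kHz.
constexpr std::size_t kFrames = 256;

struct GainProcessor {
    float gain = 0.5f;

    // Called on the audio thread every few milliseconds. No new/delete,
    // no locks, no system calls -- any of those could run unpredictably
    // long, miss the deadline, and produce an audible glitch.
    void process(std::array<float, kFrames>& buffer) {
        for (float& sample : buffer) sample *= gain;
    }
};
```

The point is not the gain math, it’s the constraints: everything the callback needs is allocated ahead of time, and the per-call work is strictly bounded.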
Audio is one of the clearest examples.
Why Spotify Doesn’t Need C++
Take a typical streaming app like Spotify or Netflix.
From the outside, these feel like real-time systems. You press play, and audio or video starts instantly. But under the hood, they’re mostly orchestrating playback of pre-encoded media using highly optimized system frameworks like AVFoundation on iOS or ExoPlayer on Android.
Those frameworks already solve the hard problems. They’re written in low-level code, tuned for each device, and designed to handle buffering, decoding, and synchronization. The app itself is mostly issuing high-level commands: play, pause, seek.
As long as you’re just playing audio or video, Swift and Kotlin are more than sufficient. You’re not responsible for the real-time engine—you’re just running it on autopilot.
When You Stop Playing Audio and Start Shaping It
The situation changes the moment you start manipulating audio in real time.
Imagine a simple guitar app. You plug in, play a note, and expect to hear it instantly with effects applied—distortion, reverb, delay. If there’s even a small hiccup, you hear it immediately as a click, a pop, or a lag between action and sound. The system isn’t just responding to input anymore; it’s continuously transforming a live signal under tight timing constraints.
That same pattern shows up in professional audio tools, where thousands of samples are processed in rapid succession, and timing errors are not just noticeable—they’re unacceptable. At that point, you need guarantees about when code runs, how memory is handled, and how data flows through the system.
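As a rough sketch of what one stage of such an effect chain looks like in C++ (the function names are made up for illustration, not a real audio API): at 48 kHz with 128-frame buffers, this code has roughly 2.7 ms to finish, every single time it is called.

```cpp
#include <cmath>
#include <vector>

// Soft-clip "distortion": smooth saturation bounded to (-1, 1).
// tanh is a common textbook choice for this kind of waveshaping.
float softClip(float x) {
    return std::tanh(2.0f * x);
}

// Hypothetical effect stage: transforms one buffer of samples in place.
void applyDistortion(std::vector<float>& buffer) {
    for (float& s : buffer) s = softClip(s);
}
```

A real plugin chain stacks several of these stages per buffer, which is why per-sample cost and predictable scheduling matter so much.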
That’s where C++ tends to become the foundation. And in music production and other pro audio use cases, it has been the foundation for years.
But new classes of real-time audio applications are now coming to market, and we expect that trend to continue as real-time audio technology makes new use cases feasible.
Interactive Voice Chat Gets Complicated Quickly
Consider something that looks simple on the surface: a group voice chat.
But instead of just talking, participants start watching a video together—a shared media stream, synced across devices. While the video plays, people speak intermittently. When someone starts talking, the system detects speech using voice activity detection (VAD) and lowers the volume of the video (ducking) so the speaker is easier to hear. When they stop talking, the video audio smoothly returns to full volume.
Behind the scenes, this means:
Multiple audio streams (video, multiple microphones)
Real-time analysis (VAD running continuously)
Dynamic mixing and gain adjustment
Tight synchronization between audio and video
All of this has to happen continuously, with minimal latency, across devices with different hardware characteristics.
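One way to sketch the ducking piece in C++ (a simplified illustration, not any specific engine’s implementation): while speech is active, ramp the video’s gain toward a lower target, sample by sample, so the transition is a smooth fade rather than an audible click.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical ducker: a VAD elsewhere calls setSpeechActive(), and
// process() runs on the audio thread over each buffer of video audio.
struct Ducker {
    float gain = 1.0f;          // current gain, updated per sample
    float target = 1.0f;        // where the gain is ramping toward
    float step = 0.001f;        // ramp speed per sample (illustrative)

    void setSpeechActive(bool active) { target = active ? 0.2f : 1.0f; }

    void process(std::vector<float>& videoAudio) {
        for (float& s : videoAudio) {
            // Move gain one step toward the target, clamping at it.
            if (gain < target)      gain = std::min(gain + step, target);
            else if (gain > target) gain = std::max(gain - step, target);
            s *= gain;
        }
    }
};
```

Jumping the gain instantly instead of ramping it is one of the classic ways real-time audio code produces clicks, which is why even this toy version fades per sample.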
This is no longer a simple “playback” problem. It’s a real-time orchestration problem. And small timing inconsistencies—whether from memory management, scheduling, or buffering—start to compound into bugs and audio glitches that you can’t diagnose or control from high-level languages.
Real-Time Voice AI Raises the Stakes
Now layer in voice AI, such as a voice agent participating in the call.
Instead of just passing audio around and mixing it, you’re processing it through a pipeline: capturing microphone input, detecting speech, transcribing it, generating a response, and synthesizing audio back to the user. And you’re doing it continuously, not as a one-off request.
Increasingly, parts of this pipeline are moving onto the device:
Running VAD locally to avoid sending silence to the cloud
Performing lightweight speech recognition on-device for latency or privacy
Handling audio pre-processing (noise suppression, filtering) before anything leaves the device
This hybrid model—some local, some cloud—is becoming the default. It reduces latency, lowers cost, and improves reliability when connectivity is unstable.
But it also means the device itself is now responsible for orchestrating a real-time graph of audio processing steps. Audio is being routed, transformed, and conditionally processed in-flight. Buffers need to stay aligned. Timing needs to be predictable. There’s no room for pauses or jitter.
That’s where a C++ core becomes less of an optimization and more of a requirement. It’s what allows you to build a system that behaves consistently under real-world conditions.
Live Streaming, AI, and Interaction
Push this one step further.
Imagine a live commerce stream—something like Whatnot or eBay Live, but more interactive. A host presents products live. Viewers are watching in real time. Some can jump in with voice questions. An AI co-host listens to the conversation, surfaces relevant product details, and occasionally speaks up. Background music plays softly under the stream, adjusting dynamically depending on who’s speaking.
In that environment, you’re dealing with:
A primary audio/video stream from the host
Audience voice input, potentially from multiple users
An AI-generated voice stream
Background audio that needs to be mixed and ducked appropriately
Everything is live. Everything is interactive. And everything needs to feel instantaneous.
At that point, the system starts to look less like an app and more like a real-time media engine—one that has to make continuous decisions about routing, mixing, and prioritization. And if the system tries to do all of this in the cloud, it will be laggy, expensive, and difficult to scale.
Why This Matters More Going Forward
For a long time, most apps were built around discrete interactions. You tapped a button, something happened, and the system returned to idle.
That’s changing.
Voice interfaces, AI agents, live collaboration, and shared media experiences are pushing products toward continuous interaction. Instead of isolated requests, systems are increasingly expected to stay active, responsive, and context-aware over long periods of time.
As technology makes more things possible, user expectations grow.
As that shift happens, more applications need to behave like real-time systems. And real-time systems depend on precise control over how data flows and how processing is scheduled over time.
That’s exactly the kind of flexibility that low-level audio engines—typically built in C++—are designed to provide.
Where Switchboard Fits
Switchboard bridges that gap. It provides the control of a custom C++ audio engine without requiring you to write C++.
Under the hood, it uses a C++ audio engine capable of handling real-time processing, multi-stream routing, and tight timing constraints: the kinds of systems you’d normally only attempt if you were willing to build and maintain a complex native engine yourself.
But instead of exposing all of that complexity directly, Switchboard lets you define how audio should flow—how streams are connected, how processing is applied, how decisions are made—using higher-level abstractions. Those definitions can come from JSON, Swift, Kotlin, JavaScript, or cross-platform frameworks like React Native and Flutter. We are also building a no-code Editor.
So you end up with vastly simplified application development while still benefiting from the determinism and performance of a C++ core.
It’s not about forcing every team to write C++. It’s about making it possible to build systems that would normally require it, without taking on the full cost of doing so.
The Bottom Line
You don’t need C++ to build most applications.
But if your product starts to depend on continuous audio processing, tight latency, multiple interacting streams, or real-time AI, you’re no longer just building an app—you’re building a system that runs on a clock.
And at that point, the ability to control exactly what happens, exactly when it happens, becomes the difference between something that works in a demo and something that holds up in the real world.