The Horror of Audio Output

Austin Hicks

2014-12-09 20:30

Update: I didn't realize there was a clarity issue, but the Hacker News thread made it clear that I should provide this clarification. I knew from the beginning of development that, no matter what I did, I'd need to support multiple backends. By the time I took this route, I had already written something like a third of the code it would require, and at some point you just have to do it in the interest of actually moving forward. I am also developing a commercial library. As a result, some options that would otherwise be available are closed to me: I cannot use any dependency that my end users may need to purchase, nor can I bring in monolithic app development frameworks. Anyhow, back to your regularly scheduled blog post...

I've not been blogging recently for two reasons. The first is that we still do not have an accessible way for me to include math equations, short of instructing you to learn LaTeX and read the source of my articles; it would be incredibly hypocritical of me to make my blog posts inaccessible. The second is that the things I have been working on sit somewhere between programming and math textbooks, with a bit of art and a lot of trial and error thrown in, so I simply haven't had anything interesting to say that fits into a blog post.

The latter has since fixed itself, and I expect I shall actually be posting semi-regularly again.

Today I want to talk about the horror that is audio output, and all the wonderful broken promises you run into when you try to get sound out of a computer. For a very long time, this was the hardest part of Libaudioverse, and I have literally had to give up on third-party libraries and at least one nice feature. I'm going to go through it mostly as a list of everything I've tried and what's wrong with each; if you're not interested in a post about various audio libraries and their brokenness, you can stop now. This is long, but I think it is worth chronicling so that others do not fall into the same pitfalls, which have both frustrated me greatly and wasted weeks of Libaudioverse development time.

What Would Be Ideal?

In an ideal world, I'd write the audio code once. It would work on all platforms, always detect the hardware configuration correctly, require no setup from the user, and provide standards we could follow. GPL software is ruled out for my use case, which eliminates a few options that probably aren't good ones anyway.

As far as I know, the options are SDL, PortAudio, Cubeb, RtAudio, OpenAL Soft, and writing backends by hand for every platform.

Here's why I ended up taking the last option, one library at a time, roughly in chronological order. In case the theme isn't obvious yet, these all failed to one degree or another. In some cases I could have lived with the problems for any other type of software, but the whole point of Libaudioverse is high-quality audio, and none of these measure up.

SDL, The Popular Gaming Library

SDL was eliminated without my ever seriously trying it, for a very simple reason: if your library uses SDL, any app that uses your library also has to use SDL. In addition, the app must use the same version of SDL as your library. You have to make sure that it's not initialized twice, and you're probably going to be passing DLL handles around or something of that ilk to make sure everything plays nicely. One specific thing that stopped me, but won't affect others, is that SDL does not work with whole-program optimization in Microsoft's compiler, which gives Libaudioverse a pretty major performance boost.

SDL could work fine for an application, but not for a library. I cannot comment on its reliability either way, because I only briefly used it in the very earliest Libaudioverse prototypes. If you are a library developer, stay away; pulling in SDL is likely to be complicated and potentially very messy.

PortAudio, The Most Common Cross-platform Audio Output Library

In a way, this one is the most disappointing. PortAudio is used by a lot of software; it is the de facto standard for cross-platform audio, and one of the first things you find when Googling the subject.

The first problem is that PortAudio has incredibly high latency on Windows. This is because the Winmm backend is the only one that works reliably. I was able to use PortAudio, but only by specifically turning off all of the better alternatives; some of them do not compile on modern compilers, and the rest were simply broken. Ironically, Winmm itself turns out to be fine, as it is what Libaudioverse uses now; it is the additional overhead of whatever PortAudio does internally that kills it. Like many very long-lived projects, PortAudio is difficult to fix, probably more difficult than starting over; I was unable to comprehend the code.
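To put the latency complaint in concrete terms: with a queue-based API such as Winmm's `waveOutWrite`, the delay before a newly mixed sample is heard is roughly the total amount of audio already queued ahead of it. The sketch below assumes a simple queue of equal-sized buffers and ignores driver and hardware delay; the helper name `queued_latency_ms` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: approximate output latency, in milliseconds, of a
// queue of `buffer_count` buffers holding `frames_per_buffer` frames each,
// played back at `sample_rate` frames per second. A newly written sample
// is heard only after everything queued ahead of it has played.
// Driver and hardware delays are ignored.
double queued_latency_ms(std::size_t buffer_count,
                         std::size_t frames_per_buffer,
                         double sample_rate) {
    assert(sample_rate > 0.0);
    const double queued_frames =
        static_cast<double>(buffer_count) * static_cast<double>(frames_per_buffer);
    return 1000.0 * queued_frames / sample_rate;
}
```

For example, three buffers of 1024 frames at 48 kHz queue 64 ms of audio, while two buffers of 441 frames at 44.1 kHz queue 20 ms. Any additional buffering a wrapper library performs internally shows up directly as extra delay on top of this.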

The second problem is that PortAudio fails for anything beyond two-channel audio. Because platforms use different conventions for surround sound channel ordering, the channels must be reordered according to the API in use. PortAudio requires tricks with void pointers and platform-specific code just to enable higher channel counts, and on top of that it does not perform the needed reordering, at least not that I could find. Combined with the preceding point, this makes PortAudio useless for all but the most undemanding applications. I suspect its popularity stems from the fact that very few applications actually need more than stereo, and latency doesn't matter except in games and in audio editing and synthesis programs.
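The reordering itself is simple once you know both layouts: it is a per-frame permutation of interleaved samples. This is a minimal sketch; the function name is hypothetical, and the example table assumes the Windows `WAVEFORMATEXTENSIBLE` order for 5.1 (FL, FR, FC, LFE, BL, BR) and the ALSA default order (FL, FR, RL, RR, FC, LFE). Check the documentation for the APIs you actually target before relying on a table like this.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder interleaved audio from one channel layout to another.
// For every output channel `ch`, `map[ch]` is the index of the input
// channel whose sample should be written there.
template <std::size_t Channels>
std::vector<float> reorder_channels(const std::vector<float>& in,
                                    const std::array<std::size_t, Channels>& map) {
    assert(in.size() % Channels == 0);  // Input must hold whole frames.
    std::vector<float> out(in.size());
    for (std::size_t frame = 0; frame < in.size(); frame += Channels) {
        for (std::size_t ch = 0; ch < Channels; ++ch) {
            out[frame + ch] = in[frame + map[ch]];
        }
    }
    return out;
}

// Example table (assumed layouts, see above): 5.1 from the Windows order
// FL, FR, FC, LFE, BL, BR to the ALSA order FL, FR, RL, RR, FC, LFE.
constexpr std::array<std::size_t, 6> kWindowsToAlsa51 = {0, 1, 4, 5, 2, 3};
```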

Cubeb: Surely Mozilla's Option Is Good Enough?

Given the theme so far, I'm sure you have figured out that the answer is no.

Cubeb is part of Firefox. It is cross-platform and is one of the only options on this list that supports iOS and Android out of the box. When you're listening to YouTube in Firefox, Firefox is most likely using it. The only downside is that it doesn't support device indexing (picking a specific output device), a feature Firefox also lacks. Sounds perfect, and surely I can patch that in.

It doesn't compile.

See, it's internal to Mozilla, which runs a custom build system. But they want developers to be able to work on it without pulling down the whole Mozilla source tree, a process that takes a couple of hours. There are two solutions to this problem, and I must say that I am greatly disappointed that they chose the second.

The right way to do what they want is to make the library entirely separate. Cubeb does not depend on anything else from Mozilla, so it could easily have been pulled out completely, with a script that vendors whatever specific version Mozilla uses into Mozilla's custom build system. Had they done that, development would be completely separate, and Cubeb would probably have seen much wider use.

But they used option 2. There are some bash scripts that manually sync the two repositories. People develop on both, and manually put them back together whenever someone feels like it. Any coder who is actually familiar with version control and open source and whatever probably knows where I am going next.

See, when you do that, files get lost, and there are unspoken, invisible rules you can't break because the code has to keep working in the other copy. In this case, that means compiling inside Mozilla, which makes it somewhat unclear what can't be changed. After fiddling with it for a while, I finally cloned the whole Mozilla source tree, copied out the missing files, and began wrestling with the fact that a cross-platform library uses automake as its build system.

After a week of wondering why things over here were weird and things over there were weird in the opposite way, patching it to work with more compilers than Firefox builds on, writing a CMake build system for it, and so on, I gave up and left. Despite having a GitHub repository, it is not in a state where it can actually be used outside Firefox.

Of all of these, this one irritated me the most. I would expect better practices from Mozilla for a situation like this, but maybe there are factors I don't know about in play. Whatever. Time to move on to...

RtAudio

This is the prime example of why you should never, ever put your entire library in a single CPP file. Ever. I don't care if this makes it easy to distribute. If someone actually has to fix it, this just complicates things tremendously.

And the bug? More broken backends. RtAudio professes to do a lot of resampling and such so that you don't have to worry about matching formats, claims to support WASAPI, etc., except that some of the backends apparently don't quite live up to that. A giant single source file containing everything is not the place to go digging when you don't actually know what you're looking for. I did eventually find the code responsible for breaking WASAPI, and, as I recall, it sort of worked with some things turned off, but I couldn't follow it well enough to make a fix. I promise that if you build your library this way, you won't get many pull requests; code organized by comments rather than file names is simply too hard to deal with, especially without a table of contents at the top. And in truth it doesn't make dependency management any easier, save perhaps for classes of new programmers; maybe McGill University wanted it for exactly that, I don't know.

OpenAL Soft

I've ranted about OpenAL before, so it may surprise some that it made my list. The irony is that OpenAL sucks as a library for doing 3D audio, but it is available on all platforms in one form or another, and if you confine OpenAL to one file of your project, the mutex hell stays confined to that file too. It works on iOS and Android, and Libaudioverse currently contains a fallback backend using it.

So why am I not shouting it to the heavens? Because on the desktop platforms you basically have to use OpenAL Soft specifically. You have to query for and turn on extensions to make it work with nonstandard formats (hint: anything above 16-bit stereo is nonstandard). If you assume it's OpenAL Soft, you lose the whole benefit, because now it won't work on phones; and at least one of those extensions is a nice pitfall.

See, OpenAL lets you query for devices, but doesn't tell you anything else about them. If you don't care, i.e. you're using OpenAL as your 3D mixer, that's fine. OpenAL Soft recognized that sometimes you don't want spatialization effects on your sources and implemented an extension that lets you send them directly to the sound card's channels. Here's the problem: send channels that don't exist on the card and you get no error of any sort; they're just silently dropped. You cannot query to find out whether this is happening. Whether the user tells you the configuration also doesn't matter because, in at least one real-world case, different OpenAL Soft backends see the same device as either stereo or 7.1 surround sound. For this to work right, users will almost certainly need to edit an ini file. Good luck walking Windows users through that, though admittedly Mac and Linux users probably wouldn't have a problem with it.

Yet it is true that OpenAL Soft is the most free of mysterious bugs. The ones it has can be worked around reliably and don't tend to appear and disappear randomly. Getting it working on iOS and Android, where the configurations are pretty fixed, wouldn't be a problem, and writing a backend for it wasn't exactly trivial, but it wasn't hard either. Should all the other backends fail, you're pretty darn likely to get audio out of it, so it makes a nice fallback. But it's not industrial-strength, either.
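Using OpenAL Soft only when everything else fails implies an ordered list of backends that are tried in turn. The sketch below shows one way to structure that; `OutputBackend`, `NamedBackend`, `BackendFactory`, and `open_first_working` are hypothetical names for illustration, not Libaudioverse's actual API.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal interface every output backend implements.
struct OutputBackend {
    virtual ~OutputBackend() = default;
    virtual std::string name() const = 0;
};

// Trivial concrete backend, used here only to demonstrate the chain.
class NamedBackend : public OutputBackend {
public:
    explicit NamedBackend(std::string name) : name_(std::move(name)) {}
    std::string name() const override { return name_; }
private:
    std::string name_;
};

// A factory returns nullptr when its backend cannot start on this machine.
using BackendFactory = std::function<std::unique_ptr<OutputBackend>()>;

// Try factories in priority order and keep the first backend that starts.
// The OpenAL Soft fallback would sit at the end of the list.
std::unique_ptr<OutputBackend> open_first_working(const std::vector<BackendFactory>& factories) {
    for (const auto& make : factories) {
        if (auto backend = make()) {
            return backend;
        }
    }
    return nullptr;  // No backend could be opened at all.
}
```

In a real library, the list would hold the platform-specific backends first (for example, one built on Winmm on Windows) and the OpenAL Soft backend last, so the fallback is only reached when the better options fail to start.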

So The Conclusion: If Your App or Library Depends on Audio, Take the Time to Write Your Own

This is what I'm doing now, and the irony is that it's actually not so bad. When you cut out the library middle-man, you can get pretty nice latency out of even Winmm, which turns out to be not much harder to use than PortAudio. Literally the hardest part is what I think of as negotiation: you typically have to figure out what format a device supports by asking it about every format you support and seeing which ones it accepts.
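Negotiation amounts to walking your own formats in order of preference and keeping the first one the device accepts. In this sketch, `Format`, `negotiate`, and the `device_supports` predicate are hypothetical; on Winmm the predicate could plausibly be implemented by calling `waveOutOpen` with the `WAVE_FORMAT_QUERY` flag, which asks whether a format is supported without actually opening the device.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Hypothetical description of an output format.
struct Format {
    unsigned channels;
    unsigned sample_rate;
    unsigned bits_per_sample;
};

// Walk `preferred` (best first) and return the first format the device
// accepts. `device_supports` stands in for the platform query.
std::optional<Format> negotiate(const std::vector<Format>& preferred,
                                const std::function<bool(const Format&)>& device_supports) {
    for (const Format& candidate : preferred) {
        if (device_supports(candidate)) {
            return candidate;
        }
    }
    return std::nullopt;  // Nothing we can produce is accepted.
}
```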

Even this isn't entirely painless, because your OS sometimes does nice things like claiming to support 8-channel 7.1 surround sound on stereo speakers and then doing the downmixing for you. That brings us to the feature I can't have in Libaudioverse: properly detecting the audio system configuration. You must ask the user for a preference. There is no reliable way to detect the needed information without getting it wrong half the time. The one thing you can rely on is that stereo output is going to work, and nothing else.
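Since the OS's answer cannot be trusted, the channel count has to come from the user, with stereo as the default. A minimal sketch, assuming a hypothetical `SpeakerSetup` setting that the application collects from the user:

```cpp
#include <optional>

// Hypothetical user-facing setting; the application asks the user for it.
enum class SpeakerSetup { Stereo, Surround51, Surround71 };

unsigned channels_for(SpeakerSetup setup) {
    switch (setup) {
        case SpeakerSetup::Surround51: return 6;
        case SpeakerSetup::Surround71: return 8;
        case SpeakerSetup::Stereo: break;
    }
    return 2;
}

// Decide the channel count from the user's stated preference rather than
// from what the OS reports, defaulting to stereo when there is none.
unsigned output_channels(std::optional<SpeakerSetup> user_preference) {
    return channels_for(user_preference.value_or(SpeakerSetup::Stereo));
}
```

The device's own report is deliberately not consulted here, because a device that claims 7.1 may really be a pair of stereo speakers behind an OS downmix.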

The only nice thing is that Libaudioverse is now almost two libraries: one that handles 3D audio simulation and one that handles output. I am tempted to split them; the reason I haven't is that continuing to develop the output piece where it is doesn't make the split significantly harder later, and it's easier to keep them in the same place for the moment.

I have failed to find anything that does the trick in all circumstances, up to and including abandoning libraries, a fact I find incredibly depressing. This has been a problem for at least 20 years. If you had asked me before I started Libaudioverse whether this was worth working on as a separate project, I'd have laughed. But it turns out that it is, and I'm now debating whether a good solution to it is sellable. A large part of me says that it is, and I find that problematic. Sure, it's a way to make money, but this is not a significantly hard problem in the scientific sense. I cannot fathom how it has not yet been solved. There are many, many smart programmers; surely one of them has been bitten by this and made it his or her holy programmer mission to go fix it? Either no one has, or everyone has failed. I find either case quite depressing. Perhaps something will fall from the sky, but I doubt it very much.