It works, but loading is very slow:
Debug: 98 seconds
Release: 96 seconds
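These results lead me to believe the problem is the constant buffer resizing (the buffer grows by only 4096 bytes at a time), not the decoding itself. Every resize has to copy everything decoded so far, so the total amount of copying grows quadratically with the length of the file. If I disable resizing and copying data into the uncompressed sound buffer, decoding the same file takes only 904 milliseconds in release builds and 2076 milliseconds in debug builds.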
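To illustrate why growing by a fixed 4096 bytes is so expensive, here is a small model (not the original loader code) that counts the bytes copied while a buffer is extended one 4096-byte chunk at a time. It assumes every reallocation copies the entire existing contents, which is the worst case for realloc or new-and-copy. The names `bytes_copied`, `fixed_step` and `doubling` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a growable buffer: append `total` bytes in
// `chunk`-sized pieces and count how many bytes reallocations copy.
// Assumes every reallocation copies the whole existing contents.
template <typename GrowFn>
std::uint64_t bytes_copied(std::size_t total, std::size_t chunk, GrowFn grow) {
    assert(chunk > 0);
    std::size_t size = 0;
    std::size_t capacity = 0;
    std::uint64_t copied = 0;
    while (size < total) {
        std::size_t need = size + chunk;
        if (need > capacity) {
            capacity = grow(capacity, need); // choose the new allocation size
            copied += size;                  // old contents move to the new block
        }
        size = need;
    }
    return copied;
}

// Policy 1: allocate exactly what is needed, i.e. grow by one chunk each time.
std::size_t fixed_step(std::size_t /*capacity*/, std::size_t need) { return need; }

// Policy 2: at least double the capacity (geometric growth).
std::size_t doubling(std::size_t capacity, std::size_t need) {
    return capacity * 2 > need ? capacity * 2 : need;
}
```

For 1024 chunks of 4096 bytes (4 MiB of decoded data), `fixed_step` copies 4096 × 1024 × 1023 / 2 bytes (about 2 GiB), while `doubling` copies 4096 × 1023 bytes, which is less than the final buffer size.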
I was able to eliminate this delay simply by switching to STL vectors, because std::vector grows its capacity geometrically rather than by a fixed amount. When an append needs more room than the current capacity, the vector allocates a block larger than requested (typically 1.5× the old capacity with MSVC, and 2× with libstdc++ and libc++), so reallocations become rare and appending has amortized constant cost. The exact growth factor is implementation-defined. You can see the difference by comparing a vector's size() and capacity().
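One way to watch this growth is to append 4096-byte blocks to a std::vector<char> and record each time capacity() changes. This is a sketch: `count_reallocations` is a hypothetical helper, and the zero-filled block stands in for real decoder output. The capacities it prints depend on the standard library implementation. It also shows reserve(), which avoids reallocation entirely if the total decoded size is known in advance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Appends `chunks` blocks of `chunk` bytes to a std::vector<char>, as a decode
// loop might, and returns how many times capacity() changed (one change per
// reallocation). Optionally reserves the final size first.
std::size_t count_reallocations(std::size_t chunks, std::size_t chunk,
                                bool reserve_up_front, bool print) {
    std::vector<char> pcm;
    if (reserve_up_front)
        pcm.reserve(chunks * chunk);         // one allocation for the whole file
    const std::vector<char> block(chunk, 0); // placeholder for one decoded block
    std::size_t reallocs = 0;
    std::size_t last_capacity = pcm.capacity();
    for (std::size_t i = 0; i < chunks; ++i) {
        pcm.insert(pcm.end(), block.begin(), block.end());
        if (pcm.capacity() != last_capacity) { // a reallocation happened
            ++reallocs;
            last_capacity = pcm.capacity();
            if (print)
                std::printf("size=%zu capacity=%zu\n", pcm.size(), pcm.capacity());
        }
    }
    return reallocs;
}
```

Calling `count_reallocations(1024, 4096, false, true)` prints only a short list of capacities, each one a multiple of the previous, rather than the 1024 reallocations a fixed 4096-byte step would perform. With `reserve_up_front` set to true it reports zero reallocations.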