@@ -14,7 +14,9 @@ The audio output's main purpose is to take sound samples from one or several dec
(insert here a schematic of the data flow in aout3)
</para>
<sect2><title> Terminology </title>
</sect1>
<sect1><title> Terminology </title>
<itemizedlist>
<listitem><para><emphasis> Sample </emphasis> : A sample is an elementary piece of audio information, containing the value for all channels. For instance, a stream at 44100 Hz features 44100 samples per second, no matter how many channels are coded, nor the coding type of the coefficients. </para></listitem>
...
...
@@ -28,9 +30,9 @@ The audio output's main purpose is to take sound samples from one or several dec
<listitem><para><emphasis> Downmixing/upmixing </emphasis> : Changing the configuration of the channels (see below). </para></listitem>
</itemizedlist>
</sect2>
</sect1>
<sect2><title> Audio sample formats </title>
<sect1><title> Audio sample formats </title>
<para>
The whole audio output can be viewed as a pipeline transforming one audio format to another in successive steps. Consequently, it is essential to understand what an audio sample format is.
...
...
@@ -72,9 +74,9 @@ The audio_sample_format_t structure then contains two additional parameters, whi
These last two fields (which are <emphasis> always </emphasis> meaningful as soon as aout_FormatPrepare() has been called) make it easy to calculate the size of an audio buffer : i_nb_samples * i_bytes_per_frame / i_frame_length.
</para>
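<para>
As a minimal sketch of the calculation above: the structure below is a hypothetical stand-in that holds only the two fields named in the formula, not the real audio_sample_format_t definition, and the function name is invented for illustration.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Hypothetical stand-in: only the two fields used by the formula. */
typedef struct
{
    unsigned int i_bytes_per_frame; /* size in bytes of one frame */
    unsigned int i_frame_length;    /* number of samples in one frame */
} sample_format_sketch_t;

/* Size in bytes of a buffer holding i_nb_samples samples:
 * i_nb_samples * i_bytes_per_frame / i_frame_length. */
static size_t buffer_size_sketch( const sample_format_sketch_t * p_format,
                                  unsigned int i_nb_samples )
{
    return (size_t)i_nb_samples * p_format->i_bytes_per_frame
            / p_format->i_frame_length;
}
```
]]></programlisting>
<para>
Assuming, for illustration only, a format in which one frame is a single 8-byte sample (for example two 32-bit channels), a buffer of 1536 samples would need 1536 * 8 / 1 = 12288 bytes.
</para>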
</sect2>
</sect1>
<sect2><title> Typical runcourse </title>
<sect1><title> Typical runcourse </title>
<para>
The input spawns a new audio decoder, say for instance an A/52 decoder. The A/52 decoder parses the sync info for format information (e.g. it finds 48 kHz, 5.1, 196 kbit/s), and creates a new aout "input stream" with aout_InputNew(). The sample format is :
...
...
@@ -139,28 +141,160 @@ When this initialization is over, the "decoder" plug-in can run its main loop. T
</para>
<para>
The audio mixer then calculates whether it has enough samples to build a new output buffer. If it does, it mixes the input streams, and passes the buffer to the output layer.
The audio mixer then calculates whether it has enough samples to build a new output buffer. If it does, it mixes the input streams, and passes the buffer to the output layer. The buffer goes along the output pipeline (which in our case only contains a converter filter), and then it is put in the output FIFO for the device.
</para>
<para>
Regularly, the output device will fetch the next buffer from the output FIFO, either through a callback of the audio subsystem (Mac OS X's CoreAudio, SDL), or thanks to a dedicated audio output thread (OSS, ALSA...). This mechanism uses aout_OutputNextBuffer(), and gives the estimated playing date of the buffer. If the computed playing date isn't equal to the estimated playing date (with a small tolerance), the output layer changes the date of all buffers in the audio output module, triggering some resampling at the beginning of the input pipeline when the next buffer comes from the decoder. That way, the audio and video streams are resynchronized. When the buffer is played, it is finally released.
Access to the internal structures must be carefully protected, because contrary to other objects in the VLC framework (input, video output, decoders...), the audio output doesn't have an associated thread. It means that parts of the audio output run in different threads (decoders, audio output IO thread, interface), and we do not control when the functions are called. Thus, much care must be taken to avoid concurrent access to the same part of the audio output, without creating a bottleneck which would cause latency problems at the output layer.
</para>
<para>
Consequently, we have set up a locking mechanism in five parts :
</para>
<orderedlist>
<listitem><para> AOUT_FMT_FLOAT32 : the general case for plain PCM samples. </para></listitem>
<listitem><para> AOUT_FMT_FIXED32 : machines without FPU cannot work on floats, so it is recommended to use fixed-point arithmetic instead. </para></listitem>
<listitem><para> AOUT_FMT_A52 : the raw undecoded samples will be passed to the output layer for decoding by an external S/PDIF dedicated hardware. </para></listitem>
<listitem><para><emphasis> p_input->lock </emphasis> : This lock is taken when a decoder calls aout_BufferPlay(), as long as the buffer is in the input pipeline. The interface thread cannot change the input pipeline without holding this lock. </para></listitem>
<listitem><para><emphasis> p_aout->mixer_lock </emphasis> : This lock is taken when the audio mixer is entered. The decoder thread in which the mixer runs must hold the mutex during the mixing, until the buffer comes out of the output pipeline. Without holding this mutex, the interface thread cannot change the output pipeline, and a decoder cannot add a new input stream. </para></listitem>
<listitem><para><emphasis> p_aout->output_fifo_lock </emphasis> : This lock must be taken to add or remove a packet from the output FIFO, or change its dates. </para></listitem>
<listitem><para><emphasis> p_aout->input_fifos_lock </emphasis> : This lock must be taken to add or remove a packet from one of the input FIFOs, or change its dates. </para></listitem>
</orderedlist>
<para>
Currently, the audio mixer will impose one of these three formats. Technically, nothing prevents you from providing for instance an AOUT_FMT_S16_NE mixer, but for simplicity it has been decided to keep only two PCM formats, and formats for undecoded samples.
Having so many mutexes makes it easy to fall into deadlocks (i.e. when one thread has the mixer lock and wants the input fifos lock, and another has the input fifos lock and wants the mixer lock). We could have worked with fewer locks (and even one global_lock), but for instance when the mixer is running, we do not want to block the audio output IO thread from picking up the next buffer. So for efficiency reasons we want to keep that many locks.
</para>
<para>
So we have set up a strict discipline in taking the locks. If you need several of the locks, you <emphasis> must </emphasis> take them in the order indicated above. For instance if you already hold the input fifos lock, it is <emphasis> strictly forbidden </emphasis> to try and take the mixer lock. You must first release the input fifos lock, then take the mixer lock, and finally take the input fifos lock again.
</para>
<para>
It might seem a big constraint, but the order has been chosen so that in most cases, it is the most natural order to take the locks.
</para>
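<para>
The ordering rule can be sketched with plain POSIX mutexes. Everything below is a hypothetical illustration: the structure, its field names and both functions are invented stand-ins, and the real aout core may well wrap its mutexes differently. The sketch only shows the release-then-retake sequence for a thread which holds the input fifos lock and discovers it also needs the mixer lock.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical stand-in for the four locks, declared in the order
 * in which they must be taken. */
typedef struct
{
    pthread_mutex_t input_lock;       /* 1. p_input->lock */
    pthread_mutex_t mixer_lock;       /* 2. p_aout->mixer_lock */
    pthread_mutex_t output_fifo_lock; /* 3. p_aout->output_fifo_lock */
    pthread_mutex_t input_fifos_lock; /* 4. p_aout->input_fifos_lock */
} aout_locks_sketch_t;

/* Initialize the four mutexes; returns 0 on success. */
static int locks_init_sketch( aout_locks_sketch_t * p_locks )
{
    int i_ret = pthread_mutex_init( &p_locks->input_lock, NULL );
    if( i_ret == 0 )
        i_ret = pthread_mutex_init( &p_locks->mixer_lock, NULL );
    if( i_ret == 0 )
        i_ret = pthread_mutex_init( &p_locks->output_fifo_lock, NULL );
    if( i_ret == 0 )
        i_ret = pthread_mutex_init( &p_locks->input_fifos_lock, NULL );
    return i_ret;
}

/* Must be called with input_fifos_lock held. On success (return 0),
 * the caller holds both mixer_lock and input_fifos_lock, taken in
 * the allowed order. */
static int take_mixer_lock_sketch( aout_locks_sketch_t * p_locks )
{
    /* Step 1: release the later lock. */
    int i_ret = pthread_mutex_unlock( &p_locks->input_fifos_lock );
    if( i_ret != 0 )
        return i_ret;
    /* Step 2: take the earlier lock. */
    i_ret = pthread_mutex_lock( &p_locks->mixer_lock );
    if( i_ret != 0 )
        return i_ret;
    /* Step 3: take the later lock again. */
    return pthread_mutex_lock( &p_locks->input_fifos_lock );
}
```
]]></programlisting>
<para>
Note that between step 1 and step 3 another thread may have modified the input FIFOs, so anything you read from them before releasing the lock must be checked again afterwards.
</para>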
</sect1>
<sect1><title> Internal structures </title>
<sect2><title> Buffers </title>
<para>
The aout_buffer_t structure is only allocated by the aout core functions, and goes from the decoder to the output device. A new aout buffer is allocated in these circumstances :
</para>
<itemizedlist>
<listitem><para> Whenever the decoder calls aout_BufferNew(). </para></listitem>
<listitem><para> In the input and output pipelines, when an audio filter requests a new output buffer (i.e. when b_in_place == 0, see below). </para></listitem>
<listitem><para> In the audio mixer, when a new output buffer is being prepared. </para></listitem>
</itemizedlist>
<note><para>
Most audio filters are able to place the output result in the same buffer as the input data, so most buffers can be reused that way, and we avoid massive allocations. However, some filters require the allocation of an output buffer.
</para><para>
The core functions are smart enough to determine whether the buffer is ephemeral (for instance if it will only be used between two audio filters, and disposed of immediately thereafter), or whether it will need to be shared among several threads (as soon as it needs to stay in an input or output FIFO).
</para><para>
In the first case, the aout_buffer_t structure and its associated buffer will be allocated on the thread's stack (via alloca()), whereas in the second case they will be allocated on the process's heap (via malloc()). You, codec or filter developer, don't have to deal with the allocation or deallocation of the buffers.
</para></note>
<para>
The fields you'll probably need to use are : p_buffer (pointer to the raw data), i_nb_bytes (size of the significant portion of the data), i_nb_samples, start_date and end_date.
</para>
</sect2>
<sect2><title> Date management </title>
<para>
At first glance, you might be tempted to think that to calculate the starting date of a buffer, it is enough to regularly fetch the PTS i_pts from the input, and then : i_pts += i_nb_past_samples * 1000000 / i_rate. Well, I'm sorry to disappoint you, but you'll end up with rounding problems, resulting in an audible crackle every few seconds.
</para>
<para>
Indeed, if you have 1536 samples per buffer (as is often the case for A/52) at 44.1 kHz, it gives : 1536 * 1000000 / 44100 = 34829.9319727891. The decimal part of this figure will drive you mad (note that with 48 kHz samples the result is an integer, 32000, so it will work well in many cases).
</para>
<para>
One solution could have been to work in nanoseconds instead of microseconds, but you'd only be making the problem 1000 times less frequent. The only exact solution is to add 34829 for every buffer, and keep the remainder of the division (here 41100) somewhere. For every buffer you add up the remainders, and when the sum reaches 44100 or more, you add 34830 instead of 34829 and subtract 44100 from the sum. That way you don't have the rounding error which would occur in the long run (this is called the Bresenham algorithm).
</para>
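<para>
Here is a minimal sketch of that remainder bookkeeping. The structure and function names are hypothetical stand-ins and are not the layout of the real audio_date_t. Instead of comparing the accumulated remainder with the divider, the sketch folds the stored remainder into the next division, which gives the same sequence of dates.
</para>
<programlisting><![CDATA[
```c
#include <stdint.h>

/* Hypothetical stand-in for the date bookkeeping described above. */
typedef struct
{
    int64_t  date;        /* current date, in microseconds */
    uint32_t i_divider;   /* usually the sample rate, in Hz */
    uint32_t i_remainder; /* leftover of the last division, < i_divider */
} date_sketch_t;

/* Set the divider and the starting date, and clear the remainder. */
static void date_init_sketch( date_sketch_t * p_date, uint32_t i_divider,
                              int64_t start_date )
{
    p_date->date = start_date;
    p_date->i_divider = i_divider;
    p_date->i_remainder = 0;
}

/* Advance the date by i_nb_samples samples and return the new date.
 * The remainder of the previous call is carried into this division,
 * so no fraction of a microsecond is ever lost. */
static int64_t date_increment_sketch( date_sketch_t * p_date,
                                      uint32_t i_nb_samples )
{
    int64_t i_dividend = (int64_t)i_nb_samples * 1000000
                          + p_date->i_remainder;

    p_date->date += i_dividend / p_date->i_divider;
    p_date->i_remainder = (uint32_t)( i_dividend % p_date->i_divider );
    return p_date->date;
}
```
]]></programlisting>
<para>
With 1536 samples per buffer at 44100 Hz, the first call adds 34829 and the second adds 34830. After 441 buffers the date is exactly 15360000 microseconds, whereas blindly adding 34829 each time would give 15359589, which is already 411 microseconds off.
</para>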
<para>
The good news is, the audio output core provides a structure (audio_date_t) and functions to deal with it :
</para>
<itemizedlist>
<listitem><para><emphasis> aout_DateInit( audio_date_t * p_date, u32 i_divider ) </emphasis> : Initialize the Bresenham algorithm with the divider i_divider. Usually, i_divider will be the rate of the stream. </para></listitem>
<listitem><para><emphasis> aout_DateSet( audio_date_t * p_date, mtime_t new_date ) </emphasis> : Initialize the date, and set the remainder to 0. You will usually need this whenever you get a new PTS from the input. </para></listitem>
<listitem><para><emphasis> aout_DateMove( audio_date_t * p_date, mtime_t difference ) </emphasis> : Add or subtract microseconds from the stored date (used by the aout core when the output layer reports a lipsync problem). </para></listitem>
<listitem><para><emphasis> aout_DateGet( audio_date_t * p_date ) </emphasis> : Return the current stored date. </para></listitem>
<listitem><para><emphasis> aout_DateIncrement( audio_date_t * p_date, u32 i_nb_samples ) </emphasis> : Add i_nb_samples * 1000000 / i_divider to the stored date, taking into account rounding errors, and return the result. </para></listitem>
</itemizedlist>
</sect2>
<sect2><title> FIFOs </title>
<para>
FIFOs are used at two places in the audio output : at the end of the input pipeline, before entering the audio mixer, to store the buffers which haven't been mixed yet ; and at the end of the output pipeline, to queue the buffers for the output device.
</para>
<para>
FIFOs store a chained list of buffers. They also keep the ending date of the last buffer, and whenever you pass a new buffer, they will enforce the time continuity of the stream by changing its start_date and end_date to match the FIFO's end_date (in case of stream discontinuity, the aout core will have to reset the date). The aout core provides functions to access the FIFO. Please understand that none of these functions use mutexes to protect exclusive access, so you must deal with race conditions yourself if you want to use them directly.
</para>
<itemizedlist>
<listitem><para><emphasis> aout_FifoInit( aout_instance_t * p_aout, aout_fifo_t * p_fifo, u32 i_rate ) </emphasis> : Initialize the FIFO pointers, and the audio_date_t with the appropriate rate of the stream (see above for an explanation of aout dates). </para></listitem>
<listitem><para><emphasis> aout_FifoPush( aout_instance_t * p_aout, aout_fifo_t * p_fifo, aout_buffer_t * p_buffer ) </emphasis> : Add p_buffer at the end of the chained list, update its start_date and end_date according to the FIFO's end_date, and update the internal end_date. </para></listitem>
<listitem><para><emphasis> aout_FifoSet( aout_instance_t * p_aout, aout_fifo_t * p_fifo, mtime_t date ) </emphasis> : Trash all buffers, and set a new end_date. Used when a stream discontinuity has been detected. </para></listitem>
<listitem><para><emphasis> aout_FifoMoveDates( aout_instance_t * p_aout, aout_fifo_t * p_fifo, mtime_t difference ) </emphasis> : Add or subtract microseconds from end_date and from start_date and end_date of all buffers in the FIFO. The aout core will use this function to force resampling, after lipsync issues. </para></listitem>
<listitem><para><emphasis> aout_FifoNextStart( aout_instance_t * p_aout, aout_fifo_t * p_fifo ) </emphasis> : Return the start_date which will be given to the next buffer passed to aout_FifoPush(). </para></listitem>
<listitem><para><emphasis> aout_FifoPop( aout_instance_t * p_aout, aout_fifo_t * p_fifo ) </emphasis> : Return the first buffer of the FIFO, and remove it from the chained list. </para></listitem>
<listitem><para><emphasis> aout_FifoDestroy( aout_instance_t * p_aout, aout_fifo_t * p_fifo ) </emphasis> : Free all buffers in the FIFO. </para></listitem>
</itemizedlist>
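<para>
To make the date continuity rule more concrete, here is a minimal sketch of such a FIFO. All types and function names are hypothetical stand-ins, not the real aout_fifo_t or aout_buffer_t. As a simplification, end_date is a plain integer in microseconds (the text above suggests the real FIFO tracks it with an audio_date_t), and each buffer keeps the duration it had when it was pushed. Like the functions listed above, the sketch does no locking at all.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a buffer: only the chaining and dates. */
typedef struct buffer_sketch_t
{
    struct buffer_sketch_t * p_next;
    int64_t start_date; /* microseconds */
    int64_t end_date;   /* microseconds */
} buffer_sketch_t;

/* Hypothetical stand-in for a FIFO. */
typedef struct
{
    buffer_sketch_t *  p_first;
    buffer_sketch_t ** pp_last; /* where the next buffer gets linked */
    int64_t            end_date;
} fifo_sketch_t;

/* Start with an empty list ending at the given date. */
static void fifo_init_sketch( fifo_sketch_t * p_fifo, int64_t date )
{
    p_fifo->p_first = NULL;
    p_fifo->pp_last = &p_fifo->p_first;
    p_fifo->end_date = date;
}

/* Append p_buffer, forcing its dates to follow the FIFO's end_date. */
static void fifo_push_sketch( fifo_sketch_t * p_fifo,
                              buffer_sketch_t * p_buffer )
{
    int64_t i_duration = p_buffer->end_date - p_buffer->start_date;

    p_buffer->start_date = p_fifo->end_date;
    p_buffer->end_date = p_fifo->end_date + i_duration;
    p_fifo->end_date = p_buffer->end_date;

    p_buffer->p_next = NULL;
    *p_fifo->pp_last = p_buffer;
    p_fifo->pp_last = &p_buffer->p_next;
}

/* Shift the FIFO's end_date and the dates of every queued buffer. */
static void fifo_move_dates_sketch( fifo_sketch_t * p_fifo,
                                    int64_t difference )
{
    buffer_sketch_t * p_buffer;

    for( p_buffer = p_fifo->p_first; p_buffer != NULL;
         p_buffer = p_buffer->p_next )
    {
        p_buffer->start_date += difference;
        p_buffer->end_date += difference;
    }
    p_fifo->end_date += difference;
}

/* Unlink and return the first buffer, or NULL if the FIFO is empty. */
static buffer_sketch_t * fifo_pop_sketch( fifo_sketch_t * p_fifo )
{
    buffer_sketch_t * p_buffer = p_fifo->p_first;

    if( p_buffer == NULL )
        return NULL;
    p_fifo->p_first = p_buffer->p_next;
    if( p_fifo->p_first == NULL )
        p_fifo->pp_last = &p_fifo->p_first;
    return p_buffer;
}
```
]]></programlisting>
<para>
Keeping a pointer to the last link (pp_last) makes both push and pop constant-time operations, which matters since these functions are meant to be called while holding one of the FIFO locks.
</para>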
</sect2>
</sect1>
<!--
<sect1><title> API for the decoders </title>
<para>
The API between the audio output and the decoders is quite simple. As soon as the decoder has the required information to fill in an audio_sample_format_t, it can call : p_dec->p_aout_input = aout_InputNew( p_dec->p_fifo, &p_dec->p_aout, &p_dec->output_format ).
</para>
<para>
In the next operations, the decoder will need both p_aout and p_aout_input. To retrieve a buffer, it calls : p_buffer = aout_BufferNew( p_dec->p_aout, p_dec->p_aout_input, i_nb_frames ).
</para>
<para>
The decoder must at least fill in start_date (using an audio_date_t is recommended), and then it can play the buffer : aout_BufferPlay( p_dec->p_aout, p_dec->p_aout_input, p_buffer ). In case of error, the buffer can be deleted (without being played) with aout_BufferDelete( p_dec->p_aout, p_dec->p_aout_input, p_buffer ).
</para>
<para>
When the decoder dies, or the sample format changes, the input stream must be destroyed with : aout_InputDelete( p_dec->p_aout, p_dec->p_aout_input ).
</para>
</sect1>
<!--
<sect1> <title> API for the output module </title>