US5388181A - Digital audio compression system - Google Patents

Publication number: US5388181A
Authority: US (United States)
Prior art keywords: blocks, signal, phase, magnitude, data
Legal status: Expired - Lifetime
Application number: US08/128,322
Inventors: David J. Anderson, Donghoon Lee, David L. Neuhoff, Omar A. Nemri
Current assignee: University of Michigan
Original assignees: David J. Anderson; Donghoon Lee; David L. Neuhoff; Omar A. Nemri
Application filed by Anderson, Lee, Neuhoff, and Nemri. Priority to US08/128,322. Application granted; publication of US5388181A. Assigned to the Regents of the University of Michigan (assignors: Anderson, David J.; Neuhoff, David L.).

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204: using subband decomposition

Definitions

  • the present invention relates generally to digital audio systems and more particularly to a data compression system for substantially increasing the playing time of a given storage medium without significant degradation of sound quality.
  • the standard compact disc player transfers digital audio data at a rate of approximately 5.3 megabytes per minute for one stereo channel. If one were to store all compact disc data for a 3 minute stereo selection on a hard disk of a computer, the selection would occupy 31.8 megabytes, or more than a 30 megabyte hard drive can hold. Even using a large 750 megabyte disk drive only about 1 hour and 10 minutes of music could be stored. That is far too little for an evening's entertainment or for digital jukebox purposes.
  • the present invention uses a combination of source coding theory and theory of human auditory perception to greatly reduce the storage requirements for digital audio.
  • the 5.3 megabyte per minute compact disc data rate has been reduced to 0.42 megabytes per minute per channel.
  • This reduction in data rate is achieved by an encoding and decoding system in which the more costly components are used on the encoding side, to allow simple and inexpensive equipment to be used on the decoding side.
  • the system is designed to permit decoding with a processor with limited arithmetic precision, for example, 16 bit fixed-point arithmetic.
  • the present invention is thus well suited for music distribution systems, video game systems, consumer audio systems, digital jukeboxes and computer-controlled video/audio systems.
  • a wideband digital audio signal is processed by transforming it into the frequency domain comprising data capable of being represented as complex numbers.
  • a magnitude portion and a phase portion are extracted from the frequency domain data, with different quantization processes being performed on the magnitude and phase portions.
  • the magnitude and phase data are stored as digital data on a data storage medium.
  • the magnitude portion is quantized using a vector quantization technique while the phase portion is quantized using uniform scalar quantization.
  • expansion via scaling of bands of magnitude coefficients to a common power level assures that the noise produced by their quantization will be essentially inaudible.
  • With this technique it is possible to achieve effects similar to those of a more complex process that dynamically chooses the rate of the quantizer (in bits per coefficient) on the basis of perceptual masking calculations.
  • different vector quantizers are designed for different bands of magnitude coefficients. This use of a plurality of vector quantizers ensures better performance, because each quantizer is matched to the band being encoded; e.g., vectors in the low frequency bands are more highly correlated than those in the high frequency bands. Moreover, the vector quantizers may have different rates (in bits per magnitude coefficient), reflecting the fact that the human auditory system is more sensitive to errors in some frequency bands than in others.
  • a vector quantization codebook is developed uniquely for the magnitudes of each segment (of approximately 3 minutes in length) of the audio selection being recorded.
  • the unique codebook further includes portions which are unique to each of the frequency bands.
  • the codebook is first transmitted to and loaded in the decoding equipment, whereupon the magnitude portions of the encoded digital audio may be quickly and efficiently decoded to restore the original magnitude portions.
  • the codebooks are two-stage and tree-structured so that excellent quantization characteristics are obtained with greatly reduced complexity.
  • Neural conduction time in the human auditory system is somewhat indeterminate and therefore the phase of higher frequencies is of less importance than the phase of lower frequencies. This means that while the phase at low frequencies must be quantized with a large number of bits, the phase at higher frequencies may be quantized with substantially fewer bits.
  • the presently preferred embodiment uses a detailed understanding of human auditory perception to allocate the minimum number of bits to the quantization of each phase, with higher frequencies receiving less or even zero bits. Moreover, it uses pseudorandom phase dither to eliminate the audible effects of correlations in the quantized phase errors.
  • Transform coding, such as described in this invention, is susceptible to "pre-echo," which may be heard when intervals of silence are followed by a transient such as a drumbeat, unless corrective measures are taken.
  • pre-echo is greatly reduced or eliminated by dividing blocks into subblocks, detecting the occurrence of transients that would cause pre-echoes and individually expanding via scaling the subblocks of those blocks containing transients, in a manner that exploits temporal masking in human auditory perception.
  • pre-echo is reduced or eliminated by dynamically augmenting the bit allocation to the phase quantizers in blocks containing transients.
  • the entire system is designed with low cost decoding in mind. Specifically, several steps of the encoding process are tailored to minimize the potential for truncation errors when a low cost, limited precision fixed-point arithmetic is used in the decoder.
  • One such step is the expansion via scaling of each block before transforming.
  • FIG. 1 is an overall block diagram of the digital audio encoding system of the invention, showing both encoding and decoding stations;
  • FIG. 2 is a block diagram of the presently preferred encoding system
  • FIG. 3 is a block diagram of a two-stage, tree-structured vector quantizer
  • FIGS. 4a and 4b are flow diagrams of the encoding process.
  • FIG. 5 is a schematic diagram of the presently preferred decoding system.
  • In FIG. 1, an example of a distribution system is illustrated.
  • a source of a standard audio or digital audio is fed at 10 to the encoding computer 12 in which the encoding processes of the invention are performed.
  • the encoding computer can be one of a variety of commercially available engineering work stations or high performance microcomputers.
  • the encoding process is performed by software run on the encoding computer, although hardware implementations may be employed in high volume manufacture and distribution applications.
  • the encoding process proceeds more slowly than real time, the encoded output comprising a highly compressed manifestation of the original digital audio input.
  • the encoding process may someday be performed nearly in real time, making broadcast applications more convenient.
  • the encoded digital audio is stored on an archival storage medium such as a hard disk system, CD-ROM, digital tape, or the like.
  • the data stored on archival storage 14 are supplied through a distribution system 16 to a plurality of local storage media 18, which may be hard disk drives associated with individual jukebox players located at distributed locations, e.g., throughout the country.
  • Virtually any digital data distribution system can be used for this purpose.
  • the individual jukeboxes can receive the latest selections by ground data link (telephone) or satellite link, via modem or other suitable equipment. Distribution can also be effected using floppy diskette, digital tape or other removable data storage media.
  • Computer network systems can also be employed.
  • Associated with each local storage medium is a decoding processor 20, which transforms the locally stored compressed data into a high quality rendition of the original digital audio signal; this signal is then converted into analog form by a digital-to-analog convertor, amplified and played through the speaker system (not shown).
  • the encoding and decoding processes are complementary.
  • the original digital audio signal is highly compressed during encoding, to save storage space and reduce the data transfer rate.
  • This highly compressed data is decompressed upon decoding, to restore a high quality audio output.
  • considerable emphasis is placed on retaining high musical quality by masking signal degradation either under loud sounds or next to loud sounds in the time or frequency domains.
  • An advantage of the data compression achieved is that a useful number of selections can be locally stored on a fast access hard disk, making it possible to find and play a musical selection without the time delay associated with mechanical disk changers.
  • the presently preferred encoder is implemented in software, although certain portions or all of the encoder might be implemented in hardware, if desired.
  • the audio source material is low pass filtered through a 15 kilohertz low pass filter 24 and converted into a digital audio signal by a 16 bit analog-to-digital convertor 26, sampling at 32 kilohertz (just above the Nyquist rate) and producing 32,000 digital audio samples per second. Therefore, the audio samples are 16 bit integers ranging from -32,768 to +32,767.
  • an analog audio source is assumed. However, if desired, the digital data from a digital audio source could be used instead, in which case the filter and analog-to-digital convertor would not be needed, but if the digital audio is not sampled at 32 kilohertz, it must be converted to such using standard techniques.
  • the digital audio signal is first divided into segments of approximately 3 minutes. Each segment is separately encoded and decoded as described in the remainder of this document.
  • the typical segment length (3 minutes) is a compromise. Shorter segments lead to higher quality audio reproductions at the expense of producing more bits.
  • each block consists of 1,024 audio samples (32 milliseconds) of which 64 are from the previous block.
  • the blocks are then processed to minimize certain effects of transform coding such as pre-echo, post-echo, quantization errors and decoder truncation errors.
  • the next preprocessing step is designed to reduce or eliminate pre-echo, an endemic problem in transform coding.
  • the quantization process in transform coding introduces errors that are inverse transformed at the decoder and spread evenly throughout the block.
  • the power in the errors introduced in this way tends to be proportional to the average power of the audio signal in the block.
  • the result is that if a block contains a drumbeat or other transient, the beginning of the block will consist of very small values and the end will consist of very large values. Overall, the average power of the block will be fairly large, and thus the errors introduced by the quantization process will be large relative to the initial quiet portion of the block.
  • the reproduction of the block contains an initial splash of quantization errors that is audible as a "pre-echo" of (i.e., before) the drumbeat or other transient that follows.
  • transients are detected in blocks and two processing steps are applied to blocks containing transients.
  • the first to be described here, is dynamic time domain expansion of subblocks.
  • the second to be described later in the section on phase quantization, is to allocate more bits to the quantization of phase in those blocks where transients are detected.
  • each block is subdivided into 8 subblocks of 128 samples each.
  • the average power, or simply power, in each subblock is then calculated. (The average power in a block is the sum of the squares of the components of the block divided by the number of components in the block.) If the ratio of the average power in a subblock to the average power in the previous subblock is greater than a certain threshold, a flag is set to indicate that a transient has been detected in the block and dynamic time domain expansion (described shortly) is undertaken.
  • the block is high-pass filtered before transient detection. The high-pass filter of the presently preferred embodiment operates on each pair of samples x(2n-1), x(2n) in the block.
  • a transient is detected only if the RMS power in the subblock (i.e., the square root of the average power) is greater than some specified threshold, presently set at 50 (on the scale of -32,768 to +32,767).
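The transient-detection step above can be sketched as follows. This is an illustrative sketch, not the patent's exact procedure: the patent's high-pass filter formula is not reproduced in this text, so a simple pairwise difference is assumed, and the power-ratio threshold (here 4.0) is a placeholder; only the RMS threshold of 50 comes from the text.

```python
import math

def detect_transient(block, ratio_threshold=4.0, rms_threshold=50.0):
    """Return the 0-based index of the first subblock containing a
    transient, or None. The pairwise-difference high-pass filter and the
    ratio threshold are illustrative assumptions."""
    # High-pass filter, used for detection only (the unfiltered data is
    # what later gets expanded).
    filtered = [block[2 * i + 1] - block[2 * i] for i in range(len(block) // 2)]
    # Divide the filtered block into 8 subblocks and compute average power.
    sub = len(filtered) // 8
    powers = [sum(x * x for x in filtered[i * sub:(i + 1) * sub]) / sub
              for i in range(8)]
    for n in range(1, 8):
        if powers[n - 1] == 0:
            continue  # avoid division by zero in perfectly silent subblocks
        ratio = powers[n] / powers[n - 1]
        rms = math.sqrt(powers[n])  # RMS power of the candidate subblock
        if ratio > ratio_threshold and rms > rms_threshold:
            return n
    return None
```

A quiet passage followed by a loud oscillating burst triggers detection at the subblock boundary; a silent block yields no detection.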
  • the (unfiltered) subblocks occurring before the transient are expanded by scaling with various expansion factors.
  • Subblocks occurring well before the transient are scaled by larger factors than those occurring just before the transient.
  • the subblock containing the transient and subsequent subblocks are not scaled.
  • the transition from large expansion factors to smaller occurs gradually or in a stepwise diminishing fashion.
  • the reason for the diminishing set of expansion factors is the backward masking phenomenon in human auditory perception, wherein errors occurring just before a transient are less noticeable than those occurring well before the transient.
  • the small expansion factor applied to the subblocks just before the transient permits more quantization noise (introduced in subsequent steps) than the larger expansion factor applied to segments well before the transient.
  • For the nth subblock, 1 ≤ n ≤ 8, a transient is detected when the ratio of the power in the nth filtered subblock to that of the previous filtered subblock is sufficiently large and its RMS power is sufficiently large.
  • When a transient is detected in the nth subblock, the ratio R of the power in the last 9-n subblocks (unfiltered) to that of the first n-1 subblocks (unfiltered) is computed.
  • the expansion factors g1, . . . , gn-1 for the first n-1 subblocks are then determined from R as follows:
  • In either case, the transient flag is still set and used to increase the bit allocation for phase quantization.
  • the expansion factors are between 1 and 2.
  • the high pass filter is used only to detect transients and is not part of the final processing of the data; i.e., it is the unfiltered data that is expanded.
  • R is rounded to a number whose inverse can be exactly used by the decoder as a multiplication factor.
  • the factor R, the transient flag and the index n are all recorded as part of the encoded digital data. In the presently preferred embodiment 12 bits are used to describe these. In segments encoded to date, no more than 1/6 of the blocks contained transients, meaning that at most 0.002 bits per sample were required for these parameters.
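The dynamic time domain expansion itself might be sketched as below. The patent states only that the factors lie between 1 and 2, largest for the earliest subblocks and diminishing toward the transient, so the linearly diminishing schedule here is an assumption for illustration.

```python
def expand_pre_transient(block, n, num_sub=8):
    """Scale the (unfiltered) subblocks occurring before the subblock
    (0-based index n) containing a transient. Factor schedule is an
    assumption: between 1 and 2, diminishing toward the transient, as the
    backward-masking discussion suggests."""
    sub = len(block) // num_sub
    out = list(block)
    for i in range(n):  # subblocks 0 .. n-1 occur before the transient
        # Earliest subblock gets factor 2.0; the one just before the
        # transient gets ~1.0 (no expansion).
        g = 2.0 - i / max(n - 1, 1)
        for j in range(i * sub, (i + 1) * sub):
            out[j] *= g
    # The subblock containing the transient and subsequent subblocks are
    # left unscaled, exploiting temporal (backward) masking.
    return out
```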
  • the same approach can be used to reduce post-echo.
  • scaling of subblocks after the transient would take place with those subblocks farthest from the transient being scaled up more than those closer to the transient.
  • the dynamic time domain expansion described here is useful for any block quantization process (e.g., transform coding) that would otherwise introduce pre-echo.
  • the next preprocessing step is time domain expansion on a block-by-block basis, wherein each block is scaled so its maximum equals a specified value.
  • the specified maximum value is chosen to be the largest value that the decoder can represent or 70.7% of this value if two stereo channels are to be simultaneously decoded in the manner described later.
  • the purpose of this time domain expansion is to amplify quiet blocks so that truncation errors due to decoding, especially due to inverse Fourier transforming, with limited precision fixed-point arithmetic will be small in comparison. It also reduces, in quiet blocks, the effects of quantization errors introduced in subsequent steps.
  • the expansion factor for each block is recorded as part of the encoded digital data.
  • the expansion factors are limited to being powers of 2 so that the inverse expansion at the decoder can be done rapidly and exactly with fixed-point arithmetic.
  • the result is that in the presently preferred embodiment 0.004 bits/sample are used for this purpose.
  • This time domain expansion is useful for any digital audio encoding process that divides the audio signal into blocks.
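The block-by-block expansion with power-of-2 factors can be sketched as follows; the target maximum of 32767 stands in for the largest value the decoder can represent, and the 70.7% stereo case is omitted.

```python
def block_expansion_factor(block, target_max=32767):
    """Largest power-of-2 factor such that the scaled block's maximum
    absolute value stays within target_max. Powers of 2 let the decoder
    invert the expansion rapidly and exactly in fixed-point arithmetic."""
    peak = max(abs(x) for x in block)
    if peak == 0:
        return 1  # an all-zero block needs no expansion
    g = 1
    while peak * (g * 2) <= target_max:
        g *= 2
    return g
```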
  • the window is a square root Hanning window in the 64 sample overlap area at the beginning of the block and also at the end of the block; i.e., the window allows 1/16th of its size for overlap add.
  • the Hanning window gradually diminishes the amplitude of one block in the overlap while increasing the amplitude of the next block, blending the two together to avoid transients due to different coding parameters on either side of the overlap.
  • a square root Hanning window is also applied at the decoder. To eliminate the possibility of truncation errors when decoding with fixed-point arithmetic, in the presently preferred embodiment the window values are rounded to values exactly representable by fixed-point arithmetic.
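The window described above can be sketched as follows. This is a sketch only: the rounding of window values to fixed-point-exact numbers mentioned in the text is omitted.

```python
import math

def sqrt_hann_window(block_len=1024, overlap=64):
    """Window that is 1.0 in the middle with square-root Hanning ramps of
    length `overlap` at each end, so that the encoder window times the
    identical decoder window reproduces the Hanning ramp in the overlap-add
    region."""
    w = [1.0] * block_len
    for i in range(overlap):
        ramp = 0.5 * (1.0 - math.cos(math.pi * i / overlap))  # Hanning ramp
        w[i] = math.sqrt(ramp)                  # rising edge
        w[block_len - 1 - i] = math.sqrt(ramp)  # falling edge
    return w
```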
  • the Fast Fourier Transform (FFT) converts these data blocks from the time domain to the frequency domain, with the ith complex number representing the frequency component at frequency fS(i-1)/N, 1 ≤ i ≤ N, where fS denotes the sampling frequency (32 kilohertz in the presently preferred embodiment).
  • Any suitable computer implemented Fast Fourier Transform algorithm can be used for this purpose.
  • Each complex number in the transformed block is then converted into 2 real numbers, a magnitude and phase.
  • the magnitude and phase blocks are handled by different processes. Accordingly, processing of the magnitude portion proceeds along branch 34 and the phase portion along branch 36.
  • the DC and Nyquist points are also separated and quantized along branch 38 using a 16 bit uniform scalar quantizer
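The magnitude/phase split with separate DC and Nyquist handling might look like this sketch. The input layout, the N/2+1 nonredundant coefficients of a real-signal FFT, is an assumption about one workable representation.

```python
import cmath

def split_mag_phase(coeffs):
    """Split complex FFT coefficients into magnitude and phase sequences.
    coeffs is assumed to hold the N/2+1 nonredundant terms of a real FFT:
    coeffs[0] (DC) and coeffs[-1] (Nyquist) are purely real and are
    returned separately for their own 16-bit uniform scalar quantizer."""
    dc, nyquist = coeffs[0].real, coeffs[-1].real
    mags, phases = [], []
    for c in coeffs[1:-1]:
        m, p = cmath.polar(c)  # m = |c|, p = arg(c) in (-pi, pi]
        mags.append(m)
        phases.append(p)
    return dc, nyquist, mags, phases
```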
  • the magnitude block is divided into subblocks called bands and each band is expanded via scaling.
  • This process, which is referred to as band-by-band expansion (BBE), is depicted at BBE block 42. This is done for all blocks of the entire segment.
  • For each band, a vector quantization codebook is designed. Such codebooks may be designed to work for any segment of audio or, as in the preferred embodiment described here, they may be designed for the specific audio segment being encoded. In this case, the codebook for a given band is designed by training on the sequence formed by concatenating the elements of that band from each magnitude block of the digital audio segment being encoded (approximately 3 minutes in the preferred embodiment), and the codebooks themselves are recorded as part of the encoded digital data.
  • the vector quantizer design block is pictured at 43.
  • each band of a given magnitude block is further divided into vectors (with vector lengths equal to the dimension of the codebook for that band), which are then assigned an index from the codebook for that band.
  • the vector quantizer block is depicted at 44. The index for each vector of each band and block is recorded as part of the encoded digital data.
  • Each term in a phase block is quantized with a uniform scalar quantizer 46.
  • the indices produced by this quantization process are recorded as part of the encoded digital data.
  • the step size of the quantizer depends on the frequency corresponding to the given phase term, as described later.
  • pseudorandom phase dither is added to each phase, as described later.
  • the bits produced by the phase quantizers are included as part of the encoded digital data.
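A sketch of the phase quantizer with pseudorandom dither follows. The text specifies only uniform scalar quantization, frequency-dependent bit allocation, and pseudorandom phase dither; the subtractive-dither arrangement, the shared seed, and the bit counts used here are assumptions about one workable realization.

```python
import math
import random

def quantize_phase(phase, bits, rng):
    """Uniformly quantize a phase in [-pi, pi) with the given bit budget,
    adding pseudorandom dither before quantizing and subtracting it after
    dequantizing (subtractive dither). The decoder must regenerate the
    same dither, e.g. from a shared pseudorandom seed."""
    if bits == 0:
        return 0, 0.0  # phase discarded entirely at the highest frequencies
    step = 2 * math.pi / (1 << bits)  # step size grows as bits shrink
    dither = rng.uniform(-0.5, 0.5) * step
    index = round((phase + math.pi + dither) / step) % (1 << bits)
    reconstructed = index * step - math.pi - dither
    return index, reconstructed

# Example: 6 bits for a low-frequency phase term; seed shared with decoder.
rng = random.Random(1234)
idx, rec = quantize_phase(1.0, 6, rng)
```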
  • Band-by-band expansion is used to reduce the audibility of the errors introduced by vector quantization through exploitation of the spectral masking properties of the human ear.
  • the background is the following.
  • a typical quantization process (for example, the vector quantization process described later) introduces errors of roughly equal power throughout the block, regardless of whether the magnitudes there are large or small.
  • the ear is less sensitive to errors in magnitudes corresponding to frequencies in the neighborhood of which the magnitudes are generally large. Indeed, at any frequency there is an auditory "critical band" surrounding that frequency, such that errors at that frequency are inaudible if the error power within the critical band is a sufficiently small fraction of the power of the magnitudes within the critical band.
  • the width of these critical bands has been found to be proportional to the logarithm of the center frequency. This critical band theory indicates that, unless corrective measures are taken, the quantization noise will be audible in bands where the magnitudes are small.
  • Band-by-band expansion is a very simple and effective way to take corrective measures, thereby exploiting the spectral masking effect.
  • each magnitude block is divided into subblocks, called bands, of adjacent terms; each band is then expanded via scaling to some common power before quantizing. The result is that the quantization errors are equally inaudible in all bands. More specifically, each band is expanded as much as possible, with the constraints that all bands have the same power after expansion and that no expanded magnitude term exceed the maximum value representable at the decoder.
  • the expansion factors are rounded to numbers whose inverse can be exactly used by the decoder as multiplication factors. The expansion factors are recorded as part of the digital data.
  • It is important to choose the expansion band widths appropriately. Generally, narrower bands give a better spectral masking effect, but more bits are required to describe the expansion factors. However, the critical band theory indicates that there is no need to make the bands narrower than the critical bands. Indeed, it was found that 12 bands, each roughly twice the width of the critical band at its frequency, lead to the most efficient implementation. See Table I for the preferred sets of bands. For example, band 9 contains the 40 magnitudes from the 100th to the 139th, which correspond to frequencies from 3,125 Hz to 4,343.75 Hz. The encoding of the expansion factors for these bands requires 0.05 bits/sample.
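The band-by-band expansion constraint (expand each band as much as possible, with equal power across bands afterward and no term exceeding the decoder maximum) can be sketched as below. The rounding of factors to numbers with decoder-exact inverses is omitted from this sketch.

```python
import math

def band_expansion_factors(mags, bands, max_value=32767.0):
    """Compute one expansion factor per band so that all bands reach a
    common power after scaling, capped so no expanded magnitude exceeds
    max_value. `bands` is a list of (start, end) index pairs."""
    powers, peaks = [], []
    for a, b in bands:
        band = mags[a:b]
        powers.append(sum(x * x for x in band) / len(band))
        peaks.append(max(band))
    # The band whose peak would first hit max_value fixes the common
    # post-expansion power level.
    target = min((max_value / pk) ** 2 * pw
                 for pw, pk in zip(powers, peaks) if pk > 0)
    # Factor sqrt(target / power) brings every band to exactly that level.
    return [math.sqrt(target / pw) if pw > 0 else 1.0 for pw in powers]
```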
  • the band-by-band expansion just described is useful not only when processing the magnitudes from a Fourier transform but also when processing the transform coefficients produced by any transform whose coefficients are interpretable as corresponding to frequencies, for example, the discrete cosine transformation.
  • the expanded data is vector quantized using a distinct codebook for each band, and preferably each codebook is uniquely designed for the audio segment. Codebook design is discussed more fully later.
  • vector quantization performance is enhanced, because the statistical characteristics of the magnitude vectors within a given band are quite similar, but may be quite different than the characteristics of other bands. For example, lower frequency bands are more correlated than higher frequency bands.
  • Another advantage of designing a separate quantizer for each band is that they may have different encoding rates (in bits/coefficient) to reflect perceptual criteria for preferentially doing finer quantization of certain bands than others, thereby reducing the total rate while maintaining high quality.
  • the bands adopted for vector quantization need not be the same as the bands adopted for band-by-band expansion.
  • With different bands, however, the decoder becomes more complex, and if the codebooks are individually tailored to the segment, more bits are needed to describe the codebooks. Since the statistics of the vectors within a given band of the band-by-band expansion were found to be similar, in the preferred embodiment these bands are used for the vector quantization as well.
  • each band is encoded with a tree-structured vector quantizer (TSVQ). The rate of each TSVQ (in bits per magnitude coefficient) is controlled by varying the "depth" of the tree. Specifically, the rate of a TSVQ is the depth divided by the vector length. Rate selection for the quantizers will be explained later.
  • the codebook dimension (i.e., the vector length) is fixed at 4, with the exception of the first band having a dimension of 3 since there are only 3 coefficients.
  • codebook dimension is selected small enough so that the codebooks will be of acceptable size but large enough so that vector quantization is efficient. From simulation studies it was learned that codebook dimension 4 provided acceptable quantizer performance. Furthermore, at rate 2 the codebook size is 2K bytes, which is reasonable.
  • The rate of each vector quantizer (i.e., the tree depth of each TSVQ) was selected for the presently preferred embodiment based on simulation studies, under the broad guideline that at higher frequencies lower rates can be used; that is, at higher frequencies more noise can be tolerated. This is based on psychoacoustic findings. Table II sets forth the presently preferred tree depths. Additionally, since the audio signal is filtered to 15 kilohertz, the last 32 magnitudes are negligible and are not quantized at all (the decoder assumes them to be zero). With these rate allocations, the code word indices require a total of 0.96 bits/sample and the description of the codebooks requires 0.03 bits/sample, assuming a 3 minute segment.
  • The values shown in Table II are a set that yields acceptable performance. However, these values may be changed, if desired, to suit particular audio requirements. Moreover, after auditioning the encoded and decoded segment, it is possible to change the parameters to achieve certain rate or quality goals. As shown in Table II, two-stage TSVQ is employed; the primary reason is to reduce the size of the codebooks.
  • FIG. 3 shows a block diagram of a two-stage TSVQ.
  • a secondary advantage of the band-by-band expansion is that it makes the magnitude vectors in a given band more homogeneous, thus making it possible for the codebook design algorithm, described later, to produce a better codebook for the given band.
  • the above-described encoding process, which employs two-stage, tree-structured vector quantization, is based on a codebook which is uniquely designed for each segment.
  • This codebook comprises a set of codebooks, one uniquely designed for each of the 12 subbands.
  • This approach is presently preferred, since it gives excellent performance and since the codebooks can be downloaded to the playback equipment relatively quickly, prior to the music playback.
  • a universal codebook could be developed (i.e., one for a segment, or even one for all segments), although it is anticipated that such a codebook would be larger than a segment-specific codebook and this might require greater memory capacity in the decoding and playback equipment. It is also possible to use a universal codebook with a mask unique to a particular segment to allow only the necessary portions of the universal codebook to be uploaded prior to music playback.
  • the presently preferred codebook design process is performed using the data (or representative sampling of the data) of the band and segment for which the codebook is being developed.
  • the codebook development process is implemented by employing the techniques generally described in "Speech Coding Based upon Vector Quantization" by A. Buzo et al., IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 28, pp. 562-574, October 1980. See also "Vector Quantization" by R. M. Gray, IEEE ASSP Magazine, Vol. 1, pp. 4-29, April 1984. We have significantly enhanced the algorithms described in the foregoing references, as will be described below.
  • the enhancements greatly speed up the codebook development process. Before discussing these enhancements, a brief description of the general principles of tree-structured vector quantization (TSVQ) will be given.
  • Tree-structured vector quantization (TSVQ), like other forms of vector quantization, is a method of partitioning the input vector space into N (a power of 2) disjoint subsets and representing each subset with a code vector and an index.
  • the partitioning works in a hierarchical way, following a certain binary tree structure.
  • a binary tree of depth d is a data structure consisting of one "root node," 2^d - 2 "internal nodes" and 2^d "terminal nodes."
  • the root node and each internal node have two other nodes (either internal or terminal) as their "children."
  • the children of the root node are said to be at depth 1, the children of these are at depth 2, and so on. Consequently, there are 2^e nodes at depth e.
  • the 2^d children of the nodes at depth d-1 are the terminal nodes.
  • In a TSVQ of dimension k and depth d, there is a binary tree of depth d, with one k-dimensional testvector associated with the root node and each internal node, and with a codevector associated with each terminal node.
  • Designing a TSVQ consists of choosing the testvectors and codevectors.
  • the set of codevectors constitutes the codebook of the TSVQ.
  • To encode a data vector x, the Euclidean distance is computed from x to each of the testvectors associated with the two children of the root node. Then the children (at depth 2) of whichever child at depth 1 gave the smaller distance are each compared to the data vector x to find the closer, and so on.
  • Proceeding in this way, a closest testvector is found at each depth until a testvector at depth d-1 is reached. Then x is compared to the two codevectors that are children of this testvector, and the closer of these becomes the codevector that represents the data vector. Its index among the set of codevectors becomes part of the encoded digital data. The decoder then uses the index to reproduce the codevector as a rendition of the input data vector.
  • TSVQ partitioning method requires substantially fewer arithmetic operations than direct searching of the codebook for the closest codevector.
  • although the TSVQ method does not necessarily find the closest codevector in the codebook, it has been found that it generally finds a good one. Note that the testvectors are needed for encoding only, whereas the codevectors are needed for both encoding and decoding; accordingly, in the preferred embodiment the codevectors (codebook) are encoded as part of the digital data.
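To make the tree descent concrete, the following is a minimal sketch of TSVQ encoding, not the patent's implementation. The heap-style node numbering (root = 1, children of node n at 2n and 2n+1) and the function name are our own conventions:

```python
import numpy as np

def tsvq_encode(x, testvectors, codevectors, depth):
    """Walk the binary tree: at each depth keep the child whose testvector
    is closest to x; at the final depth compare the terminal codevectors."""
    node = 1                                   # root (heap-style indexing)
    for _ in range(depth - 1):                 # depths 1 .. d-1: testvectors
        left, right = 2 * node, 2 * node + 1
        node = left if (np.linalg.norm(x - testvectors[left])
                        <= np.linalg.norm(x - testvectors[right])) else right
    left, right = 2 * node, 2 * node + 1       # depth d: terminal codevectors
    node = left if (np.linalg.norm(x - codevectors[left])
                    <= np.linalg.norm(x - codevectors[right])) else right
    index = node - 2 ** depth                  # index among the 2^d codevectors
    return index, codevectors[node]
```

A depth-d encode thus costs only 2d distance computations, versus 2^d for a full codebook search.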
  • designing a TSVQ consists of selecting the testvectors and codevectors. As with any vector quantization process, the success of TSVQ depends on the quality of this design. For the invention described here, a very important advantage of TSVQ is that good TSVQ codebooks can be designed much more rapidly than unstructured codebooks. (This is important because designing the codebooks is part of the process of encoding a segment.) The design procedure and our enhancements will now be described.
  • the design is based on a training set, consisting of a sequence of k-dimensional training vectors from the data to be encoded. (As mentioned earlier, in the preferred embodiment, one codebook is designed for each band of each segment, so the training set is formed by concatenating the elements of that band from each magnitude block of the digital audio segment being encoded.)
  • the TSVQ algorithm designs the testvectors one depth at a time, starting at the root node, whose testvector c is taken to be the centroid (average) of the entire training set.
  • next, to find the child testvectors of c (at depth 1), one begins by selecting an initial candidate, c_1 and c_2, for each. Usually these are c times (1+δ) and c times (1-δ), where δ is a small number, say 0.0005.
  • the training set is partitioned into two subsets, according to which of the candidate testvectors is closest.
  • the candidate testvectors are then replaced by the centroids of these subsets, forming new candidate testvectors c'_1 and c'_2.
  • c_1 and c_2 are replaced, respectively, by the centroids c'_1 = (1/m_1) Σ x_i (summed over the training vectors closer to c_1) and c'_2 = (1/m_2) Σ x_i (summed over the training vectors closer to c_2) (Equations 1 and 2), where m_1 is the number of training vectors closer to c_1 than c_2 and m_2 is the number of training vectors closer to c_2 than c_1. Note that m_1 plus m_2 is m.
  • the partitioning and replacement by centroids is repeated over and over, until no significant improvement is found. Improvement is measured by comparing the distortion that results from using the pair of old testvectors as representatives of the training set to the distortion resulting from using the new pair. Specifically, the distortion from using a pair c_1, c_2 is D = Σ_{i=1}^{m} min(||x_i - c_1||², ||x_i - c_2||²) (Equation 3), where ||x - c|| denotes the Euclidean distance between the two vectors x and c.
  • determining whether a training vector x_i is closer to c_1 or c_2 is most quickly done by taking the dot product of x_i with (c_1 - c_2) and comparing it to the threshold (||c_1||² - ||c_2||²)/2.
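The splitting iteration described above (perturb the parent centroid, partition with the dot-product test, replace candidates by centroids, stop when the distortion stops improving) can be sketched as follows; `split_node`, its tolerance parameters, and the degenerate-partition bailout are illustrative choices, not from the patent:

```python
import numpy as np

def split_node(X, c, eps=5e-4, tol=1e-9, max_iter=100):
    """One binary split of the TSVQ design. X is the (m, k) training set
    associated with parent testvector c; returns the two child testvectors
    and the final partition mask."""
    c1, c2 = c * (1 + eps), c * (1 - eps)      # initial perturbed candidates
    prev = np.inf
    for _ in range(max_iter):
        # nearest-of-two test: one dot product per training vector,
        # compared to the threshold (||c1||^2 - ||c2||^2)/2
        near1 = X @ (c1 - c2) > (np.dot(c1, c1) - np.dot(c2, c2)) / 2.0
        if near1.all() or not near1.any():     # degenerate partition: stop
            break
        c1, c2 = X[near1].mean(axis=0), X[~near1].mean(axis=0)
        d = np.minimum(((X - c1) ** 2).sum(axis=1),
                       ((X - c2) ** 2).sum(axis=1)).sum()
        if prev - d <= tol * (1.0 + d):        # no significant improvement
            break
        prev = d
    return c1, c2, near1
```

On well-separated data the loop typically converges in a handful of iterations.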
  • next, the training set is split in two, with those training vectors that are closest to c_1 forming a training set associated with it and the rest forming a training set associated with c_2.
  • the above iterative binary splitting procedure is applied, obtaining the two children for c_1 and then the two children for c_2, with each child getting a subset of the original training vectors. This results in the four testvectors at depth 2. Applying the above iterative binary splitting procedure to these yields the testvectors at depth 3, and so on, until applying the iterative binary splitting to the training sets associated with the testvectors at depth d-1 yields the 2^d codevectors.
  • the TSVQ design procedure consists of performing the iterative binary splitting at each internal node. At any given node, in each iteration, a pair of centroids is calculated using Equations 1 and 2, and the resulting new distortion is calculated using Equation 3. These calculations occupy the major part of the TSVQ design program execution time. With our enhancements, the execution time for these calculations is reduced by at least 75%. These savings come from recognizing that c'_2 can be computed from c'_1 and the "parent" testvector c, whose children we seek, via c'_2 = (m·c - m_1·c'_1)/m_2, where m is the number of training vectors associated with the parent node c.
  • similarly, D_2 may be computed from the first term, denoted D_1, and the previously computed distortion associated with the parent node, denoted D, via D_2 = D - D_1 - m_1·||c'_1 - c||² - m_2·||c'_2 - c||².
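Both shortcuts can be checked numerically. The centroid relation c'_2 = (m·c - m_1·c'_1)/m_2 is an exact algebraic identity; the distortion relation D_2 = D - D_1 - m_1·||c'_1 - c||² - m_2·||c'_2 - c||² is our reconstruction of the elided equation (the original is not reproduced in this text) and holds exactly whenever c, c'_1, and c'_2 are the centroids of their respective sets:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))            # parent training set, m = 1000
c = X.mean(axis=0)                        # parent testvector (a centroid)

near1 = rng.random(1000) < 0.4            # some partition into two subsets
X1, X2 = X[near1], X[~near1]
m, m1, m2 = len(X), len(X1), len(X2)
c1, c2 = X1.mean(axis=0), X2.mean(axis=0)

# centroid shortcut: the second child centroid without a pass over X2
c2_fast = (m * c - m1 * c1) / m2
assert np.allclose(c2_fast, c2)

# distortion shortcut (our reading): D2 from the parent distortion D and D1
D = ((X - c) ** 2).sum()
D1 = ((X1 - c1) ** 2).sum()
D2 = ((X2 - c2) ** 2).sum()
D2_fast = D - D1 - m1 * ((c1 - c) ** 2).sum() - m2 * ((c2 - c) ** 2).sum()
assert np.allclose(D2_fast, D2)
```

In both cases the second subset is never traversed, which is the source of the claimed savings.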
  • Another enhancement of our program involves sorting the training set.
  • the program works by designing two testvectors at a time, working with the training set associated with their parent.
  • This parent training set is a subset of the original large training set. And except for parents at very small depths, it is a small subset of the original training set, whose members are scattered widely through the original training set. Since the entire training set is too large to reside in fast computer memory, the bulk of it generally resides on disk. When a training vector is needed for the above-mentioned calculations, it is not likely to be in fast memory. So the computer operating system brings it in from the disk.
  • although the operating system also brings in a block of neighboring training vectors, the needed ones are scattered so widely that not many are likely to be in the retrieved block. Therefore the operating system must make frequent, time-consuming disk accesses when executing just one iteration of the above process.
  • to avoid this, after the two children of a parent are designed, the training vectors associated with the parent are sorted so that the training vectors associated with each child are stored contiguously.
  • as a result, when a needed training vector is brought in from the disk, the operating system also retrieves a block of other needed training vectors. This greatly reduces the amount of time spent on disk accesses.
  • turning now to the phase branch 36 (FIG. 2), a description of the phase processing will be given.
  • phase is very important in the low frequency range, but decreases in importance until it becomes virtually unimportant at sufficiently high frequencies.
  • this threshold can be converted into a tolerance on the error for each phase term in the block.
  • the presently preferred embodiment uses uniform scalar quantization of the phase. Since phase values are uniformly distributed between 0 and 2π and since adjacent phases are uncorrelated, vector quantization would perform no better, but would be much more complex.
  • the step size of the uniform quantizer should vary with frequency. Specifically, the step size of the quantizer for the ith phase should be no larger than δ_i as given in Equation 4.
  • the number of levels should be at least 2π/δ_i and the number of bits allocated to the quantizer should be the logarithm base 2 of the number of levels, rounded up to an integer. That is, the number of bits n_i to be allocated to the ith phase is n_i = ceil(log_2(2π/δ_i)),
  • where ceil(x) is the function returning the smallest integer no less than x.
  • a uniform scalar quantizer allocated n bits has 2^n levels, spaced apart by 2π/2^n. Based on this principle, the bit allocations used in the preferred embodiment are shown in Table III. Note that to simplify the decoder, the bit allocations are the same for all phases in a band. Note also that the phases in band 12 are allocated 0 bits. This means that the decoder assumes they are zero, minus the pseudorandom phase dither described below. With these bit allocations, the indices for the uniform scalar phase quantizers require a total of 0.66 bits/sample.
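The allocation rule n_i = ceil(log_2(2π/δ_i)) can be written as a one-line function; `phase_bits` is an illustrative name and the tolerances in the example are hypothetical:

```python
import math

def phase_bits(delta_i):
    """Smallest bit count n such that a uniform quantizer with 2**n levels
    on [0, 2*pi) has step size 2*pi/2**n no larger than delta_i."""
    return max(0, math.ceil(math.log2(2 * math.pi / delta_i)))

# a tolerance of 1.6 radians needs 2 bits: 4 levels spaced pi/2 < 1.6 apart;
# a tolerance of 7 radians (wider than 2*pi) needs 0 bits, as in band 12
assert phase_bits(1.6) == 2
assert phase_bits(7.0) == 0
```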
  • without corrective measures, the quantization errors introduced by the various quantizers will be highly correlated, and such correlated errors will be distinctly audible.
  • the audibility of said errors is due to the fact that the effect of the inverse transform operating on the quantized magnitudes and phases is to produce sinusoids at the various frequencies with related phases. However, these phase relationships are abruptly interrupted at block boundaries, and this can be heard as a crackling noise. The audible effects of this phenomenon can be eliminated by pseudorandomly staggering the first levels of the uniform scalar quantizers.
  • each uniform quantizer is designed in some arbitrary way, and the necessary staggering is assured by adding pseudorandom dither before quantizing, and by subtracting the pseudorandom dither when decoding.
  • a pseudorandom sequence of phases φ_1, φ_2, . . . , φ_512 is stored at both the encoder and the decoder.
  • when the ith phase of a block is quantized, the term φ_i is added to it, the sum is quantized and an index is produced.
  • when decoding, φ_i is subtracted from the phase value corresponding to the index.
  • a greater number of bits are allocated to the phase quantizers in blocks containing transients such as drumbeats.
  • the purpose of this augmented bit allocation is to reduce the pre-echo.
  • Increasing the bit allocations of the phase quantizers in a block containing a transient reduces the overall quantization error, thereby reducing the errors that may be spread to the quiet initial segment. If allowed to happen, such errors would be heard as pre-echo. Since it is the high frequency phases that are given the smallest bit allocations (see Table III), it is these phases that are the source of the most quantization errors, and it is these phases whose quantizers need to have their rates augmented.
  • the method for detecting transients was described earlier, and as stated there, the effect of said method is to set a flag indicating whether the frame contains a transient. If so, the bit allocations for the phase quantizers are increased.
  • the augmented phase allocations to be used in this case are shown in Table III. For example, in band 10 the bit allocations for the phases are increased from 1 to 2 bits; i.e., they have 4 rather than 2 levels. Among all segments encoded to date, no more than 1/6 of the blocks required augmented bit allocations. Correspondingly, no more than 0.028 bits per sample has been needed for the augmentation.
  • FIGS. 4a and 4b the encoding process is summarized. As previously explained, the magnitude and phase portions are treated differently. Accordingly, the flow diagram of FIG. 4a shows a magnitude path A and a phase path B which correspond to the magnitude branch A and phase branch B of FIG. 4b.
  • the encoded digital data includes bits representing the dynamic time expansion parameters (at most 0.002 bits/sample), the time domain expansion factors (0.004 bits/sample), the DC magnitude coefficient (0.017 bits/sample), the band-by-band expansion factors (0.05 bits/sample), the magnitude vector quantization codebooks (0.031 bits/sample), magnitude vector quantizer indices (0.96 bits/sample), the baseline phase quantizer indices (0.66 bits/sample), and the augmented bits for the phase quantizers (at most 0.028 bits/sample), for a total of at most 1.75 bits/sample. With the 32,000 samples/sec sampling rate, this amounts to 56,000 bits/second, or 0.42 megabytes/minute.
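The quoted budget can be checked by simple arithmetic (the dictionary keys are just the labels from the list above):

```python
rates = {                                   # bits/sample, worst case
    "dynamic time expansion parameters": 0.002,
    "time domain expansion factors":     0.004,
    "DC magnitude coefficient":          0.017,
    "band-by-band expansion factors":    0.05,
    "magnitude VQ codebooks":            0.031,
    "magnitude VQ indices":              0.96,
    "baseline phase quantizer indices":  0.66,
    "augmented phase bits":              0.028,
}
total = sum(rates.values())                 # about 1.75 bits/sample
bits_per_sec = total * 32_000               # about 56,000 bits/second
mb_per_min = bits_per_sec * 60 / 8 / 1e6    # about 0.42 megabytes/minute
```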
  • FIG. 5 depicts the presently preferred decoding circuit, which simultaneously decodes left and right channels of a segment for which both channels have been encoded using the method previously described.
  • This decoding circuit will now be described.
  • the decoding circuit comprises random access program and data memory 60 and a digital signal processor chip 62, which may be one of several available.
  • the digital signal processor is coupled through a block of dual-ported memory 64 to the computer bus 66 of a computer system which controls the local storage medium on which the compressed data has been stored.
  • a first in, first out (FIFO) memory block 68 communicates with signal processor 62 to transfer the decoded, decompressed data to the digital-to-analog convertor (DAC) 70.
  • the digital-to-analog convertor as well as the FIFO memory 68 are synchronized by clock 72.
  • the digital-to-analog convertor provides stereo outputs L and R which are applied to a buffer and filter circuit 74.
  • the output of circuit 74 feeds the conventional analog audio system 76.
  • the magnitude vector quantization codebooks for the individual subbands are moved from the hard disk of the main computer via the computer bus 66 and dual-ported memory 64 to the signal processor 62.
  • the codebooks are stored in the program and data memory 60.
  • block by block, the encoded digital data, consisting of indices and side information such as scale factors and flag bits, are moved from the hard disk to buffers in the main computer memory and then to the dual-ported memory in synchronism with signal processor use.
  • the magnitude indices are used to retrieve vector quantizer codewords forming a rendition of the magnitude block.
  • the individual bands are then scaled by the inverse band-by-band expansion factors.
  • phase quantizer indices are used to create a rendition of the phase block, from which the pseudorandom dither sequence is subtracted. (If the transient flag is set, the indices are decoded with the augmented quantizers.)
  • the completed magnitude and phase blocks are converted to real and imaginary representation.
  • This block is then inverse Fourier transformed using an inverse FFT program on the DSP to produce a rendition of the time domain block.
  • the preferred implementation uses one inverse FFT to simultaneously inverse transform a left and right block. Specifically, let L r and R r denote the real parts of the frequency domain blocks of the left and right channels, respectively, and let L i and R i denote the imaginary parts of the frequency domain blocks of the left and right channels, respectively.
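Although the text stops short of spelling out the combination, the standard trick for inverse-transforming two real blocks with one complex inverse FFT follows from linearity: since ifft(L) = l and ifft(R) = r are both real, ifft(L + jR) = l + jr, and in terms of the parts named above L + jR = (L_r - R_i) + j(L_i + R_r). A NumPy check of this identity:

```python
import numpy as np

rng = np.random.default_rng(1)
l = rng.normal(size=1024)                # left-channel time-domain block
r = rng.normal(size=1024)                # right-channel time-domain block
L, R = np.fft.fft(l), np.fft.fft(r)      # the two frequency-domain blocks

# one complex inverse FFT recovers both real blocks at once
z = np.fft.ifft(L + 1j * R)              # = (L_r - R_i) + j*(L_i + R_r)
l_hat, r_hat = z.real, z.imag

assert np.allclose(l_hat, l)
assert np.allclose(r_hat, r)
```

This halves the number of inverse transforms on the DSP at no cost in accuracy.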
  • time domain blocks are windowed, just as in the encoder, and the overlapping portions are added.
  • time domain blocks are then scaled according to the inverses of the time domain expansion factors.
  • the subblocks of the blocks are scaled according to the dynamic time domain expansion parameters.
  • the blocks are moved to the FIFO.
  • the data are moved serially to the digital-to-analog convertor 70.
  • the analog information is presented to buffers and filters 74 and thence to the power amplifiers and speakers of the audio system 76.
  • the decoding circuitry can be implemented on a single circuit board suitable for plugging into the backplane of a host computer.
  • the invention on the playback side can be implemented with a comparatively inexpensive microcomputer or personal computer having hard disk storage sufficient to hold the compressed data of the program material.
  • the playback system operates in real time using comparatively inexpensive components.
  • the invention can be used as a substitute for the mechanically complex jukebox changers now in use.
  • the only moving parts on the playback side employing the invention are the computer system power supply fan and hermetically sealed hard disk drive. Both of these components are highly reliable and easily replaced if damaged.
  • the resulting digital jukebox can be virtually maintenance free.
  • since there are no records or discs to become scratched or damaged during installation and use, a considerable cost savings is achieved.

Abstract

The digitally sampled data is split into a plurality of subbands, each covering a different frequency range. The subbands are each individually expanded to normalize the energy in each band; the subbands are then converted by FFT to the frequency domain, and the magnitude and phase portions are processed by different techniques based on psychoacoustic principles. Magnitude data are processed by tree-structured vector quantization to develop codebooks for each subband which are unique to each song. Phase data are uniformly quantized, with dynamic bit allocation used to increase resolution on transient passages.

Description

This is a continuation of copending application(s) Ser. No. 07/582,715 filed on Sep. 13, 1990, now abandoned, which is a continuation-in-part of U.S. patent application Ser. No. 530,547, filed May 29, 1990, now abandoned entitled "Digital Audio Compression System."
BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to digital audio systems and more particularly to a data compression system for substantially increasing the playing time of a given storage medium without significant degradation of sound quality.
In applications where access to a large library of digital audio is desired, the main problem is in the extraordinary data volume required to store high quality music. To place this problem in perspective, the standard compact disc player transfers digital audio data at a rate of approximately 5.3 megabytes per minute for one stereo channel. If one were to store all compact disc data for a 3 minute stereo selection on a hard disk of a computer, the selection would occupy 31.8 megabytes, or more than a 30 megabyte hard drive can hold. Even using a large 750 megabyte disk drive only about 1 hour and 10 minutes of music could be stored. That is far too little for an evening's entertainment or for digital jukebox purposes.
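The arithmetic behind these figures is easily verified:

```python
cd_mb_per_min = 5.3                       # CD data rate, one channel, MB/min
stereo_3min_mb = cd_mb_per_min * 2 * 3    # 3-minute stereo selection: 31.8 MB
minutes_on_750mb = 750 / (cd_mb_per_min * 2)   # about 70.8 min, i.e. ~1 h 10 min
```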
While there have been some advances made in the field of data compression, present-day data compression techniques have not adequately focused on human auditory perception. For example, many data compression algorithms are intended for compressing telemetry and telecommunications data and speech. There has heretofore been no practical data compression technique for reducing the data volume to allow meaningful playing times on relatively simple devices while preserving audio quality of recorded music.
The present invention uses a combination of source coding theory and theory of human auditory perception to greatly reduce the storage requirements for digital audio. In the presently preferred embodiment the 5.3 megabyte per minute compact disc data rate has been reduced to 0.42 megabytes per minute per channel. This reduction in data rate is achieved by an encoding and decoding system in which the more costly components are used on the encoding side, to allow simple and inexpensive equipment to be used on the decoding side. More specifically, the system is designed to permit decoding with a processor with limited arithmetic precision, for example, 16 bit fixed-point arithmetic. The present invention is thus well suited for music distribution systems, video game systems, consumer audio systems, digital jukeboxes and computer-controlled video/audio systems.
In accordance with one principle of the invention, a wideband digital audio signal is processed by transforming it into the frequency domain comprising data capable of being represented as complex numbers. A magnitude portion and a phase portion are extracted from the frequency domain data, with different quantization processes being performed on the magnitude and phase portions. After the quantization processes, the magnitude and phase data are stored as digital data on a data storage medium. In the presently preferred embodiment the magnitude portion is quantized using a vector quantization technique while the phase portion is quantized using uniform scalar quantization. By treating the magnitude and phase separately, the invention permits different quantization rules to be applied to each. This allows the use of vector quantization of the magnitudes and scalar quantization of the phases.
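As a concrete illustration of the magnitude/phase split (a NumPy sketch, not the patent's implementation): the frequency-domain data are complex, and the two extracted portions together carry the full complex spectrum, so they can be quantized by entirely different rules and recombined losslessly before quantization:

```python
import numpy as np

x = np.random.default_rng(2).normal(size=1024)   # one time-domain block
X = np.fft.fft(x)                                # complex frequency-domain data
magnitude, phase = np.abs(X), np.angle(X)        # the two extracted portions

# the portions reconstruct the complex spectrum exactly
assert np.allclose(magnitude * np.exp(1j * phase), X)
```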
In accordance with another principle of this invention, expansion via scaling of bands of magnitude coefficients to a common power level assures that the noise produced by their quantization will be essentially inaudible. Using this technique it is possible to achieve effects similar to a more complex process of dynamically choosing the rate of the quantizer (in bits per coefficient) on the basis of perceptual masking calculations.
In accordance with another principle of the invention, different vector quantizers are designed for different bands of magnitude coefficients. This use of a plurality of vector quantizers insures better performance, because the quantizer is matched to the band being encoded, e.g., more highly correlated vectors in the low frequency bands than in the high frequency bands. Moreover, the vector quantizers may have different rates (in bits per magnitude coefficient) reflecting the fact that the human auditory system is more sensitive to errors in some frequency bands than others.
In implementing the presently preferred embodiment, a vector quantization codebook is developed uniquely for the magnitudes of each segment (of approximately 3 minutes in length) of the audio selection being recorded. The unique codebook further includes portions which are unique to each of the frequency bands. On playback, the codebook is first transmitted to and loaded in the decoding equipment, whereupon the magnitude portions of the encoded digital audio may be quickly and efficiently decoded to restore the original magnitude portions. In accordance with another principle of this invention, the codebooks are two-stage and tree-structured so that excellent quantization characteristics are obtained with greatly reduced complexity.
Neural conduction time in the human auditory system is somewhat indeterminate and therefore the phase of higher frequencies is of less importance than the phase of lower frequencies. This means that while the phase at low frequencies must be quantized with a large number of bits, the phase at higher frequencies may be quantized with substantially fewer bits. The presently preferred embodiment uses a detailed understanding of human auditory perception to allocate the minimum number of bits to the quantization of each phase, with higher frequencies receiving less or even zero bits. Moreover, it uses pseudorandom phase dither to eliminate the audible effects of correlations in the quantized phase errors.
Transform coding, such as described in this invention, is susceptible to "pre-echo," which may be heard when intervals of silence are followed by a transient such as a drumbeat, unless corrective measures are taken. In accordance with a principle of this invention, pre-echo is greatly reduced or eliminated by dividing blocks into subblocks, detecting the occurrence of transients that would cause pre-echoes and individually expanding via scaling the subblocks of those blocks containing transients, in a manner that exploits temporal masking in human auditory perception. In accordance with another principle of this invention, pre-echo is reduced or eliminated by dynamically augmenting the bit allocation to the phase quantizers in blocks containing transients.
The entire system is designed with low cost decoding in mind. Specifically, several steps of the encoding process are tailored to minimize the potential for truncation errors when a low cost, limited precision fixed-point arithmetic is used in the decoder. One such principle is the expansion via scaling of each block before transforming.
For a more complete understanding of the invention, its objects and advantages, reference may be had to the following specification and to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an overall block diagram of the digital audio encoding system of the invention, showing both encoding and decoding stations;
FIG. 2 is a block diagram of the presently preferred encoding system;
FIG. 3 is a block diagram of a two-stage, tree-structured vector quantizer;
FIGS. 4a and 4b are flow diagrams of the encoding process; and
FIG. 5 is a schematic diagram of the presently preferred decoding system.
DESCRIPTION OF THE PREFERRED EMBODIMENT
While the invention is applicable to a wide range of different applications, it will be described in the context of a music distribution system, such as might be found in a digital jukebox. Referring to FIG. 1, an example of a distribution system is illustrated. A source of a standard audio or digital audio is fed at 10 to the encoding computer 12 in which the encoding processes of the invention are performed. The encoding computer can be one of a variety of commercially available engineering work stations or high performance microcomputers. In the presently preferred embodiment, the encoding process is performed by software run on the encoding computer, although hardware implementations may be employed in high volume manufacture and distribution applications. Using present-day workstation computer technology, the encoding process proceeds in slower than real time, the encoded output comprising a highly compressed manifestation of the original digital audio input. As computer technology improves, it is anticipated that the encoding process may someday be performed nearly in real time, making broadcast applications more convenient.
The encoded digital audio is stored on an archival storage medium such as a hard disk system, CD-ROM, digital tape, or the like. The data stored on archival storage 14 are supplied through a distribution system 16 to a plurality of local storage media 18, which may be hard disk drives associated with individual jukebox players located at distributed locations, e.g., throughout the country. Virtually any digital data distribution system can be used for this purpose. For example, the individual jukeboxes can receive the latest selections by ground data link (telephone) or satellite link, via modem or other suitable equipment. Distribution can also be effected using floppy diskette, digital tape or other removable data storage media. Computer network systems can also be employed.
Associated with each local storage medium is a decoding processor 20, which transforms the locally stored compressed data into a high quality rendition of the original digital audio signal, which is then converted into analog form by a digital-to-analog convertor, amplified and played through the speaker system (not shown).
In essence, the encoding and decoding processes are complementary. The original digital audio signal is highly compressed during encoding, to save storage space and reduce the data transfer rate. This highly compressed data is decompressed upon decoding, to restore a high quality audio output. Although there is some signal quality degradation resulting from the encoding/decoding process, considerable emphasis is placed on retaining high musical quality by masking signal degradation either under loud sounds or next to loud sounds in the time or frequency domains. Using the invention, it is possible to provide at least FM broadcast quality stereo reproduction of a compact disc quality source in an economical package.
Although the presently preferred embodiment has been optimized to provide FM broadcast quality reproduction for digital jukebox applications, the principles of the invention can be used in systems which are adapted to give both higher and lower quality reproduction, depending on the audio requirements and amount of data compression required. Accordingly, the presently preferred implementation described herein should not be viewed as a limitation of the inventive principles of their broader aspects.
An advantage of the data compression achieved is that a useful number of selections can be locally stored on a fast access hard disk, making it possible to find and play a musical selection without the time delay associated with mechanical disk changers. These advantages also extend to the broadcast industry to permit the disk jockey to respond immediately to a listener's telephone request for an obscure hit record. Computer controlled video game systems might also use the invention to inject high quality audio program material at the appropriate time as play action is in progress. Home entertainment and car stereo systems might also benefit from the invention.
Turning now to FIG. 2, a description of the presently preferred encoding scheme will be described. As stated, the presently preferred encoder is implemented in software, although certain portions or all of the encoder might be implemented in hardware, if desired.
FILTERING AND SAMPLING
The audio source material, depicted at 22, is low pass filtered through a 15 kilohertz low pass filter 24 and converted into a digital audio signal by a 16 bit analog-to-digital convertor 26, sampling at 32 kilohertz (just above the Nyquist rate) and producing 32,000 digital audio samples per second. Therefore, the audio samples are 16 bit integers ranging from -32,768 to +32,767. In this illustration, an analog audio source is assumed. However, if desired, the digital data from a digital audio source could be used instead, in which case the filter and analog-to-digital convertor would not be needed, but if the digital audio is not sampled at 32 kilohertz, it must be converted to such using standard techniques.
PREPROCESSING A. Segmenting
In the preprocessing block 28, the digital audio signal is first divided into segments of approximately 3 minutes. Each segment is separately encoded and decoded as described in the remainder of this document. The typical segment length (3 minutes) is a compromise. Shorter segments lead to higher quality audio reproductions at the expense of producing more bits.
B. Blocking
The segment is then divided into sequential, somewhat overlapping frames or blocks. In the presently preferred embodiment, each block consists of 1,024 audio samples (32 milliseconds) of which 64 are from the previous block. The blocks are then processed to minimize certain effects of transform coding such as pre-echo, post-echo, quantization errors and decoder truncation errors.
C. Subblocking, Transient Detection and Dynamic Time Domain Expansion of Subblocks
The next preprocessing step is designed to reduce or eliminate pre-echo, an endemic problem in transform coding. The quantization process in transform coding (such as that to be described later) introduces errors that are inverse transformed at the decoder and spread evenly throughout the block. The power in the errors introduced in this way tends to be proportional to the average power of the audio signal in the block. The result is that if a block contains a drumbeat or other transient, the beginning of the block will consist of very small values and the end will consist of very large values. Overall, the average power of the block will be fairly large, and thus the errors introduced by the quantization process will be large relative to the initial quiet portion of the block. Thus, the reproduction of the block contains an initial splash of quantization errors that is audible as a "pre-echo" of (i.e., before) the drumbeat or other transient that follows.
To reduce or eliminate pre-echo, transients are detected in blocks and two processing steps are applied to blocks containing transients. The first, to be described here, is dynamic time domain expansion of subblocks. The second, to be described later in the section on phase quantization, is to allocate more bits to the quantization of phase in those blocks where transients are detected.
To detect transients, each block is subdivided into 8 subblocks of 128 samples each. The average power, or simply power, in each subblock is then calculated. (The average power in a block is the sum of the squares of the components of the block divided by the number of components in the block.) If the ratio of the average power in a subblock to the average power in the previous subblock is greater than a certain threshold, a flag is set to indicate that a transient has been detected in the block and dynamic time domain expansion (to be described shortly) is undertaken. As the quantization process used in transform coding (such as that to be described later) introduces many more errors in quantizing high frequency audio signals than low frequency audio signals (because the ear is insensitive to high frequency errors), it is high frequency transients that contribute the most to pre-echo. Thus in the presently preferred embodiment, the block is high-pass filtered before transient detection. If x_{2n-1}, x_{2n} denotes the nth pair of samples in a block, then the high pass filter in the presently preferred embodiment produces
y2n-1 = x2n-1 - an
y2n = x2n - an
in response to x2n-1 and x2n, where
an = (x2n-1 + x2n + x2n+1 + x2n+2)/4
is the average of x2n-1, x2n and the two subsequent samples. The ratio threshold is set at 2. Finally, to avoid spurious detection of transients, a transient is detected only if the RMS power in the subblock (i.e., the square root of the average power) is greater than some specified threshold, presently set at 50 (on the scale of -32,768 to +32,767).
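The detection step just described can be sketched as follows (a Python illustration; the handling of the final sample pair, which would need samples beyond the block, is an assumption since the text does not specify the boundary case):

```python
def detect_transient(block, ratio_thresh=2.0, rms_thresh=50.0):
    """Return the 1-based index of the subblock where a transient is
    detected, or 0 if no transient is found."""
    n = len(block)
    # High-pass filter: from each pair of samples, subtract the average of
    # that pair and the next pair (the last pair is clamped to reuse the
    # final available samples -- an assumed boundary rule).
    y = [0.0] * n
    for i in range(0, n, 2):
        j = min(i + 2, n - 2)
        a = (block[i] + block[i + 1] + block[j] + block[j + 1]) / 4.0
        y[i] = block[i] - a
        y[i + 1] = block[i + 1] - a
    # Average power in each of the 8 subblocks of the filtered block.
    sub = n // 8
    power = [sum(v * v for v in y[k * sub:(k + 1) * sub]) / sub
             for k in range(8)]
    for k in range(1, 8):
        ratio = power[k] / (power[k - 1] + 1e-12)   # guard a silent subblock
        if ratio > ratio_thresh and power[k] ** 0.5 > rms_thresh:
            return k + 1
    return 0
```

A block whose second half carries a loud high-frequency burst is flagged at the first loud subblock, while silent or constant blocks (which the high-pass filter zeroes out) are not.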
When a transient is detected in a block, the (unfiltered) subblocks occurring before the transient are expanded by scaling with various expansion factors. Subblocks occurring well before the transient are scaled by larger factors than those occurring just before the transient. The subblock containing the transient and subsequent subblocks are not scaled. The transition from large expansion factors to smaller occurs gradually or in a stepwise diminishing fashion. The reason for the diminishing set of expansion factors is the backward masking phenomenon in human auditory perception, wherein errors occurring just before a transient are less noticeable than those occurring well before the transient. The small expansion factor applied to the subblocks just before the transient permits more quantization noise (introduced in subsequent steps) than the larger expansion factor applied to segments well before the transient.
In the presently preferred embodiment, if a transient is detected in the nth subblock, 1≦n≦8, (that is, the ratio of the power in the nth filtered subblock to that of the previous filtered subblock is sufficiently large and its RMS power is sufficiently large), then the ratio R of the power in the last 9-n subblocks (unfiltered) to that of the first n-1 subblocks (unfiltered) is computed. The expansion factors g1, . . . ,gn-1 for the first n-1 subblocks are as follows:
If R>13, then gi =(R+1)/i, for i=1, . . . ,n-1.
If 2≦R≦13, then gi =R+1, for i=1, . . . ,n-1.
If R<2, then gi =1.
Note that although no expansion is actually done in the last case, the transient flag is still set and used to increase the bit allocation for phase quantization. Note further that the expansion factors are between 1 and 2. Finally, note that the high pass filter is used only to detect transients and is not part of the final processing of the data; i.e., it is the unfiltered data that is expanded. To reduce the possibility of truncation errors in the fixed-point arithmetic of the decoder, in the presently preferred embodiment, R is rounded to a number whose inverse can be exactly used by the decoder as a multiplication factor. The factor R, the transient flag and the index n are all recorded as part of the encoded digital data. In the presently preferred embodiment 12 bits are used to describe these. In segments encoded to date, no more than 1/6 of the blocks contained transients, meaning that at most 0.002 bits per sample were required for these parameters.
Although not included in the presently preferred embodiment, the same approach can be used to reduce post-echo. In this case we would detect transients from loud to quiet, as at the end of a drumbeat. In blocks where a loud-to-quiet transient is present, scaling of subblocks after the transient would take place, with those subblocks farthest from the transient being scaled up more than those closer to the transient.
The dynamic time domain expansion described here is useful for any block quantization process (e.g., transform coding) that would otherwise introduce pre-echo.
D. Time Domain Expansion of Blocks
The next preprocessing step is time domain expansion on a block-by-block basis, wherein each block is scaled so its maximum equals a specified value. The specified maximum value is chosen to be the largest value that the decoder can represent or 70.7% of this value if two stereo channels are to be simultaneously decoded in the manner described later. The purpose of this time domain expansion is to amplify quiet blocks so that truncation errors due to decoding, especially due to inverse Fourier transforming, with limited precision fixed-point arithmetic will be small in comparison. It also reduces, in quiet blocks, the effects of quantization errors introduced in subsequent steps. The expansion factor for each block is recorded as part of the encoded digital data. In the presently preferred embodiment the expansion factors are limited to being powers of 2 so that the inverse expansion at the decoder can be done rapidly and exactly with fixed-point arithmetic. The result is that in the presently preferred embodiment 0.004 bits/sample are used for this purpose.
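The block-by-block expansion might be sketched as below; restricting the gain to a power of 2 means the decoder's inverse scaling is an exact shift in fixed-point arithmetic. The 16-bit peak value is an illustrative assumption:

```python
def block_expansion_factor(block, peak_limit=32767.0, stereo=False):
    """Largest power-of-2 gain that keeps the scaled block within the
    decoder's representable range (70.7% of full scale when two stereo
    channels are decoded simultaneously)."""
    limit = peak_limit * 0.707 if stereo else peak_limit
    peak = max(abs(v) for v in block)
    if peak == 0:
        return 1                    # silent block: nothing to expand
    g = 1
    while peak * g * 2 <= limit:    # doubling must not exceed the limit
        g *= 2
    return g
```

A quiet block with peak 100 is amplified 256-fold (25,600 ≦ 32,767), while a block already near full scale is left alone (factor 1).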
This time domain expansion is useful for any digital audio encoding process that divides the audio signal into blocks.
E. Windowing
The window is a square root Hanning window in the 64 sample overlap area at the beginning of the block and also at the end of the block; i.e., the window allows 1/16th of its size for overlap add. The Hanning window gradually diminishes the amplitude of one block in the overlap while increasing the amplitude of the next block, blending the two together to avoid transients due to different coding parameters on either side of the overlap. A square root Hanning window is applied at the decoder as well. To eliminate the possibility of truncation errors when decoding with fixed-point arithmetic, in the presently preferred embodiment, the window values are rounded to values exactly representable by fixed-point arithmetic.
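A sketch of such a window follows. The exact Hanning sample phase is an assumption; the essential property is that the encoder and decoder windows multiply to a Hanning cross-fade whose rising and falling halves sum to one across the 64-sample overlap-add region:

```python
import math

def sqrt_hanning_window(n=1024, overlap=64):
    """Flat window with square-root-Hanning tapers over `overlap`
    samples at each end of the block."""
    w = [1.0] * n
    for i in range(overlap):
        # Hanning rises from 0 to 1 across the overlap; its square root is
        # applied once at the encoder and once at the decoder.
        h = 0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / overlap))
        w[i] = math.sqrt(h)          # rising taper at the block start
        w[n - 1 - i] = w[i]          # mirrored falling taper at the end
    return w
```

Because both encoder and decoder apply the square-root window, the effective window is Hanning, and the falling tail of one block plus the rising head of the next sum to unity in the overlap.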
TRANSFORMING BLOCKS
After preprocessing, each data block or frame is transformed with an N point Fast Fourier Transform (FFT) 32 into a set of N complex numbers, called the transformed block, where N matches the size of blocks (N=1,024 in the presently preferred embodiment). In effect, the FFT converts these data blocks from the time domain to the frequency domain, with the ith complex number representing the frequency component at frequency fS (i-1)/N, 1≦i≦N, where fS denotes the sampling frequency (32 kilohertz in the presently preferred embodiment). Any suitable computer implemented Fast Fourier Transform algorithm can be used for this purpose. Each complex number in the transformed block is then converted into 2 real numbers, a magnitude and phase. According to Fourier transform theory, the last N/2 magnitudes and the last N/2 phases are redundant, so the first N/2 magnitudes and the first N/2 phases will be referred to as the magnitude block and the phase block, respectively. Throughout the remainder it will be assumed for concreteness that N=1,024.
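The transform and magnitude/phase split might look like the following, using numpy's FFT as a stand-in for "any suitable" FFT routine (the DC term lands in bin 0 and, like the Nyquist term, is handled separately in the text):

```python
import numpy as np

def transform_block(block, n=1024):
    """FFT a preprocessed block and split it into the magnitude block and
    phase block: the first N/2 magnitudes and phases, the upper half being
    redundant for a real-valued input signal."""
    spectrum = np.fft.fft(block, n)
    half = spectrum[:n // 2]
    return np.abs(half), np.angle(half)
```

For a real cosine at bin k, the magnitude block peaks at index k with value N/2 and the corresponding phase is zero, matching the stated frequency mapping.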
In accordance with the invention, the magnitude and phase blocks are handled by different processes. Accordingly, processing of the magnitude portion proceeds along branch 34 and the phase portion along branch 36. The DC and Nyquist points are also separated and quantized along branch 38 using a 16 bit uniform scalar quantizer.
OVERVIEW OF QUANTIZING MAGNITUDES AND PHASES
After the magnitude and phase portions are separated, the magnitude block is divided into subblocks called bands and each band is expanded via scaling. This process, which is referred to as band-by-band expansion (BBE), is depicted at BBE block 42. This is done for all blocks of the entire segment. The expansion coefficients for each band of each block are recorded as part of the encoded digital data.
For each band, a vector quantization codebook is designed. Such codebooks may be designed to work for any segment of audio, or as in the preferred embodiment described here, they may be designed for the specific audio segment being encoded. In this case, the codebook for a given band is designed by training on the sequence formed by concatenating the elements of that band from each magnitude block of the digital audio segment being encoded (approximately 3 minutes in the preferred embodiment), and the codebooks themselves are recorded as part of the encoded digital data. The vector quantizer design block is pictured at 43.
After the band codebooks have been designed, the magnitude blocks are processed sequentially. Each band of a given magnitude block is further divided into vectors (with vector lengths equal to the dimension of the codebook for that band), which are then assigned an index from the codebook for that band. The vector quantizer block is depicted at 44. The index for each vector of each band and block are recorded as part of the encoded digital data.
Each term in a phase block is quantized with a uniform scalar quantizer 46. The indices produced by this quantization process are recorded as part of the encoded digital data. The step size of the quantizer depends on the frequency corresponding to the given phase term, as described later. In addition, pseudorandom phase dither is added to each phase, as described later. The bits produced by the phase quantizers are included as part of the encoded digital data.
The details of the magnitude and phase quantization are described below.
MAGNITUDE QUANTIZATION A. Band-By-Band Expansion (BBE)
Band-by-band expansion is used to reduce the audibility of the errors introduced by vector quantization through exploitation of the spectral masking properties of the human ear. The background is the following. When applied to a block of magnitudes, a typical quantization process (for example, the vector quantization process described later) spreads quantization errors evenly throughout the block. However, the ear is less sensitive to errors in magnitudes corresponding to frequencies in the neighborhood of which the magnitudes are generally large. Indeed, at any frequency there is an auditory "critical band" surrounding that frequency such that errors at this frequency are inaudible if the error power within the critical band is a sufficiently small fraction of the power in the magnitudes within the critical band. The width of these critical bands has been found to be proportional to the logarithm of the center frequency. This critical band theory indicates that, unless corrective measures are taken, the quantization noise will be audible in bands where the magnitudes are small.
Band-by-band expansion is a very simple and effective way to take corrective measures, thereby exploiting the spectral masking effect. In band-by-band expansion each magnitude block is divided into subblocks, called bands, of adjacent terms; each band is then expanded via scaling to some common power before quantizing. The result is that the quantization errors are equally inaudible in all bands. More specifically, each band is expanded as much as possible, with the constraints that all bands have the same power after expansion and that no expanded magnitude term exceed the maximum value representable at the decoder. In addition, the expansion factors are rounded to numbers whose inverse can be exactly used by the decoder as multiplication factors. The expansion factors are recorded as part of the digital data.
It is important to choose the expansion band widths appropriately. Generally, narrower bands give a better spectral masking effect, but more bits are required to describe the expansion factors. However, the critical band theory indicates that there is no need to make the bands narrower than the critical bands. Indeed it was found that 12 bands, each roughly twice the width of the critical band at its frequency, led to the most efficient implementation. See Table I for the preferred set of bands. For example, band 9 contains the 40 magnitudes from the 100th to the 139th, which correspond to frequencies from 3,125 hz to 4,343.75 hz. The encoding of the expansion factors for these bands requires 0.05 bits/sample.
The band-by-band expansion just described is useful not only when processing the magnitudes from a Fourier transform but also when processing the transform coefficients produced by any transform whose coefficients are interpretable as corresponding to frequencies, for example, the discrete cosine transformation.
              TABLE I
______________________________________
A list of the subband lower edges and the number of coefficients
that fall within each as used in this coding scheme.
Band No.  No. of Coefficients  Lower Edge of Band in hz
______________________________________
 1          3                   31.25
 2          8                  125
 3          8                  375
 4          8                  625
 5         12                  875
 6         16                 1250
 7         20                 1750
 8         24                 2375
 9         40                 3125
10         64                 4375
11        100                 6375
12        208                 9500
______________________________________
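Using the band sizes from Table I, the expansion might be sketched as follows. This is a simplified illustration: it assumes no band is identically zero, and it omits the rounding of each factor to a value whose inverse is exact in the decoder's fixed-point arithmetic:

```python
# Coefficient counts per band, from Table I (bins 1..511; DC excluded).
BAND_SIZES = [3, 8, 8, 8, 12, 16, 20, 24, 40, 64, 100, 208]

def band_by_band_expand(mags, peak_limit=32767.0):
    """Expand each band as much as possible, subject to a common
    post-expansion power and the decoder's peak limit.  Returns the
    expanded magnitudes and the per-band expansion factors."""
    assert len(mags) == sum(BAND_SIZES)
    bands, pos = [], 0
    for size in BAND_SIZES:
        bands.append(mags[pos:pos + size])
        pos += size
    powers = [sum(v * v for v in b) / len(b) for b in bands]
    # Per-band cap on the expansion factor from the peak constraint.
    caps = [peak_limit / max(b) for b in bands]
    # Common target power: the largest every band can reach within its cap.
    target = min(p * c * c for p, c in zip(powers, caps))
    factors = [(target / p) ** 0.5 for p in powers]
    expanded = [v * g for b, g in zip(bands, factors) for v in b]
    return expanded, factors
```

The limiting band is scaled right up against the peak constraint, and every other band is scaled so that all bands end up with the same average power.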
VECTOR QUANTIZATION OF EXPANDED MAGNITUDES
After the magnitudes are band-by-band expanded, the expanded data is vector quantized using a distinct codebook for each band, and preferably each codebook is uniquely designed for the audio segment. Codebook design is discussed more fully later.
Using a separate quantizer for each band enhances vector quantization performance, because the statistical characteristics of the magnitude vectors within a given band are quite similar, but may be quite different from the characteristics of other bands. For example, lower frequency bands are more correlated than higher frequency bands.
Another advantage of designing a separate quantizer for each band is that they may have different encoding rates (in bits/coefficient) to reflect perceptual criteria for preferentially doing finer quantization of certain bands than others, thereby reducing the total rate while maintaining high quality.
Note that the bands adopted for vector quantization need not be the same as the bands adopted for band-by-band expansion. The more narrow the band (and, consequently, the more bands), the better the performance of the vector quantizers. However, the decoder becomes more complex, and if the codebooks are individually tailored to the segment, more bits are needed to describe the codebooks. Since the statistics of the vectors within a given band of the band-by-band expansion were found to be similar, in the preferred embodiment, these bands are used for the vector quantization as well.
A tree-structured vector quantizer (TSVQ) is preferred for each band, as this gives excellent performance with greatly reduced complexity (reduced design time, reduced quantization time and reduced decoder storage). Thus the present implementation has 12 TSVQs. TSVQ is described in more detail later.
The rate of each TSVQ (in bits per magnitude coefficient) is controlled by varying the "depth" of the tree. Specifically, the rate of a TSVQ is the depth divided by the vector length. Rate selection for the quantizers will be explained later. To keep the size of the codebooks small, two-stage, tree-structured vector quantization is employed wherever the desired rate exceeds 2 bits per magnitude coefficient. The codebook dimension (i.e., the vector length) is fixed at 4, with the exception of the first band, which has dimension 3 since it contains only 3 coefficients.
There are two conflicting factors affecting the selection of the dimension of the vector quantizers. On one hand, the larger the dimension, the better vector quantization performs. On the other hand, for a given encoding rate the codebook size increases exponentially with dimension, thus dramatically increasing the complexity of encoding and decoding (codebook design time, quantization time and decoder storage). For instance, at dimension 8 and rate 2, one codebook is 1 megabyte. For an encoder using 12 separate bands, a vector length 8 and rate 2 implementation would require 12 megabytes of codebooks, which is considered impractically large for this application.
Consequently, codebook dimension is selected small enough so that the codebooks will be of acceptable size but large enough so that vector quantization is efficient. From simulation studies it was learned that codebook dimension 4 provided acceptable quantizer performance. Furthermore, at rate 2 the codebook size is 2K bytes, which is reasonable.
The rate of each vector quantizer (i.e., the tree depth for each TSVQ) was selected for the presently preferred embodiment based on simulation studies under the broad guideline that at higher frequencies, lower rates can be used. That is to say, at higher frequencies, more noise can be tolerated. This is based on psychoacoustic findings. Table II sets forth the presently preferred tree depths. Additionally, since the audio signal is filtered to 15 kilohertz, the last 32 magnitudes are negligible and are not quantized at all. (The decoder assumes them to be zero). With these rate allocations, the code word indices require a total of 0.96 bits/sample and the description of the codebooks requires 0.03 bits/sample, assuming a 3 minute segment.
              TABLE II
______________________________________
Dimensions, tree depths and rate allocations
for magnitude TSVQs as used in this code.
Band   VQ         First        Second       Rate Allocation
No.    Dimension  Stage Depth  Stage Depth  (Bits/Coefficient)
______________________________________
 1     3          6            3            3
 2-9   4          8            4            3
10-11  4          8            --           2
12     4          4            --           1
______________________________________
The values shown in Table II are a set that yields acceptable performance. However, these values may be changed, if desired, to suit particular audio requirements. Moreover, after auditioning the encoded and decoded segment it is possible to change the parameters to achieve certain rate or quality goals. As shown in Table II, two-stage TSVQ is employed. The primary reason is to reduce the size of codebooks. FIG. 3 shows a block diagram of a two-stage TSVQ.
A secondary advantage of the band-by-band expansion is that it makes the magnitude vectors in a given band more homogeneous, thus making it possible for the codebook design algorithm, described later, to produce a better codebook for the given band.
The above-described encoding process, which employs two-stage, tree-structured vector quantization, is based on a codebook that is uniquely designed for each segment. This codebook, in turn, comprises a set of codebooks, one uniquely designed for each of the 12 subbands. This approach is presently preferred, since it gives excellent performance and since the codebooks can be downloaded to the playback equipment relatively quickly, prior to the music playback. In the alternative, a universal codebook could be developed (i.e., one for a segment, or even one for all segments), although it is anticipated that such a codebook would be larger than a segment-specific codebook and this might require greater memory capacity in the decoding and playback equipment. It is also possible to use a universal codebook with a mask unique to a particular segment to allow only the necessary portions of the universal codebook to be downloaded prior to music playback.
CODEBOOK DESIGN
The presently preferred codebook design process is performed using the data (or a representative sampling of the data) of the band and segment for which the codebook is being developed. The codebook development process is implemented by employing the techniques generally described in IEEE Transactions on Information Theory, "Speech Coding Based Upon Vector Quantization," by A. Buzo et al., Vol. 28, pp 562-574, October 1980. See also IEEE ASSP Magazine, "Vector Quantization," by R. M. Gray, Vol 1, pp 4-29, April 1984. We have significantly enhanced the algorithms described in the foregoing references, as will be described below. The enhancements greatly speed up the codebook development process. Before discussing these enhancements, a brief description of the general principles of tree-structured vector quantization (TSVQ) will be given.
TSVQ, like other forms of vector quantization, is a method of partitioning the input vector space into N disjoint subsets, where N is a power of 2, and representing each subset with a code vector and an index. The partitioning works in a hierarchical way following a certain binary tree structure.
A binary tree of depth d is a data structure consisting of one "root node," 2^d -2 "internal nodes" and 2^d "terminal nodes." The root node and each internal node have two other nodes (either internal or terminal) as their "children." The children of the root node are said to be at depth 1, the children of these are at depth 2, and so on. Consequently, there are 2^e nodes at depth e. The 2^d children of the nodes at depth d-1 are the terminal nodes.
In a TSVQ of dimension k and depth d, there is a binary tree of depth d, with one k-dimensional testvector associated with the root node and each internal node, and with a codevector associated with each terminal node. Designing a TSVQ consists of choosing the testvectors and codevectors. The set of codevectors constitutes the codebook of the TSVQ. Before describing the method for designing the codebook and our enhancements thereto, we describe how the encoding via partitioning is accomplished with a given set of testvectors and codevectors.
Given a k-dimensional input data vector x to be quantized, the Euclidean distance is computed from x to each of the testvectors associated with the two children of the root node. Then the children at depth 2 of whichever child at depth 1 gives the smaller distance are each compared to the data vector x to find the closest. In a similar manner, at each depth a pair of children are compared to x, and after a total of d-1 such comparisons, a testvector at depth d-1 is found. Then x is compared to the two codevectors that are children of this testvector, and the closest of these becomes the codevector that represents the data vector. Its index among the set of codevectors becomes part of the encoded digital data. The decoder then uses the index to reproduce the codevector as a rendition of the input data vector.
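The tree search just described can be sketched with the tree stored as a heap-style array (node i's children at 2i and 2i+1, a layout assumed here purely for illustration), so that encoding takes d pairwise comparisons:

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def tsvq_encode(x, nodes, depth):
    """Descend the tree: nodes[1] is the root testvector, deeper internal
    nodes hold testvectors, and the 2**depth vectors at the bottom level
    are the codevectors.  Returns the chosen codevector's index."""
    i = 1
    for _ in range(depth):
        left = 2 * i
        i = left if dist2(x, nodes[left]) <= dist2(x, nodes[left + 1]) else left + 1
    return i - 2 ** depth            # index among the codevectors

# Toy depth-2, dimension-1 tree: codevectors -3, -1, 1, 3 at the leaves.
nodes = [None, [0.0], [-2.0], [2.0], [-3.0], [-1.0], [1.0], [3.0]]
```

With this toy tree, encoding 0.9 descends right then left and selects codevector 1 (index 2), using 2 pairwise comparisons instead of searching all 4 codevectors.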
A principal advantage of TSVQ is that this partitioning method requires substantially fewer arithmetic operations than direct searching of the codebook for the closest codevector. Although the TSVQ method does not necessarily find the closest codevector in the codebook, it has been found that it generally finds a good one. Note that the testvectors are needed for encoding only, whereas the codevectors are needed for both encoding and decoding, and accordingly, in the preferred embodiment the codevectors (codebook) are encoded as part of the digital data.
Designing a TSVQ consists of selecting the testvectors and codevectors. As with any vector quantization process, the success of TSVQ depends on the quality of this design. For the invention described here, a very important advantage of TSVQ is that good TSVQ codebooks can be designed much more rapidly than otherwise unstructured codebooks. (This is important because designing the codebooks is part of the process of encoding a segment.) The design procedure and our enhancements will now be described.
The design is based on a training set, consisting of a sequence of k-dimensional training vectors from the data to be encoded. (As mentioned earlier, in the preferred embodiment, one codebook is designed for each band of each segment, so the training set is formed by concatenating the elements of that band from each magnitude block of the digital audio segment being encoded.)
The TSVQ algorithm designs the testvectors one depth at a time, starting at the root node. The testvector associated with the root node is the "centroid" of the entire training set, where the centroid c of a set S=(x1, . . . ,xm) is the vector average of the xi 's; i.e.,
c = (x1 + . . . + xm)/m  (Equation 1)
Next, to find the child testvectors of c (at depth 1), one begins by selecting an initial candidate, c1 and c2, for each. Usually these are c times (1+ε) and c times (1-ε), where ε is a small number, say 0.0005. Then the training set is partitioned into two subsets, according to which of the candidate testvectors is closest. The candidate testvectors are then replaced by the centroids of these subsets, forming new candidate testvectors c'1 and c'2. Specifically, c1 and c2 are replaced, respectively, by
c'1 = (sum of the training vectors closer to c1)/m1, c'2 = (sum of the training vectors closer to c2)/m2  (Equation 2)
where m1 is the number of training vectors closer to c1 than c2 and m2 is the number of training vectors closer to c2 than c1. Note that m1 plus m2 is m.
Then the partitioning and replacement by centroids is repeated over and over, until no significant improvement is found. Improvement is measured by comparing the distortion that results from using the pair of old testvectors as representatives of the training set to the distortion resulting from using the new pair. Specifically, the distortion from using a pair c1, c2 is
D = (sum of ∥xi -c1∥^2 over the training vectors closer to c1) + (sum of ∥xi -c2∥^2 over the training vectors closer to c2)  (Equation 3)
where ∥x-c∥ denotes the Euclidean distance between the two vectors x and c.
Note that determining whether a testvector xi is closer to c1 or c2 is most quickly done by taking the dot product of xi with (c1 -c2) and comparing it to the threshold (∥c1∥^2 -∥c2∥^2)/2.
After designing the two testvectors c1 and c2 at depth 1, the training set is split in two, with those training vectors that are closest to c1 forming a training set associated with it and the rest forming a training set associated with c2. For each of these two training sets, the above iterative binary splitting procedure is applied, obtaining the two children for c1 and then the two children for c2, with each child getting a subset of the original training vectors. This results in the four testvectors at depth 2. Applying the above iterative binary splitting procedure to these yields the testvectors at depth 3, and so on, until applying the iterative binary splitting to the training sets associated with the testvectors at depth d-1 yields the 2^d codevectors.
As described above, the TSVQ design procedure consists of performing the iterative binary splitting at each internal node. At any given node, in each iteration, a pair of centroids is calculated using Equations 1 and 2 and the resulting new distortion is calculated using Equation 3. These calculations occupy the major part of the TSVQ design program execution time. With our enhancements, the execution time for these calculations is reduced by at least 75%. These savings come from recognizing that c'2 can be computed from c'1 and the "parent" testvector c, whose children we seek, via
c'2 = (mc - m1 c'1)/(m - m1)
where m is the number of training vectors associated with the parent node c.
Similar savings come by recognizing that the second term of Equation 3, denoted D2, may be computed from the first term, denoted D1, and the previously computed distortion associated with the parent node, denoted D, via
D2 = D - D1 - m1 ∥c1 -c∥^2 - (m-m1)∥c2 -c∥^2
These shortcuts enable the bypassing of many steps, and consequently save considerable execution time in designing the codebooks.
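A sketch of the iterative binary split using the centroid shortcut follows (the distortion shortcut is analogous and is omitted here; the perturbation ε and the stopping test are illustrative choices):

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def split_node(training, c, eps=0.0005, tol=1e-9):
    """Split a node's training set in two.  `c` must be the centroid of
    `training` (the parent testvector); the shortcut relies on this."""
    k, m = len(c), len(training)
    c1 = [v * (1 + eps) for v in c]     # perturbed initial candidates
    c2 = [v * (1 - eps) for v in c]
    prev = float('inf')
    while True:
        # Partition the training set between the two candidates.
        s1 = [x for x in training if dist2(x, c1) <= dist2(x, c2)]
        m1 = len(s1)
        if m1 == 0 or m1 == m:
            break                       # degenerate split; give up
        # Equation 1 gives the centroid of the first cell ...
        c1 = [sum(x[j] for x in s1) / m1 for j in range(k)]
        # ... and the shortcut gives the second without a second pass:
        # c2' = (m c - m1 c1') / (m - m1).
        c2 = [(m * c[j] - m1 * c1[j]) / (m - m1) for j in range(k)]
        # Equation 3 distortion for the new pair; stop when it levels off.
        d = sum(min(dist2(x, c1), dist2(x, c2)) for x in training)
        if prev - d <= tol:
            break
        prev = d
    return c1, c2
```

On a training set with two well-separated clusters, the split converges to the two cluster centroids in a couple of iterations.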
Another enhancement of our program involves sorting the training set. As discussed above, the program works by designing two testvectors at a time, working with the training set associated with their parent. This parent training set is a subset of the original large training set. And except for parents at very small depths, it is a small subset of the original training set, whose members are scattered widely through the original training set. Since the entire training set is too large to reside in fast computer memory, the bulk of it generally resides on disk. When a training vector is needed for the above-mentioned calculations, it is not likely to be in fast memory, so the computer operating system brings it in from the disk. Although the operating system also brings in a block of neighboring training vectors, the needed ones are scattered so widely that not many are likely to be in the retrieved block. Therefore the operating system must make frequent, time-consuming disk accesses when executing just one iteration of the above process. To alleviate this problem, in our enhancement, whenever the design of two new testvectors has been completed, the training vectors associated with their parent are sorted so that the training vectors associated with each child are stored contiguously. Thus, when one training vector is accessed, the operating system also retrieves a block of other needed training vectors. This greatly reduces the amount of time spent on disk accesses.
PHASE QUANTIZATION A. Bit Allocation for Uniform Scalar Phase Quantizers
Turning now to the phase branch 36 (FIG. 2), a description of the phase processing will be given. In our experience, phase is very important in the low frequency range, but decreases in importance to where it becomes virtually unimportant at sufficiently high frequencies. From psychoacoustic studies, we have learned that the human ear cannot resolve time differences less than a certain threshold τo. This threshold can be converted into a tolerance on error for each phase term in the block. Let fS be the sampling frequency and let fi =i fS /N be the frequency corresponding to the ith phase, where N is the number of samples in a block (1,024 in the presently preferred embodiment). Then the tolerable error in the ith phase is
δφi = 2πfi τo = 2πiτo fS /N   (Equation 4)
It is seen from Equation 4 that when τo fi =1, then δφi =2π. It is at this point that phase becomes unimportant and can be neglected. Let fo =1/τo be the frequency at which this holds. From psychoacoustic tables the parameter τo is found to be on the order of 100 μsec. Therefore, fo =10 kHz, which corresponds to the 320th phase. Thus the phases from the 320th to the 512th are largely irrelevant.
As mentioned previously, the presently preferred embodiment uses uniform scalar quantization of the phase. Since phase values are uniformly distributed between 0 and 2π and since adjacent phases are uncorrelated, vector quantization would perform no better, but would be much more complex.
From Equation 4, more error in phase can be tolerated at higher frequencies than at low. It follows that the step size of the uniform quantizer should vary with frequency. Specifically, the step size of the quantizer for the ith phase should be no larger than δφi as given in Equation 4. Thus the number of levels should be at least 2π/δφi and the number of bits allocated to the quantizer should be the logarithm base 2 of the number of levels, rounded up to an integer. That is, the number of bits ni to be allocated to the ith phase is
n.sub.i =ceil(log.sub.2 [2π/δφ.sub.i ])       (Equation 5)
where ceil(x) is the function returning the smallest integer no less than x. (A uniform scalar quantizer allocated n bits has 2.sup.n levels, spaced apart by 2π/2.sup.n.) Based on this principle, the bit allocations used in the preferred embodiment are shown in Table III. Note that to simplify the decoder, the bit allocations are the same for all phases in a band. Note also that the phases in band 12 are allocated 0 bits. This means that the decoder assumes they are zero, minus the pseudorandom phase dither described below. With these bit allocations, the indices for the uniform scalar phase quantizers require a total of 0.66 bits/sample.
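As an illustration, the per-coefficient rule of Equations 4 and 5 can be sketched as follows. This is a simplified sketch using the parameter values given above (fS =32,000, N=1,024, τo =100 μsec); the preferred embodiment actually assigns one allocation to all phases in a band, as Table III shows, rather than per coefficient.

```python
import math

def phase_bits(i, N=1024, fs=32000, tau0=100e-6):
    """Bits for the ith phase term, per Equations 4 and 5.

    Since delta-phi_i = 2*pi*f_i*tau0, the level count
    2*pi / delta-phi_i simplifies to 1 / (f_i * tau0)."""
    fi = i * fs / N                     # frequency of the ith phase term
    if fi * tau0 >= 1.0:                # at or beyond fo = 1/tau0, phase
        return 0                        # error is inaudible; allocate 0 bits
    dphi = 2 * math.pi * fi * tau0      # tolerable phase error (Equation 4)
    return math.ceil(math.log2(2 * math.pi / dphi))   # Equation 5
```

For example, the 320th phase (10 kHz) receives 0 bits, while the 160th phase (5 kHz) needs only 1 bit by this rule.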
              TABLE III
______________________________________
Bit Allocations for the Uniform Scalar Phase Quantizers
             Normal Bit    Augmented
Band No.     Allocation    Bit Allocation
______________________________________
 1           10            10
 2            8             8
 3            6             6
 4            5             5
 5            4             4
 6            3             3
 7            3             3
 8            2             2
 9            2             2
10            1             2
11            1             2
12            0             0
______________________________________
B. Pseudorandom Phase Dither
If the first quantization level of each of the uniform scalar quantizers described above represents the same phase value, say phase 0, then the quantization errors introduced by the various quantizers will be highly correlated, and such correlated errors are distinctly audible. The audibility of said errors is due to the fact that the effect of the inverse transform operating on the quantized magnitudes and phases is to produce sinusoids at the various frequencies with related phases. These phase relationships are abruptly interrupted at block boundaries, however, and this can be heard as a crackling noise. The audible effects of this phenomenon can be eliminated by pseudorandomly staggering the first levels of the uniform scalar quantizers.
In the preferred embodiment, each uniform quantizer is designed in some arbitrary way, and the necessary staggering is assured by adding pseudorandom dither before quantizing, and by subtracting the pseudorandom dither when decoding. Specifically, a pseudorandom sequence of phases Ψ1, Ψ2, . . . , Ψ512, with one term for each phase and with each term between 0 and 2π, is stored at both the encoder and the decoder. Before quantizing the ith phase φi, the term Ψi is added to it, the sum is quantized and an index is produced. At the decoder, Ψi is subtracted from the phase value corresponding to the index. We have found that this approach satisfactorily eliminates the audibility of these phase errors. Note that a phase coefficient φi corresponding to a large frequency that is allocated zero bits in Table III is not actually quantized. Rather the decoder merely reproduces it as -Ψi.
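A minimal sketch of the dither scheme follows. The function and parameter names are illustrative, not taken from the encoder; the shared pseudorandom sequence is generated here from a seed for brevity, whereas the preferred embodiment stores the sequence at both ends.

```python
import math
import random

TWO_PI = 2 * math.pi

def make_dither(num_phases=512, seed=1234):
    # identical pseudorandom phase sequence, known to encoder and decoder
    rng = random.Random(seed)
    return [rng.uniform(0.0, TWO_PI) for _ in range(num_phases)]

def encode_phase(phi, psi, bits):
    # add the dither term, then quantize uniformly on [0, 2*pi)
    step = TWO_PI / (1 << bits)
    return int(((phi + psi) % TWO_PI) / step)

def decode_phase(index, psi, bits):
    # reconstruct the quantized value, then subtract the dither term
    step = TWO_PI / (1 << bits)
    return (index * step - psi) % TWO_PI
```

Note that with a 0-bit allocation the encoder always produces index 0, so the decoder reproduces the phase as -Ψi (mod 2π), exactly as described above for band 12.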
C. Dynamic Augmentation of Phase Bit Allocations
As previously stated, a greater number of bits is allocated to the phase quantizers in blocks containing transients such as drumbeats. As described earlier, the purpose of this augmented bit allocation is to reduce pre-echo. Increasing the bit allocations of the phase quantizers in a block containing a transient reduces the overall quantization error, thereby reducing the errors that may be spread to the quiet initial segment. If allowed to happen, such errors would be heard as pre-echo. Since it is the high frequency phases that are given the smallest bit allocations (see Table III), it is these phases that are the source of the most quantization error, and it is these phases whose quantizer rates need to be augmented. The method for detecting transients was described earlier, and as stated there, its effect is to set a flag indicating whether the frame contains a transient. If so, the bit allocations for the phase quantizers are increased. The augmented phase allocations to be used in this case are shown in Table III. For example, in band 10 the bit allocations for the phases are increased from 1 to 2 bits; i.e., the quantizers have 4 rather than 2 levels. Among all segments encoded to date, no more than 1/6 of the blocks required augmented bit allocations. Correspondingly, no more than 0.028 bits per sample has been needed for the augmentation.
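In code, choosing between the normal and augmented allocations of Table III reduces to a lookup keyed on the transient flag. A sketch, with 0-based band indices:

```python
# Per-band phase bit allocations from Table III (bands 1..12, 0-indexed here)
NORMAL    = [10, 8, 6, 5, 4, 3, 3, 2, 2, 1, 1, 0]
AUGMENTED = [10, 8, 6, 5, 4, 3, 3, 2, 2, 2, 2, 0]

def phase_allocation(transient_flag):
    # transient blocks use the augmented allocation; others the normal one
    return AUGMENTED if transient_flag else NORMAL
```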
SUMMARY OF THE ENCODING
Turning now to FIGS. 4a and 4b, the encoding process is summarized. As previously explained, the magnitude and phase portions are treated differently. Accordingly, the flow diagram of FIG. 4a shows a magnitude path A and a phase path B which correspond to the magnitude branch A and phase branch B of FIG. 4b.
With the presently preferred embodiment, the encoded digital data includes bits representing the dynamic time expansion parameters (at most 0.002 bits/sample), the time domain expansion factors (0.004 bits/sample), the DC magnitude coefficient (0.017 bits/sample), the band-by-band expansion factors (0.05 bits/sample), the magnitude vector quantization codebooks (0.031 bits/sample), magnitude vector quantizer indices (0.96 bits/sample), the baseline phase quantizer indices (0.66 bits/sample), and the augmented bits for the phase quantizers (at most 0.028 bits/sample) for a total of at most 1.75 bits/sample. With the 32,000 samples/sec sampling rate, this yields 56,000 bits/second, or 0.42 megabytes/minute.
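The bit-rate accounting above can be checked directly. The dictionary keys below are descriptive labels, not names used by the encoder:

```python
# Per-component rates in bits/sample, as enumerated above
components = {
    "dynamic time expansion parameters": 0.002,   # at most
    "time domain expansion factors":     0.004,
    "DC magnitude coefficient":          0.017,
    "band-by-band expansion factors":    0.05,
    "magnitude VQ codebooks":            0.031,
    "magnitude VQ indices":              0.96,
    "baseline phase quantizer indices":  0.66,
    "augmented phase bits":              0.028,   # at most
}
bits_per_sample = sum(components.values())              # about 1.75
bits_per_second = bits_per_sample * 32000               # about 56,000
megabytes_per_minute = bits_per_second * 60 / 8 / 1e6   # about 0.42
```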
However, it is possible to change the parameters of the quantizers (for example to increase or decrease the allocations to the magnitude and phase quantizers) to achieve other rate or quality goals.
DECODING
The decoding process, as stated earlier, is essentially the complement of the encoding process. FIG. 5 depicts the presently preferred decoding circuit, which simultaneously decodes left and right channels of a segment for which both channels have been encoded using the method previously described. This decoding circuit will now be described. Referring to FIG. 5, the decoding circuit comprises random access program and data memory 60 and a digital signal processor chip 62, which may be one of several available. The digital signal processor is coupled through a block of dual-ported memory 64 to the computer bus 66 of a computer system which controls the local storage medium on which the compressed data has been stored. A first in, first out (FIFO) memory block 68 communicates with signal processor 62 to transfer the decoded, decompressed data to the digital-to-analog convertor (DAC) 70. The digital-to-analog convertor as well as the FIFO memory 68 are synchronized by clock 72. The digital-to-analog convertor provides stereo outputs L and R which are applied to a buffer and filter circuit 74. The output of circuit 74 feeds the conventional analog audio system 76.
At the start of playback of a segment, the magnitude vector quantization codebooks for the individual subbands are moved from the hard disk of the main computer via the computer bus 66 and dual-ported memory 64 to the signal processor 62. The codebooks are stored in the program and data memory 60. Block by block, the encoded digital data consisting of indices and side information, such as scale factors and flag bits, are moved from the hard disk to buffers in the main computer memory and then to the dual-ported memory in synchronism with signal processor use.
Using the signal processor 62, the magnitude indices are used to retrieve vector quantizer codewords forming a rendition of the magnitude block. The individual bands are then scaled by the inverse band-by-band expansion factors.
Similarly, the phase quantizer indices are used to create a rendition of the phase block, from which the pseudorandom dither sequence is subtracted. (If the transient flag is set, the indices are decoded with the augmented quantizers.)
The completed magnitude and phase blocks are converted to real and imaginary representation. This block is then inverse Fourier transformed using an inverse FFT program on the DSP to produce a rendition of the time domain block. The preferred implementation uses one inverse FFT to simultaneously inverse transform a left and right block. Specifically, let Lr and Rr denote the real parts of the frequency domain blocks of the left and right channels, respectively, and let Li and Ri denote the imaginary parts of the frequency domain blocks of the left and right channels, respectively. Then applying the inverse FFT to the complex valued block whose real part is Lr -Ri and whose imaginary part is Li +Rr yields a complex time domain block whose real part is the rendition of the left time domain block and whose imaginary part is the rendition of the right time domain block.
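This two-channels-in-one-inverse-transform packing can be verified with a naive DFT. The sketch below is pure Python with hypothetical sample values; a real decoder would use an optimized FFT on the DSP. The packing follows from linearity: the complex block L+jR has real part Lr -Ri and imaginary part Li +Rr, and its inverse transform is l[n]+jr[n].

```python
import cmath

def dft(x):
    # forward DFT, as the encoder's transform would produce
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # inverse DFT with 1/N normalization
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# hypothetical real-valued left and right time domain blocks
left  = [0.5, -0.25, 0.125, 0.0]
right = [0.1, 0.2, -0.3, 0.4]

L = dft(left)
R = dft(right)

# pack both channels: real part Lr - Ri, imaginary part Li + Rr
Z = [complex(Lk.real - Rk.imag, Lk.imag + Rk.real) for Lk, Rk in zip(L, R)]

z = idft(Z)                         # one inverse transform for both channels
left_hat  = [c.real for c in z]     # real part renders the left block
right_hat = [c.imag for c in z]     # imaginary part renders the right block
```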
These time domain blocks are windowed, just as in the encoder, and the overlapping portions are added.
The time domain blocks are then scaled according to the inverses of the time domain expansion factors.
If the transient flag is set, the subblocks of the blocks are scaled according to the dynamic time domain expansion parameters.
Finally, the blocks are moved to the FIFO. On demand from the master clock 72, the data are moved serially to the digital-to-analog convertor 70. The analog information is presented to buffers and filters 74 and thence to the power amplifiers and speakers of the audio system 76.
The decoding circuitry can be implemented on a single circuit board suitable for plugging into the backplane of a host computer. Thus, the invention on the playback side can be implemented with a comparatively inexpensive microcomputer or personal computer having hard disk storage sufficient to hold the compressed data of the program material. The playback system operates in real time using comparatively inexpensive components. Thus the invention can be used as a substitution for the mechanically complex jukebox changers now in use. In contrast with conventional mechanical jukeboxes, the only moving parts on the playback side employing the invention are the computer system power supply fan and hermetically sealed hard disk drive. Both of these components are highly reliable and easily replaced if damaged. Thus the resulting digital jukebox can be virtually maintenance free. Moreover, since there are no records or discs to become scratched or damaged during installation and use, a considerable cost savings is achieved.
While the invention has been described in connection with a presently preferred embodiment suitable for jukebox music distribution, the invention is not limited to this application and may be modified without departing from the spirit of the invention as set forth in the appended claims.

Claims (33)

What is claimed is:
1. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising:
dividing said wideband digital audio signal into signal blocks, each block having a maximum signal value;
scaling each signal block by a constant value selected such that the maximum absolute signal value in each block equals a predetermined value within a preset range and setting a scale factor equal to said constant value for each signal block;
transforming each said signal block into transform blocks comprising a plurality of transform values representative of the audio signal in its associated signal block;
quantizing said transform blocks; and
recording said quantized transform blocks and said scale factors as digital data on the data storage medium.
2. The method of claim 1 wherein said digital audio signal is divided into overlapping signal blocks.
3. The method of claim 1 wherein said digital audio signal is divided into non-overlapping signal blocks.
4. The method of claim 1 further comprising reproducing said quantized transform blocks from said recorded digital data;
inverse transforming and inverse scaling said quantized transform blocks into decoded signal blocks; and
recombining said decoded signal blocks into a reproduction of said wideband digital audio signal.
5. The method of claim 4 wherein said decoded signal blocks are represented by a predetermined arithmetic precision and wherein said predetermined value of said scaling step is selected based upon said arithmetic precision.
6. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising:
dividing said wideband digital audio signal into signal blocks;
dividing said signal blocks into a plurality of subblocks;
detecting transients in said subblocks and setting a transient flag associated with said signal block to a predetermined value if a transient greater than a predetermined threshold is detected;
if the transient flag is set to said predetermined value, scaling each subblock in accordance with transients detected to produce processed signal blocks and generating a scale factor for each subblock;
said scaling step further comprising the step of scaling at least one subblock occurring a predetermined time before a detected transient differently than scaling the subblock containing the transient;
transforming said processed signal blocks into transform blocks each comprising a plurality of transform values representative of the audio signal in its associated block;
quantizing said transform blocks; and
recording said quantized transform blocks, transient flags and scale factors as digital data on the data storage medium.
7. The method of claim 6 wherein stepwise scaling is used to scale said adjacent subblocks to effect a transition from the scaling applied to the pre-transient subblocks to the scaling applied to the subblock containing the detected transient.
8. The method of claim 6 wherein said digital audio signal is divided into overlapping signal blocks.
9. The method of claim 6 wherein said digital audio signal is divided into non-overlapping signal blocks.
10. The method of claim 6 further comprising reproducing said quantized transform blocks from said recorded digital data;
inverse transforming and inverse scaling said quantized transform blocks into decoded signal blocks; and
recombining said decoded signal blocks into a reproduction of said wideband digital audio signal.
11. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising;
dividing said wideband digital audio signal into signal blocks;
detecting if a transient occurs in each signal block and setting a transient flag to a predetermined value when a transient is detected;
when said transient flag equals said predetermined value, dividing said signal blocks into a plurality of subblocks;
scaling each subblock in accordance with transients detected to produce processed signal blocks and generating a scale factor for each subblock;
said scaling step further comprising the step of scaling at least one subblock occurring a predetermined time before a detected transient differently than scaling the subblock containing the transient;
transforming said processed signal blocks into transform blocks each comprising a plurality of transform values representative of the magnitude and phase of the audio signal as a function of frequency in its associated block;
quantizing said transform blocks; and
recording said quantized transform blocks, transient flags and scale factors as digital data on a data storage medium.
12. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising;
dividing said wideband digital audio signal into signal blocks;
Fourier transforming said signal blocks into transformed blocks representative of the magnitude and phase of the audio signal in its associated block as a function of frequency;
extracting from said transformed blocks magnitude data blocks and phase data blocks as a function of frequency;
grouping said magnitude data blocks and phase data blocks into a plurality of adjacent frequency bands, said frequency bands extending from low frequency bands to high frequency bands;
applying a first quantization process upon said magnitude data blocks in each frequency band to develop quantized magnitude blocks;
applying a second quantization process upon said phase data blocks in each frequency band to develop quantized phase blocks, said second quantization process developing higher precision quantization in said low frequency bands than in said high frequency bands;
recording said quantized magnitude blocks and said quantized phase blocks as digital data on the data storage medium wherein said first quantization process includes two-stage vector quantization of said magnitude data blocks.
13. The method of claim 12 wherein said digital audio signal is divided into overlapping signal blocks.
14. The method of claim 12 wherein said digital audio signal is divided into non-overlapping signal blocks.
15. The method of claim 12 further comprising reproducing said quantized transform blocks from said recorded digital data;
inverse transforming and inverse scaling said quantized transform blocks into decoded signal blocks; and
recombining said decoded signal blocks into a reproduction of said wideband digital audio signal.
16. The method of claim 12 wherein said first quantization process includes vector quantizing said magnitude data blocks.
17. The method of claim 12 wherein said first quantization process includes tree-structured vector quantization of said magnitude data blocks.
18. The method of claim 12 wherein said second quantization process includes quantization where the quantizer is designed so that quantization error in any phase data term is inversely proportional to the frequency of said term.
19. The method of claim 12 wherein said second quantization process includes scalar quantization with level spacing chosen so that the error resulting from said quantization does not exceed a value inversely proportional to the frequency of said term.
20. The method of claim 12 wherein said second quantization process includes scalar quantization with pseudorandom dither added to the phases.
21. The method of claim 12 wherein said second quantization process includes the step of dynamically altering bit allocation based on the wideband digital audio signal.
22. A method of processing a wideband digital audio signal and for storing the processed signal in a data storage medium comprising the steps of:
dividing said wideband audio signal into signal blocks,
Fourier transforming each said block into transform blocks representative of the magnitude and phase of the audio signal as a function of frequency in its associated block,
grouping said transform blocks into a plurality of adjacent frequency bands, each frequency band having a predetermined magnitude quantizer factor and predetermined phase quantizer factors, said quantizer factors determining the degree of precision of a subsequent quantization,
quantizing the magnitudes and phases of each transform block in each frequency band in accordance with its respective quantizer factor to develop quantized magnitude blocks and quantized phase blocks, and
recording said quantized magnitude blocks and quantized phase blocks as digital data on the data storage medium.
23. The invention as defined in claim 22 wherein the precision of the phase quantizer factor increases from the higher frequency bands to the lower frequency bands.
24. The invention as defined in claim 22 and comprising the step of introducing a random dither to the phase quantizing step in at least one of said frequency bands.
25. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising:
dividing said wideband digital audio signal into signal blocks;
dividing said signal blocks into a plurality of subblocks;
detecting transients in said subblocks and setting a transient flag associated with said signal block to a predetermined value if a transient greater than a predetermined threshold is detected;
if the transient flag is set to said predetermined value, scaling each subblock in accordance with transients detected to produce processed signal blocks and generating a scale factor for each subblock;
said scaling step further comprising the step of scaling at least one subblock occurring a predetermined time after a detected transient differently than scaling the subblock containing the transient;
transforming said processed signal blocks into transform blocks each comprising a plurality of transform values representative of the audio signal in its associated block;
quantizing said transform blocks; and
recording said quantized transform blocks, transient flags and scale factors as digital data on the data storage medium.
26. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising;
dividing said wideband digital audio signal into signal blocks;
detecting if a transient occurs in each signal block and setting a transient flag to a predetermined value when a transient is detected;
when said transient flag equals said predetermined value, dividing said signal blocks into a plurality of subblocks;
scaling each subblock in accordance with transients detected to produce processed signal blocks and generating a scale factor for each subblock;
said scaling step further comprising the step of scaling at least one subblock occurring a predetermined time after a detected transient differently than scaling the subblock containing the transient;
transforming said processed signal blocks into transform blocks each comprising a plurality of transform values representative of the magnitude and phase of the audio signal as a function of frequency in its associated block;
quantizing said transform blocks; and
recording said quantized transform blocks, transient flags and scale factors as digital data on a data storage medium.
27. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising:
dividing said wideband digital audio signal into signal blocks;
Fourier transforming said signal blocks into transformed blocks representative of the magnitude and phase of the audio signal in its associated block as a function of frequency;
extracting from said transformed blocks magnitude data blocks and phase data blocks as a function of frequency;
grouping said magnitude data blocks and phase data blocks into a plurality of adjacent frequency bands, said frequency bands extending from low frequency bands to high frequency bands;
applying a first quantization process upon said magnitude data blocks in each frequency band to develop quantized magnitude blocks;
applying a second quantization process upon said phase data blocks in each frequency band to develop quantized phase blocks, said second quantization process developing higher precision quantization in said low frequency bands than in said high frequency bands;
recording said quantized magnitude blocks and said quantized phase blocks as digital data on the data storage medium wherein said first quantization process includes tree-structured vector quantization of said magnitude data blocks.
28. The method of claim 27 further comprising reproducing said quantized transform blocks from said recorded digital data;
inverse transforming and inverse scaling said quantized transform blocks into decoded signal blocks; and
recombining said decoded signal blocks into a reproduction of said wideband digital audio signal.
29. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising:
dividing said wideband digital audio signal into signal blocks;
Fourier transforming said signal blocks into transformed blocks representative of the magnitude and phase of the audio signal in its associated block as a function of frequency;
extracting from said transformed blocks magnitude data blocks and phase data blocks as a function of frequency;
grouping said magnitude data blocks and phase data blocks into a plurality of adjacent frequency bands, said frequency bands extending from low frequency bands to high frequency bands;
applying a first quantization process upon said magnitude data blocks in each frequency band to develop quantized magnitude blocks;
applying a second quantization process upon said phase data blocks in each frequency band to develop quantized phase blocks, said second quantization process developing higher precision quantization in said low frequency bands than in said high frequency bands;
recording said quantized magnitude blocks and said quantized phase blocks as digital data on the data storage medium wherein each phase data block comprises a plurality of phase coefficients and wherein said second quantization process comprises the step of applying a scalar quantizer to each phase coefficient with a level spacing inversely proportional to the frequency of each coefficient.
30. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising:
dividing said wideband digital audio signal into signal blocks;
Fourier transforming said signal blocks into transformed blocks representative of the magnitude and phase of the audio signal in its associated block as a function of frequency;
extracting from said transformed blocks magnitude data blocks and phase data blocks as a function of frequency;
grouping said magnitude data blocks and phase data blocks into a plurality of adjacent frequency bands, said frequency bands extending from low frequency bands to high frequency bands;
applying a first quantization process upon said magnitude data blocks in each frequency band to develop quantized magnitude blocks;
applying a second quantization process upon said phase data blocks in each frequency band to develop quantized phase blocks, said second quantization process developing higher precision quantization in said low frequency bands than in said high frequency bands;
recording said quantized magnitude blocks and said quantized phase blocks as digital data on the data storage medium;
detecting a transient in said transformed blocks and, when detected, decreasing the level spacing with respect to said second quantization process.
31. The method of claim 30 further comprising reproducing said quantized transform blocks from said recorded digital data;
inverse transforming and inverse scaling said quantized transform blocks into decoded signal blocks; and
recombining said decoded signal blocks into a reproduction of said wideband digital audio signal.
32. A method of processing a wideband digital audio signal and for storing the processed signal on a data storage medium comprising;
dividing said wideband digital audio signal into signal blocks;
Fourier transforming said signal blocks into transformed blocks representative of the magnitude and phase of the audio signal in its associated block as a function of frequency;
extracting from said transformed blocks magnitude data blocks and phase data blocks as a function of frequency;
grouping said magnitude data blocks and phase data blocks into a plurality of adjacent frequency bands, said frequency bands extending from low frequency bands to high frequency bands;
scaling each said frequency band by a constant value selected such that the energy of the frequency band equals a predetermined value within a preset range;
applying a first quantization process upon said magnitude data blocks in each frequency band to develop quantized magnitude blocks;
applying a second quantization process upon said phase data blocks in each frequency band to develop quantized phase blocks, said second quantization process developing higher precision quantization in said low frequency bands than in said high frequency bands;
recording said quantized magnitude blocks, said constant values, and said quantized phase blocks as digital data on the data storage medium wherein said first quantization process includes two-stage vector quantization of said magnitude data blocks.
33. The method of claim 32 further comprising reproducing said quantized transform blocks from said recorded digital data;
inverse transforming and inverse scaling said quantized transform blocks into decoded signal blocks; and
recombining said decoded signal blocks into a reproduction of said wideband digital audio signal.
US08/128,322 1990-05-29 1993-09-29 Digital audio compression system Expired - Lifetime US5388181A (en)


Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US53054790A 1990-05-29 1990-05-29
US58271590A 1990-09-13 1990-09-13
US08/128,322 US5388181A (en) 1990-05-29 1993-09-29 Digital audio compression system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US58271590A Continuation 1990-05-29 1990-09-13

Publications (1)

Publication Number Publication Date
US5388181A true US5388181A (en) 1995-02-07


FR2809221A1 (en) * 2000-05-16 2001-11-23 Samsung Electronics Co Ltd DEVICE FOR QUANTIFYING THE PHASE OF A VOICE SIGNAL USING A PERCEPTION WEIGHTING FUNCTION, AND METHOD THEREOF
US6369722B1 (en) 2000-03-17 2002-04-09 Matra Nortel Communications Coding, decoding and transcoding methods
US6381575B1 (en) 1992-03-06 2002-04-30 Arachnid, Inc. Computer jukebox and computer jukebox management system
US6397189B1 (en) 1990-06-15 2002-05-28 Arachnid, Inc. Computer jukebox and jukebox network
US20020116199A1 (en) * 1999-05-27 2002-08-22 America Online, Inc. A Delaware Corporation Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US6531971B2 (en) 2000-05-15 2003-03-11 Achim Kempf Method for monitoring information density and compressing digitized signals
GB2380640A (en) * 2001-08-21 2003-04-09 Micron Technology Inc Data compression method
US20030074219A1 (en) * 1990-06-15 2003-04-17 Martin John R. System for managing a plurality of computer jukeboxes
US6567781B1 (en) 1999-12-30 2003-05-20 Quikcat.Com, Inc. Method and apparatus for compressing audio data using a dynamical system having a multi-state dynamical rule set and associated transform basis function
US20030115041A1 (en) * 2001-12-14 2003-06-19 Microsoft Corporation Quality improvement techniques in an audio encoder
US6678649B2 (en) * 1999-07-19 2004-01-13 Qualcomm Inc Method and apparatus for subsampling phase spectrum information
US20040044527A1 (en) * 2002-09-04 2004-03-04 Microsoft Corporation Quantization and inverse quantization for audio
US20040049379A1 (en) * 2002-09-04 2004-03-11 Microsoft Corporation Multi-channel audio encoding and decoding
US20040078194A1 (en) * 1997-06-10 2004-04-22 Coding Technologies Sweden Ab Source coding enhancement using spectral-band replication
US6754618B1 (en) * 2000-06-07 2004-06-22 Cirrus Logic, Inc. Fast implementation of MPEG audio coding
GB2396538A (en) * 2000-05-16 2004-06-23 Samsung Electronics Co Ltd An apparatus and method for quantizing the phase of speech signal using perceptual weighting function
US20050149324A1 (en) * 2001-12-14 2005-07-07 Microsoft Corporation Quantization matrices for digital audio
US20050177360A1 (en) * 2002-07-16 2005-08-11 Koninklijke Philips Electronics N.V. Audio coding
US20060020453A1 (en) * 2004-05-13 2006-01-26 Samsung Electronics Co., Ltd. Speech signal compression and/or decompression method, medium, and apparatus
US20060074750A1 (en) * 2004-10-01 2006-04-06 E-Cast, Inc. Prioritized content download for an entertainment device
US20060178870A1 (en) * 2003-03-17 2006-08-10 Koninklijke Philips Electronics N.V. Processing of multi-channel signals
US20060247928A1 (en) * 2005-04-28 2006-11-02 James Stuart Jeremy Cowdery Method and system for operating audio encoders in parallel
US20070016948A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Immunizing HTML browsers and extensions from known vulnerabilities
US20070016412A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Frequency segmentation to obtain bands for efficient coding of digital media
US20070016414A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Modification of codewords in dictionary used for efficient coding of digital media spectral data
WO2007053120A1 (en) * 2005-11-04 2007-05-10 National University Of Singapore A device and a method of playing audio clips
US20070172071A1 (en) * 2006-01-20 2007-07-26 Microsoft Corporation Complex transforms for multi-channel audio
US20070174063A1 (en) * 2006-01-20 2007-07-26 Microsoft Corporation Shape and scale parameters for extended-band frequency coding
US20070192104A1 (en) * 2006-02-16 2007-08-16 At&T Corp. A system and method for providing large vocabulary speech processing based on fixed-point arithmetic
KR100765747B1 (en) 2005-01-22 2007-10-15 Samsung Electronics Co., Ltd. Apparatus for scalable speech and audio coding using Tree Structured Vector Quantizer
US20070255808A1 (en) * 2006-04-27 2007-11-01 Rowe International Corporation System and methods for updating registration information for a computer jukebox
US20070253481A1 (en) * 2004-10-13 2007-11-01 Matsushita Electric Industrial Co., Ltd. Scalable Encoder, Scalable Decoder, and Scalable Encoding Method
US20070282991A1 (en) * 2006-06-01 2007-12-06 Rowe International Corporation Remote song selection
WO2008064577A1 (en) * 2006-12-01 2008-06-05 Huawei Technologies Co., Ltd. A method and an apparatus for adjusting quantization quality in encoder and decoder
US20080228517A1 (en) * 1992-03-06 2008-09-18 Martin John R Computer jukebox and jukebox network
CN100435485C (en) * 2002-08-21 2008-11-19 广州广晟数码技术有限公司 Decoder for decoding and re-establishing multiple audio track audio signal from audio data code stream
WO2008138276A1 (en) * 2007-05-16 2008-11-20 Spreadtrum Communications (Shanghai) Co., Ltd. An audio frequency encoding and decoding method and device
US20080319739A1 (en) * 2007-06-22 2008-12-25 Microsoft Corporation Low complexity decoder for complex transform coding of multi-channel sound
US20090006103A1 (en) * 2007-06-29 2009-01-01 Microsoft Corporation Bitstream syntax for multi-process audio decoding
US20090083046A1 (en) * 2004-01-23 2009-03-26 Microsoft Corporation Efficient coding of digital media spectral data using wide-sense perceptual similarity
US20090112606A1 (en) * 2007-10-26 2009-04-30 Microsoft Corporation Channel extension coding for multi-channel source
US7539612B2 (en) 2005-07-15 2009-05-26 Microsoft Corporation Coding and decoding scale factor information
US20090198357A1 (en) * 1996-10-02 2009-08-06 James D. Logan And Kerry M. Logan Family Trust Portable audio player
US20090299753A1 (en) * 2008-05-30 2009-12-03 Yuli You Audio Signal Transient Detection
US7657910B1 (en) * 1999-07-26 2010-02-02 E-Cast Inc. Distributed electronic entertainment method and apparatus
CN1783726B (en) * 2002-08-21 2010-05-12 广州广晟数码技术有限公司 Decoder for decoding and reestablishing multi-channel audio signal from audio data code stream
US7761290B2 (en) 2007-06-15 2010-07-20 Microsoft Corporation Flexible frequency and time partitioning in perceptual transform coding of audio
US20100198603A1 (en) * 2009-01-30 2010-08-05 QNX SOFTWARE SYSTEMS(WAVEMAKERS), Inc. Sub-band processing complexity reduction
US20100228552A1 (en) * 2009-03-05 2010-09-09 Fujitsu Limited Audio decoding apparatus and audio decoding method
US7831434B2 (en) 2006-01-20 2010-11-09 Microsoft Corporation Complex-transform channel coding with extended-band frequency coding
US20100318368A1 (en) * 2002-09-04 2010-12-16 Microsoft Corporation Quantization and inverse quantization for audio
US20110066440A1 (en) * 2009-09-11 2011-03-17 Sling Media Pvt Ltd Audio signal encoding employing interchannel and temporal redundancy reduction
CN102270454A (en) * 2010-06-07 2011-12-07 宇达电脑(上海)有限公司 Method and device for improving audio output
US20120029925A1 (en) * 2010-07-30 2012-02-02 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for dynamic bit allocation
CN101800051B (en) * 2009-02-09 2012-07-04 美国博通公司 Method for processing signal and system for processing audio signal
US20120323582A1 (en) * 2010-04-13 2012-12-20 Ke Peng Hierarchical Audio Frequency Encoding and Decoding Method and System, Hierarchical Frequency Encoding and Decoding Method for Transient Signal
US20130013322A1 (en) * 2010-01-12 2013-01-10 Guillaume Fuchs Audio encoder, audio decoder, method for encoding and decoding an audio information, and computer program obtaining a context sub-region value on the basis of a norm of previously decoded spectral values
US8612240B2 (en) 2009-10-20 2013-12-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a region-dependent arithmetic coding mapping rule
US20150340045A1 (en) * 2014-05-01 2015-11-26 Digital Voice Systems, Inc. Audio Watermarking via Phase Modification
US9208792B2 (en) 2010-08-17 2015-12-08 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for noise injection
WO2016115483A3 (en) * 2015-01-15 2016-09-09 Hardwick John C Audio watermarking via phase modification
US9639709B2 (en) 2004-09-30 2017-05-02 Ami Entertainment Network, Llc Prioritized content download for an entertainment system
US9940942B2 (en) 2013-04-05 2018-04-10 Dolby International Ab Advanced quantizer
US11244692B2 (en) 2018-10-04 2022-02-08 Digital Voice Systems, Inc. Audio watermarking via correlation modification using an amplitude and a magnitude modification based on watermark data and to reduce distortion

Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3982070A (en) * 1974-06-05 1976-09-21 Bell Telephone Laboratories, Incorporated Phase vocoder speech synthesis system
US4232295A (en) * 1979-04-13 1980-11-04 Data Information Systems Corporation Jukebox polling system
US4287568A (en) * 1977-05-31 1981-09-01 Lester Robert W Solid state music player using signals from a bubble-memory storage device
US4300040A (en) * 1979-11-13 1981-11-10 Video Corporation Of America Ordering terminal
US4330689A (en) * 1980-01-28 1982-05-18 The United States Of America As Represented By The Secretary Of The Navy Multirate digital voice communication processor
GB2092354A (en) * 1981-02-02 1982-08-11 Stern Electronics Inc Vending machine
GB2103000A (en) * 1981-06-01 1983-02-09 Newtek Electronic Products Lim Interrogation of coin operated equipment
GB2141907A (en) * 1983-06-02 1985-01-03 Michael Gilmore Video games with advertising facility
US4528643A (en) * 1983-01-10 1985-07-09 Fpdc, Inc. System for reproducing information in material objects at a point of sale location
US4559602A (en) * 1983-01-27 1985-12-17 Bates Jr John K Signal processing and synthesizing method and apparatus
US4667340A (en) * 1983-04-13 1987-05-19 Texas Instruments Incorporated Voice messaging system with pitch-congruent baseband coding
US4790016A (en) * 1985-11-14 1988-12-06 Gte Laboratories Incorporated Adaptive method and apparatus for coding speech
US4896362A (en) * 1987-04-27 1990-01-23 U.S. Philips Corporation System for subband coding of a digital audio signal
US4916742A (en) * 1986-04-24 1990-04-10 Kolesnikov Viktor M Method of recording and reading audio information signals in digital form, and apparatus for performing same
US4922537A (en) * 1987-06-02 1990-05-01 Frederiksen & Shu Laboratories, Inc. Method and apparatus employing audio frequency offset extraction and floating-point conversion for digitally encoding and decoding high-fidelity audio signals
US4935963A (en) * 1986-01-24 1990-06-19 Racal Data Communications Inc. Method and apparatus for processing speech signals
US4949383A (en) * 1984-08-24 1990-08-14 British Telecommunications Public Limited Company Frequency domain speech coding
US4953214A (en) * 1987-07-21 1990-08-28 Matsushita Electric Industrial Co., Ltd. Signal encoding and decoding method and device
US4963030A (en) * 1989-11-29 1990-10-16 California Institute Of Technology Distributed-block vector quantization coder
US4965830A (en) * 1989-01-17 1990-10-23 Unisys Corp. Apparatus for estimating distortion resulting from compressing digital data
US5021971A (en) * 1989-12-07 1991-06-04 Unisys Corporation Reflective binary encoder for vector quantization
US5027376A (en) * 1985-10-30 1991-06-25 Microcom Systems, Inc. Data telecommunications system and method for transmitting compressed data
US5040217A (en) * 1989-10-18 1991-08-13 At&T Bell Laboratories Perceptual coding of audio signals
US5067152A (en) * 1989-01-30 1991-11-19 Information Technologies Research, Inc. Method and apparatus for vector quantization
US5068899A (en) * 1985-04-03 1991-11-26 Northern Telecom Limited Transmission of wideband speech signals
US5086475A (en) * 1988-11-19 1992-02-04 Sony Corporation Apparatus for generating, recording or reproducing sound source data
US5091944A (en) * 1989-04-21 1992-02-25 Mitsubishi Denki Kabushiki Kaisha Apparatus for linear predictive coding and decoding of speech using residual wave form time-axis compression
US5115240A (en) * 1989-09-26 1992-05-19 Sony Corporation Method and apparatus for encoding voice signals divided into a plurality of frequency bands
US5197087A (en) * 1989-07-19 1993-03-23 Naoto Iwahashi Signal encoding apparatus
US5222189A (en) * 1989-01-27 1993-06-22 Dolby Laboratories Licensing Corporation Low time-delay transform coder, decoder, and encoder/decoder for high-quality audio

Patent Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3982070A (en) * 1974-06-05 1976-09-21 Bell Telephone Laboratories, Incorporated Phase vocoder speech synthesis system
US4287568A (en) * 1977-05-31 1981-09-01 Lester Robert W Solid state music player using signals from a bubble-memory storage device
US4232295A (en) * 1979-04-13 1980-11-04 Data Information Systems Corporation Jukebox polling system
US4300040A (en) * 1979-11-13 1981-11-10 Video Corporation Of America Ordering terminal
US4330689A (en) * 1980-01-28 1982-05-18 The United States Of America As Represented By The Secretary Of The Navy Multirate digital voice communication processor
GB2092354A (en) * 1981-02-02 1982-08-11 Stern Electronics Inc Vending machine
GB2103000A (en) * 1981-06-01 1983-02-09 Newtek Electronic Products Lim Interrogation of coin operated equipment
US4528643A (en) * 1983-01-10 1985-07-09 Fpdc, Inc. System for reproducing information in material objects at a point of sale location
US4559602A (en) * 1983-01-27 1985-12-17 Bates Jr John K Signal processing and synthesizing method and apparatus
US4667340A (en) * 1983-04-13 1987-05-19 Texas Instruments Incorporated Voice messaging system with pitch-congruent baseband coding
GB2141907A (en) * 1983-06-02 1985-01-03 Michael Gilmore Video games with advertising facility
US4949383A (en) * 1984-08-24 1990-08-14 British Telecommunications Public Limited Company Frequency domain speech coding
US5068899A (en) * 1985-04-03 1991-11-26 Northern Telecom Limited Transmission of wideband speech signals
US5027376A (en) * 1985-10-30 1991-06-25 Microcom Systems, Inc. Data telecommunications system and method for transmitting compressed data
US4790016A (en) * 1985-11-14 1988-12-06 Gte Laboratories Incorporated Adaptive method and apparatus for coding speech
US4935963A (en) * 1986-01-24 1990-06-19 Racal Data Communications Inc. Method and apparatus for processing speech signals
US4916742A (en) * 1986-04-24 1990-04-10 Kolesnikov Viktor M Method of recording and reading audio information signals in digital form, and apparatus for performing same
US4896362A (en) * 1987-04-27 1990-01-23 U.S. Philips Corporation System for subband coding of a digital audio signal
US4922537A (en) * 1987-06-02 1990-05-01 Frederiksen & Shu Laboratories, Inc. Method and apparatus employing audio frequency offset extraction and floating-point conversion for digitally encoding and decoding high-fidelity audio signals
US4953214A (en) * 1987-07-21 1990-08-28 Matsushita Electric Industrial Co., Ltd. Signal encoding and decoding method and device
US5086475A (en) * 1988-11-19 1992-02-04 Sony Corporation Apparatus for generating, recording or reproducing sound source data
US4965830A (en) * 1989-01-17 1990-10-23 Unisys Corp. Apparatus for estimating distortion resulting from compressing digital data
US5222189A (en) * 1989-01-27 1993-06-22 Dolby Laboratories Licensing Corporation Low time-delay transform coder, decoder, and encoder/decoder for high-quality audio
US5067152A (en) * 1989-01-30 1991-11-19 Information Technologies Research, Inc. Method and apparatus for vector quantization
US5091944A (en) * 1989-04-21 1992-02-25 Mitsubishi Denki Kabushiki Kaisha Apparatus for linear predictive coding and decoding of speech using residual wave form time-axis compression
US5197087A (en) * 1989-07-19 1993-03-23 Naoto Iwahashi Signal encoding apparatus
US5115240A (en) * 1989-09-26 1992-05-19 Sony Corporation Method and apparatus for encoding voice signals divided into a plurality of frequency bands
US5040217A (en) * 1989-10-18 1991-08-13 At&T Bell Laboratories Perceptual coding of audio signals
US4963030A (en) * 1989-11-29 1990-10-16 California Institute Of Technology Distributed-block vector quantization coder
US5021971A (en) * 1989-12-07 1991-06-04 Unisys Corporation Reflective binary encoder for vector quantization

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Gray, Robert M., "Vector Quantization," IEEE ASSP Magazine, Apr. 1984, pp. 4-29. *
Jayant, N. S., "High-Quality Coding of Telephone Speech and Wideband Audio," IEEE Communications Magazine, Jan. 1990, pp. 10-20. *

Cited By (208)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6397189B1 (en) 1990-06-15 2002-05-28 Arachnid, Inc. Computer jukebox and jukebox network
US20050216348A1 (en) * 1990-06-15 2005-09-29 Martin John R System for managing a plurality of computer jukeboxes
US20010023403A1 (en) * 1990-06-15 2001-09-20 Martin John R. Computer jukebox and jukebox network
US6970834B2 (en) 1990-06-15 2005-11-29 Arachnid, Inc. Advertisement downloading computer jukebox
US20030074219A1 (en) * 1990-06-15 2003-04-17 Martin John R. System for managing a plurality of computer jukeboxes
US6381575B1 (en) 1992-03-06 2002-04-30 Arachnid, Inc. Computer jukebox and computer jukebox management system
US20080228517A1 (en) * 1992-03-06 2008-09-18 Martin John R Computer jukebox and jukebox network
US5581654A (en) * 1993-05-25 1996-12-03 Sony Corporation Method and apparatus for information encoding and decoding
US5659659A (en) * 1993-07-26 1997-08-19 Alaris, Inc. Speech compressor using trellis encoding and linear prediction
US5602961A (en) * 1994-05-31 1997-02-11 Alaris, Inc. Method and apparatus for speech compression using multi-mode code excited linear predictive coding
US5729655A (en) * 1994-05-31 1998-03-17 Alaris, Inc. Method and apparatus for speech compression using multi-mode code excited linear predictive coding
US5590282A (en) * 1994-07-11 1996-12-31 Clynes; Manfred Remote access server using files containing generic and specific music data for generating customized music on demand
US5974379A (en) * 1995-02-27 1999-10-26 Sony Corporation Methods and apparatus for gain controlling waveform elements ahead of an attack portion and waveform elements of a release portion
WO1996030894A1 (en) * 1995-03-27 1996-10-03 Dolby Laboratories Licensing Corporation Method and apparatus for efficient implementation of single-sideband filter banks providing accurate measures of spectral magnitude and phase
US5742892A (en) * 1995-04-18 1998-04-21 Sun Microsystems, Inc. Decoder for a software-implemented end-to-end scalable video delivery system
US6266817B1 (en) 1995-04-18 2001-07-24 Sun Microsystems, Inc. Decoder for a software-implemented end-to-end scalable video delivery system
US5960390A (en) * 1995-10-05 1999-09-28 Sony Corporation Coding method for using multi channel audio signals
US5845243A (en) * 1995-10-13 1998-12-01 U.S. Robotics Mobile Communications Corp. Method and apparatus for wavelet based data compression having adaptive bit rate control for compression of audio information
US5819215A (en) * 1995-10-13 1998-10-06 Dobson; Kurt Method and apparatus for wavelet based data compression having adaptive bit rate control for compression of digital audio or other sensory data
US5889891A (en) * 1995-11-21 1999-03-30 Regents Of The University Of California Universal codebook vector quantization with constrained storage
US5978762A (en) * 1995-12-01 1999-11-02 Digital Theater Systems, Inc. Digitally encoded machine readable storage media using adaptive bit allocation in frequency, time and over multiple channels
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
US5974380A (en) * 1995-12-01 1999-10-26 Digital Theater Systems, Inc. Multi-channel audio decoder
US5841993A (en) * 1996-01-02 1998-11-24 Ho; Lawrence Surround sound system for personal computer for interfacing surround sound with personal computer
US5825320A (en) * 1996-03-19 1998-10-20 Sony Corporation Gain control method for audio encoding device
US5974387A (en) * 1996-06-19 1999-10-26 Yamaha Corporation Audio recompression from higher rates for karaoke, video games, and other applications
US20090198357A1 (en) * 1996-10-02 2009-08-06 James D. Logan And Kerry M. Logan Family Trust Portable audio player
US5832443A (en) * 1997-02-25 1998-11-03 Alaris, Inc. Method and apparatus for adaptive audio compression and decompression
WO1998056184A1 (en) * 1997-06-05 1998-12-10 Wisconsin Alumni Research Foundation Image compression system using block transforms and tree-type coefficient truncation
US6101279A (en) * 1997-06-05 2000-08-08 Wisconsin Alumni Research Foundation Image compression system using block transforms and tree-type coefficient truncation
US20040078194A1 (en) * 1997-06-10 2004-04-22 Coding Technologies Sweden Ab Source coding enhancement using spectral-band replication
US7328162B2 (en) * 1997-06-10 2008-02-05 Coding Technologies Ab Source coding enhancement using spectral-band replication
US5987407A (en) * 1997-10-28 1999-11-16 America Online, Inc. Soft-clipping postprocessor scaling decoded audio signal frame saturation regions to approximate original waveform shape and maintain continuity
US6006179A (en) * 1997-10-28 1999-12-21 America Online, Inc. Audio codec using adaptive sparse vector quantization with subband vector classification
WO2000016485A1 (en) * 1998-09-15 2000-03-23 Motorola Limited Speech coder for a communications system and method for operation thereof
FR2791166A1 (en) * 1999-03-17 2000-09-22 Matra Nortel Communications METHODS OF ENCODING, DECODING AND TRANSCODING
EP1037390A1 (en) * 1999-03-17 2000-09-20 Matra Nortel Communications Method for coding, decoding and transcoding an audio signal
GB2352598B (en) * 1999-05-15 2003-09-24 Samsung Electronics Co Ltd Device for processing phase information of acoustic signal and method thereof
US6571207B1 (en) 1999-05-15 2003-05-27 Samsung Electronics Co., Ltd. Device for processing phase information of acoustic signal and method thereof
GB2352598A (en) * 1999-05-15 2001-01-31 Samsung Electronics Co Ltd Processing phase information of acoustic signals
FR2793589A1 (en) * 1999-05-15 2000-11-17 Samsung Electronics Co Ltd Acoustic speech signal phase information processing having frequency periodic signals critical bandwidth calculator/local phase frequency plane change/phase significant discriminator producing digital phase outputs.
US20020116199A1 (en) * 1999-05-27 2002-08-22 America Online, Inc. A Delaware Corporation Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US6704706B2 (en) * 1999-05-27 2004-03-09 America Online, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US8712785B2 (en) 1999-05-27 2014-04-29 Facebook, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US8010371B2 (en) 1999-05-27 2011-08-30 Aol Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US20070083364A1 (en) * 1999-05-27 2007-04-12 Aol Llc Method and System for Reduction of Quantization-Induced Block-Discontinuities and General Purpose Audio Codec
US20090063164A1 (en) * 1999-05-27 2009-03-05 Aol Llc Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US6885993B2 (en) 1999-05-27 2005-04-26 America Online, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US20050159940A1 (en) * 1999-05-27 2005-07-21 America Online, Inc., A Delaware Corporation Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US7181403B2 (en) 1999-05-27 2007-02-20 America Online, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US8285558B2 (en) 1999-05-27 2012-10-09 Facebook, Inc. Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US7418395B2 (en) 1999-05-27 2008-08-26 Aol Llc Method and system for reduction of quantization-induced block-discontinuities and general purpose audio codec
US6678649B2 (en) * 1999-07-19 2004-01-13 Qualcomm Inc Method and apparatus for subsampling phase spectrum information
US7657910B1 (en) * 1999-07-26 2010-02-02 E-Cast Inc. Distributed electronic entertainment method and apparatus
US6236960B1 (en) * 1999-08-06 2001-05-22 Motorola, Inc. Factorial packing method and apparatus for information coding
WO2001011610A1 (en) * 1999-08-06 2001-02-15 Motorola Inc. Factorial packing method and apparatus for information coding
EP1102242A1 (en) * 1999-11-22 2001-05-23 Alcatel Method for personalising speech output
US6567781B1 (en) 1999-12-30 2003-05-20 Quikcat.Com, Inc. Method and apparatus for compressing audio data using a dynamical system having a multi-state dynamical rule set and associated transform basis function
US6369722B1 (en) 2000-03-17 2002-04-09 Matra Nortel Communications Coding, decoding and transcoding methods
US6531971B2 (en) 2000-05-15 2003-03-11 Achim Kempf Method for monitoring information density and compressing digitized signals
GB2396538B (en) * 2000-05-16 2004-11-03 Samsung Electronics Co Ltd An apparatus and method for quantizing phase of speech signal using perceptual weighting function
GB2396538A (en) * 2000-05-16 2004-06-23 Samsung Electronics Co Ltd An apparatus and method for quantizing the phase of speech signal using perceptual weighting function
US6577995B1 (en) 2000-05-16 2003-06-10 Samsung Electronics Co., Ltd. Apparatus for quantizing phase of speech signal using perceptual weighting function and method therefor
FR2809221A1 (en) * 2000-05-16 2001-11-23 Samsung Electronics Co Ltd DEVICE FOR QUANTIFYING THE PHASE OF A VOICE SIGNAL USING A PERCEPTION WEIGHTING FUNCTION, AND METHOD THEREOF
US6754618B1 (en) * 2000-06-07 2004-06-22 Cirrus Logic, Inc. Fast implementation of MPEG audio coding
GB2380640A (en) * 2001-08-21 2003-04-09 Micron Technology Inc Data compression method
US8428943B2 (en) 2001-12-14 2013-04-23 Microsoft Corporation Quantization matrices for digital audio
US9443525B2 (en) 2001-12-14 2016-09-13 Microsoft Technology Licensing, Llc Quality improvement techniques in an audio encoder
US7155383B2 (en) 2001-12-14 2006-12-26 Microsoft Corporation Quantization matrices for jointly coded channels of audio
US7917369B2 (en) 2001-12-14 2011-03-29 Microsoft Corporation Quality improvement techniques in an audio encoder
US20110166864A1 (en) * 2001-12-14 2011-07-07 Microsoft Corporation Quantization matrices for digital audio
US20030115041A1 (en) * 2001-12-14 2003-06-19 Microsoft Corporation Quality improvement techniques in an audio encoder
US7143030B2 (en) 2001-12-14 2006-11-28 Microsoft Corporation Parametric compression/decompression modes for quantization matrices for digital audio
US20090326962A1 (en) * 2001-12-14 2009-12-31 Microsoft Corporation Quality improvement techniques in an audio encoder
US20080015850A1 (en) * 2001-12-14 2008-01-17 Microsoft Corporation Quantization matrices for digital audio
US7240001B2 (en) 2001-12-14 2007-07-03 Microsoft Corporation Quality improvement techniques in an audio encoder
US7249016B2 (en) 2001-12-14 2007-07-24 Microsoft Corporation Quantization matrices using normalized-block pattern of digital audio
US6934677B2 (en) * 2001-12-14 2005-08-23 Microsoft Corporation Quantization matrices based on critical band pattern information for digital audio wherein quantization bands differ from critical bands
US9305558B2 (en) 2001-12-14 2016-04-05 Microsoft Technology Licensing, Llc Multi-channel audio encoding/decoding with parametric compression/decompression and weight factors
US20070185706A1 (en) * 2001-12-14 2007-08-09 Microsoft Corporation Quality improvement techniques in an audio encoder
US8805696B2 (en) * 2001-12-14 2014-08-12 Microsoft Corporation Quality improvement techniques in an audio encoder
US20050149324A1 (en) * 2001-12-14 2005-07-07 Microsoft Corporation Quantization matrices for digital audio
US20050149323A1 (en) * 2001-12-14 2005-07-07 Microsoft Corporation Quantization matrices for digital audio
US20050159947A1 (en) * 2001-12-14 2005-07-21 Microsoft Corporation Quantization matrices for digital audio
US7930171B2 (en) 2001-12-14 2011-04-19 Microsoft Corporation Multi-channel audio encoding/decoding with parametric compression/decompression and weight factors
US8554569B2 (en) * 2001-12-14 2013-10-08 Microsoft Corporation Quality improvement techniques in an audio encoder
US7542896B2 (en) * 2002-07-16 2009-06-02 Koninklijke Philips Electronics N.V. Audio coding/decoding with spatial parameters and non-uniform segmentation for transients
US20050177360A1 (en) * 2002-07-16 2005-08-11 Koninklijke Philips Electronics N.V. Audio coding
CN1783726B (en) * 2002-08-21 2010-05-12 广州广晟数码技术有限公司 Decoder for decoding and reestablishing multi-channel audio signal from audio data code stream
CN100435485C (en) * 2002-08-21 2008-11-19 广州广晟数码技术有限公司 Decoder for decoding and re-establishing multiple audio track audio signal from audio data code stream
CN100452657C (en) * 2002-08-21 2009-01-14 广州广晟数码技术有限公司 Coding method for compressing coding of multiple audio track audio signal
US20110054916A1 (en) * 2002-09-04 2011-03-03 Microsoft Corporation Multi-channel audio encoding and decoding
US20100318368A1 (en) * 2002-09-04 2010-12-16 Microsoft Corporation Quantization and inverse quantization for audio
US8255234B2 (en) 2002-09-04 2012-08-28 Microsoft Corporation Quantization and inverse quantization for audio
US7860720B2 (en) 2002-09-04 2010-12-28 Microsoft Corporation Multi-channel audio encoding and decoding with different window configurations
US7801735B2 (en) 2002-09-04 2010-09-21 Microsoft Corporation Compressing and decompressing weight factors using temporal prediction for audio data
US7299190B2 (en) 2002-09-04 2007-11-20 Microsoft Corporation Quantization and inverse quantization for audio
US8386269B2 (en) 2002-09-04 2013-02-26 Microsoft Corporation Multi-channel audio encoding and decoding
US8099292B2 (en) 2002-09-04 2012-01-17 Microsoft Corporation Multi-channel audio encoding and decoding
US8620674B2 (en) 2002-09-04 2013-12-31 Microsoft Corporation Multi-channel audio encoding and decoding
US8069050B2 (en) 2002-09-04 2011-11-29 Microsoft Corporation Multi-channel audio encoding and decoding
US7502743B2 (en) 2002-09-04 2009-03-10 Microsoft Corporation Multi-channel audio encoding and decoding with multi-channel transform selection
US20040044527A1 (en) * 2002-09-04 2004-03-04 Microsoft Corporation Quantization and inverse quantization for audio
US8069052B2 (en) 2002-09-04 2011-11-29 Microsoft Corporation Quantization and inverse quantization for audio
US20080021704A1 (en) * 2002-09-04 2008-01-24 Microsoft Corporation Quantization and inverse quantization for audio
US20110060597A1 (en) * 2002-09-04 2011-03-10 Microsoft Corporation Multi-channel audio encoding and decoding
US8255230B2 (en) 2002-09-04 2012-08-28 Microsoft Corporation Multi-channel audio encoding and decoding
US20040049379A1 (en) * 2002-09-04 2004-03-11 Microsoft Corporation Multi-channel audio encoding and decoding
US20060178870A1 (en) * 2003-03-17 2006-08-10 Koninklijke Philips Electronics N.V. Processing of multi-channel signals
US7343281B2 (en) 2003-03-17 2008-03-11 Koninklijke Philips Electronics N.V. Processing of multi-channel signals
US20090083046A1 (en) * 2004-01-23 2009-03-26 Microsoft Corporation Efficient coding of digital media spectral data using wide-sense perceptual similarity
US8645127B2 (en) 2004-01-23 2014-02-04 Microsoft Corporation Efficient coding of digital media spectral data using wide-sense perceptual similarity
US8019600B2 (en) * 2004-05-13 2011-09-13 Samsung Electronics Co., Ltd. Speech signal compression and/or decompression method, medium, and apparatus
US20060020453A1 (en) * 2004-05-13 2006-01-26 Samsung Electronics Co., Ltd. Speech signal compression and/or decompression method, medium, and apparatus
US9639709B2 (en) 2004-09-30 2017-05-02 Ami Entertainment Network, Llc Prioritized content download for an entertainment system
US20060074750A1 (en) * 2004-10-01 2006-04-06 E-Cast, Inc. Prioritized content download for an entertainment device
US8099482B2 (en) 2004-10-01 2012-01-17 E-Cast Inc. Prioritized content download for an entertainment device
US20070253481A1 (en) * 2004-10-13 2007-11-01 Matsushita Electric Industrial Co., Ltd. Scalable Encoder, Scalable Decoder, and Scalable Encoding Method
US8010349B2 (en) * 2004-10-13 2011-08-30 Panasonic Corporation Scalable encoder, scalable decoder, and scalable encoding method
KR100765747B1 (en) 2005-01-22 2007-10-15 Samsung Electronics Co., Ltd. Apparatus for scalable speech and audio coding using Tree Structured Vector Quantizer
US20060247928A1 (en) * 2005-04-28 2006-11-02 James Stuart Jeremy Cowdery Method and system for operating audio encoders in parallel
US7418394B2 (en) * 2005-04-28 2008-08-26 Dolby Laboratories Licensing Corporation Method and system for operating audio encoders utilizing data from overlapping audio segments
US7630882B2 (en) 2005-07-15 2009-12-08 Microsoft Corporation Frequency segmentation to obtain bands for efficient coding of digital media
US7562021B2 (en) 2005-07-15 2009-07-14 Microsoft Corporation Modification of codewords in dictionary used for efficient coding of digital media spectral data
US7539612B2 (en) 2005-07-15 2009-05-26 Microsoft Corporation Coding and decoding scale factor information
US20070016414A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Modification of codewords in dictionary used for efficient coding of digital media spectral data
US20070016412A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Frequency segmentation to obtain bands for efficient coding of digital media
US20070016948A1 (en) * 2005-07-15 2007-01-18 Microsoft Corporation Immunizing HTML browsers and extensions from known vulnerabilities
JP2009515215A (en) * 2005-11-04 2009-04-09 National University of Singapore Audio clip playback device, playback method, and storage medium
US8036900B2 (en) 2005-11-04 2011-10-11 National University Of Singapore Device and a method of playing audio clips
US20080306744A1 (en) * 2005-11-04 2008-12-11 National University Of Singapore Device and a Method of Playing Audio Clips
CN101356741B (en) * 2005-11-04 2012-10-31 新加坡国立大学 A device and a method of playing audio clips
WO2007053120A1 (en) * 2005-11-04 2007-05-10 National University Of Singapore A device and a method of playing audio clips
US9105271B2 (en) 2006-01-20 2015-08-11 Microsoft Technology Licensing, Llc Complex-transform channel coding with extended-band frequency coding
US20070172071A1 (en) * 2006-01-20 2007-07-26 Microsoft Corporation Complex transforms for multi-channel audio
US7953604B2 (en) 2006-01-20 2011-05-31 Microsoft Corporation Shape and scale parameters for extended-band frequency coding
US20110035226A1 (en) * 2006-01-20 2011-02-10 Microsoft Corporation Complex-transform channel coding with extended-band frequency coding
US8190425B2 (en) 2006-01-20 2012-05-29 Microsoft Corporation Complex cross-correlation parameters for multi-channel audio
US7831434B2 (en) 2006-01-20 2010-11-09 Microsoft Corporation Complex-transform channel coding with extended-band frequency coding
US20070174063A1 (en) * 2006-01-20 2007-07-26 Microsoft Corporation Shape and scale parameters for extended-band frequency coding
US8195462B2 (en) * 2006-02-16 2012-06-05 At&T Intellectual Property Ii, L.P. System and method for providing large vocabulary speech processing based on fixed-point arithmetic
US20070192104A1 (en) * 2006-02-16 2007-08-16 At&T Corp. A system and method for providing large vocabulary speech processing based on fixed-point arithmetic
US20070255808A1 (en) * 2006-04-27 2007-11-01 Rowe International Corporation System and methods for updating registration information for a computer jukebox
US7856487B2 (en) 2006-04-27 2010-12-21 Ami Entertainment Network, Inc. System and methods for updating registration information for a computer jukebox
US20070282991A1 (en) * 2006-06-01 2007-12-06 Rowe International Corporation Remote song selection
WO2008064577A1 (en) * 2006-12-01 2008-06-05 Huawei Technologies Co., Ltd. A method and an apparatus for adjusting quantization quality in encoder and decoder
US8463614B2 (en) 2007-05-16 2013-06-11 Spreadtrum Communications (Shanghai) Co., Ltd. Audio encoding/decoding for reducing pre-echo of a transient as a function of bit rate
WO2008138276A1 (en) * 2007-05-16 2008-11-20 Spreadtrum Communications (Shanghai) Co., Ltd. An audio frequency encoding and decoding method and device
US7761290B2 (en) 2007-06-15 2010-07-20 Microsoft Corporation Flexible frequency and time partitioning in perceptual transform coding of audio
US8046214B2 (en) 2007-06-22 2011-10-25 Microsoft Corporation Low complexity decoder for complex transform coding of multi-channel sound
US20080319739A1 (en) * 2007-06-22 2008-12-25 Microsoft Corporation Low complexity decoder for complex transform coding of multi-channel sound
US7885819B2 (en) 2007-06-29 2011-02-08 Microsoft Corporation Bitstream syntax for multi-process audio decoding
US20090006103A1 (en) * 2007-06-29 2009-01-01 Microsoft Corporation Bitstream syntax for multi-process audio decoding
US9741354B2 (en) 2007-06-29 2017-08-22 Microsoft Technology Licensing, Llc Bitstream syntax for multi-process audio decoding
US8255229B2 (en) 2007-06-29 2012-08-28 Microsoft Corporation Bitstream syntax for multi-process audio decoding
US8645146B2 (en) 2007-06-29 2014-02-04 Microsoft Corporation Bitstream syntax for multi-process audio decoding
US9026452B2 (en) 2007-06-29 2015-05-05 Microsoft Technology Licensing, Llc Bitstream syntax for multi-process audio decoding
US9349376B2 (en) 2007-06-29 2016-05-24 Microsoft Technology Licensing, Llc Bitstream syntax for multi-process audio decoding
US20110196684A1 (en) * 2007-06-29 2011-08-11 Microsoft Corporation Bitstream syntax for multi-process audio decoding
US20090112606A1 (en) * 2007-10-26 2009-04-30 Microsoft Corporation Channel extension coding for multi-channel source
US8249883B2 (en) 2007-10-26 2012-08-21 Microsoft Corporation Channel extension coding for multi-channel source
US20090299753A1 (en) * 2008-05-30 2009-12-03 Yuli You Audio Signal Transient Detection
US8630848B2 (en) * 2008-05-30 2014-01-14 Digital Rise Technology Co., Ltd. Audio signal transient detection
US9536532B2 (en) 2008-05-30 2017-01-03 Digital Rise Technology Co., Ltd. Audio signal transient detection
US9361893B2 (en) 2008-05-30 2016-06-07 Digital Rise Technology Co., Ltd. Detection of an audio signal transient using first and second maximum norms
US8805679B2 (en) 2008-05-30 2014-08-12 Digital Rise Technology Co., Ltd. Audio signal transient detection
US8457976B2 (en) * 2009-01-30 2013-06-04 Qnx Software Systems Limited Sub-band processing complexity reduction
US20100198603A1 (en) * 2009-01-30 2010-08-05 QNX SOFTWARE SYSTEMS(WAVEMAKERS), Inc. Sub-band processing complexity reduction
US9225318B2 (en) 2009-01-30 2015-12-29 2236008 Ontario Inc. Sub-band processing complexity reduction
CN101800051B (en) * 2009-02-09 2012-07-04 美国博通公司 Method for processing signal and system for processing audio signal
US8706508B2 (en) * 2009-03-05 2014-04-22 Fujitsu Limited Audio decoding apparatus and audio decoding method performing weighted addition on signals
US20100228552A1 (en) * 2009-03-05 2010-09-09 Fujitsu Limited Audio decoding apparatus and audio decoding method
CN102483924A (en) * 2009-09-11 2012-05-30 斯灵媒体有限公司 Audio Signal Encoding Employing Interchannel And Temporal Redundancy Reduction
US20110066440A1 (en) * 2009-09-11 2011-03-17 Sling Media Pvt Ltd Audio signal encoding employing interchannel and temporal redundancy reduction
JP2013504781A (en) * 2009-09-11 2013-02-07 Sling Media Pvt Ltd Speech signal coding using interchannel and temporal redundancy suppression
US8498874B2 (en) 2009-09-11 2013-07-30 Sling Media Pvt Ltd Audio signal encoding employing interchannel and temporal redundancy reduction
CN102483924B (en) * 2009-09-11 2014-05-28 斯灵媒体有限公司 Audio Signal Encoding Employing Interchannel And Temporal Redundancy Reduction
WO2011030354A3 (en) * 2009-09-11 2011-05-05 Sling Media Pvt Ltd Audio signal encoding employing interchannel and temporal redundancy reduction
AU2010293792B2 (en) * 2009-09-11 2014-03-06 Dish Network Technologies India Private Limited Audio signal encoding employing interchannel and temporal redundancy reduction
KR101363206B1 (en) * 2009-09-11 2014-02-12 Sling Media Pvt Ltd Audio signal encoding employing interchannel and temporal redundancy reduction
US9646615B2 (en) 2009-09-11 2017-05-09 Echostar Technologies L.L.C. Audio signal encoding employing interchannel and temporal redundancy reduction
US8706510B2 (en) 2009-10-20 2014-04-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a detection of a group of previously-decoded spectral values
US8612240B2 (en) 2009-10-20 2013-12-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a region-dependent arithmetic coding mapping rule
US8655669B2 (en) 2009-10-20 2014-02-18 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using an iterative interval size reduction
US9978380B2 (en) 2009-10-20 2018-05-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a detection of a group of previously-decoded spectral values
US11443752B2 (en) 2009-10-20 2022-09-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a detection of a group of previously-decoded spectral values
RU2605677C2 (en) * 2009-10-20 2016-12-27 Fraunhofer-Gesellschaft zur Foerderung der angewandten Audio encoder, audio decoder, method of encoding audio information, method of decoding audio information and computer program using iterative reduction of size of interval
US8898068B2 (en) 2010-01-12 2014-11-25 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a modification of a number representation of a numeric previous context value
US9633664B2 (en) 2010-01-12 2017-04-25 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a modification of a number representation of a numeric previous context value
US8682681B2 (en) * 2010-01-12 2014-03-25 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding and decoding an audio information, and computer program obtaining a context sub-region value on the basis of a norm of previously decoded spectral values
US8645145B2 (en) 2010-01-12 2014-02-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, method for encoding an audio information, method for decoding an audio information and computer program using a hash table describing both significant state values and interval boundaries
US20130013322A1 (en) * 2010-01-12 2013-01-10 Guillaume Fuchs Audio encoder, audio decoder, method for encoding and decoding an audio information, and computer program obtaining a context sub-region value on the basis of a norm of previously decoded spectral values
US20120323582A1 (en) * 2010-04-13 2012-12-20 Ke Peng Hierarchical Audio Frequency Encoding and Decoding Method and System, Hierarchical Frequency Encoding and Decoding Method for Transient Signal
US8874450B2 (en) * 2010-04-13 2014-10-28 Zte Corporation Hierarchical audio frequency encoding and decoding method and system, hierarchical frequency encoding and decoding method for transient signal
CN102270454A (en) * 2010-06-07 2011-12-07 宇达电脑(上海)有限公司 Method and device for improving audio output
US8924222B2 (en) 2010-07-30 2014-12-30 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for coding of harmonic signals
US9236063B2 (en) * 2010-07-30 2016-01-12 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for dynamic bit allocation
US8831933B2 (en) 2010-07-30 2014-09-09 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for multi-stage shape vector quantization
US20120029925A1 (en) * 2010-07-30 2012-02-02 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for dynamic bit allocation
US9208792B2 (en) 2010-08-17 2015-12-08 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for noise injection
US10311884B2 (en) 2013-04-05 2019-06-04 Dolby International Ab Advanced quantizer
US9940942B2 (en) 2013-04-05 2018-04-10 Dolby International Ab Advanced quantizer
US10210875B2 (en) * 2014-05-01 2019-02-19 Digital Voice Systems, Inc. Audio watermarking via phase modification
US20180286417A1 (en) * 2014-05-01 2018-10-04 Digital Voice Systems, Inc. Audio watermarking via phase modification
US20150340045A1 (en) * 2014-05-01 2015-11-26 Digital Voice Systems, Inc. Audio Watermarking via Phase Modification
US9990928B2 (en) * 2014-05-01 2018-06-05 Digital Voice Systems, Inc. Audio watermarking via phase modification
WO2016115483A3 (en) * 2015-01-15 2016-09-09 Hardwick John C Audio watermarking via phase modification
US11244692B2 (en) 2018-10-04 2022-02-08 Digital Voice Systems, Inc. Audio watermarking via correlation modification using an amplitude and a magnitude modification based on watermark data and to reduce distortion

Similar Documents

Publication Publication Date Title
US5388181A (en) Digital audio compression system
US5341457A (en) Perceptual coding of audio signals
US5495552A (en) Methods of efficiently recording an audio signal in semiconductor memory
KR100903017B1 (en) Scalable coding method for high quality audio
US5083310A (en) Compression and expansion technique for digital audio data
USRE36714E (en) Perceptual coding of audio signals
JP3173218B2 (en) Compressed data recording method and apparatus, compressed data reproducing method, and recording medium
JP3336617B2 (en) Signal encoding or decoding apparatus, signal encoding or decoding method, and recording medium
US5687157A (en) Method of recording and reproducing digital audio signal and apparatus thereof
KR100310214B1 (en) Signal encoding or decoding device and recording medium
JP3186307B2 (en) Compressed data recording apparatus and method
US6240388B1 (en) Audio data decoding device and audio data coding/decoding system
US6011824A (en) Signal-reproduction method and apparatus
JPH08190764A (en) Method and device for processing digital signal and recording medium
JP2001142498A (en) Method and device for digital signal processing, method and device for digital signal recording, and recording medium
JP3531177B2 (en) Compressed data recording apparatus and method, compressed data reproducing method
US5483619A (en) Method and apparatus for editing an audio signal
KR950034205A (en) Digital Signal Processing Method and Recording Media
US20020169601A1 (en) Encoding device, decoding device, and broadcast system
EP0376553A2 (en) Perceptual coding of audio signals
JP3580444B2 (en) Signal transmission method and apparatus, and signal reproduction method
WO1994018762A1 (en) Transmission of digital data words representing a signal waveform
JPH0846516A (en) Device and method for information coding, device and method for information decoding and recording medium
JP2002314429A (en) Signal processor and signal processing method
CN1265354C (en) Audio processing method and audio processor

Legal Events

Date Code Title Description
STCF Information on status: patent grant

Free format text: PATENTED CASE

CC Certificate of correction
FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

AS Assignment

Owner name: MICHIGAN, UNIVERSITY OF, REGENTS OF THE, THE, MICH

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ANDERSON, DAVID J.;NEUHOFF, DAVID L.;REEL/FRAME:009267/0222

Effective date: 19980205

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
FPAY Fee payment

Year of fee payment: 12