US9153240B2 - Transform coding of speech and audio signals - Google Patents
- Publication number
- US9153240B2 (application US13/939,931)
- Authority
- US
- United States
- Prior art keywords
- bspe
- sub
- spectrum
- max
- transform
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/035—Scalar quantisation
Definitions
- The present invention generally relates to signal processing, such as signal compression and audio coding, and more particularly to improved transform speech and audio coding and corresponding devices.
- An encoder is a device, circuitry, or computer program that is capable of analyzing a signal such as an audio signal and outputting a signal in an encoded form. The resulting signal is often used for transmission, storage, and/or encryption purposes.
- A decoder is a device, circuitry, or computer program that is capable of inverting the encoder operation: it receives the encoded signal and outputs a decoded signal.
- Each frame of the input signal is analyzed and transformed from the time domain to the frequency domain.
- The result of this analysis is quantized and encoded, and then transmitted or stored depending on the application.
- A corresponding decoding procedure, followed by a synthesis procedure, makes it possible to restore the signal in the time domain.
- Codecs are often employed for compression/decompression of information such as audio and video data for efficient transmission over bandwidth-limited communication channels.
- Transform codecs are normally based on a time-to-frequency domain transform such as a DCT (Discrete Cosine Transform), a Modified Discrete Cosine Transform (MDCT) or some other lapped transform, which allows better coding efficiency with respect to the properties of the human auditory system.
- A common characteristic of transform codecs is that they operate on overlapping blocks of samples, i.e. overlapping frames.
- The coding coefficients resulting from a transform analysis, or an equivalent sub-band analysis, of each frame are normally quantized and stored or transmitted to the receiving side as a bit-stream.
- Upon reception of the bit-stream, the decoder performs de-quantization and inverse transformation in order to reconstruct the signal frames.
- Perceptual encoders use a lossy coding model of the receiving destination, i.e. the human auditory system, rather than a model of the source signal.
- Perceptual audio encoding thus entails encoding audio signals while incorporating psychoacoustic knowledge of the auditory system, in order to reduce the number of bits needed to reproduce the original audio signal faithfully.
- Perceptual encoding attempts to remove (i.e. not transmit) or approximate those parts of the signal that the human recipient would not perceive; this is lossy coding, as opposed to lossless coding of the source signal.
- The model is typically referred to as the psychoacoustical model.
- Perceptual coders therefore have a lower signal-to-noise ratio (SNR) than a waveform coder, but a higher perceived quality than a lossless coder operating at an equivalent bit rate.
- A perceptual encoder uses the masking pattern of the stimulus to determine the least number of bits needed to encode, i.e. quantize, each frequency sub-band without introducing audible quantization noise.
- Perceptual modeling has been extensively used in high bit rate audio coding.
- Standardized coders such as MPEG-1 Layer III [3] and MPEG-2 Advanced Audio Coding [4] achieve “CD quality” for wideband audio at rates of 128 kbps and 64 kbps, respectively. Nevertheless, these codecs are by definition forced to underestimate the amount of masking to ensure that distortion remains inaudible.
- Wideband audio coders usually use a high-complexity auditory (psychoacoustical) model, which is not very reliable at low bit rates (below 64 kbps).
- The present invention overcomes these and other drawbacks of the prior-art arrangements.
- A method of perceptual transform coding of audio signals in a telecommunication system includes the following steps: (a) determining transform coefficients representative of a time-to-frequency transformation of a time-segmented input audio signal; (b) determining a spectrum of perceptual sub-bands for the input audio signal based on the determined transform coefficients; (c) determining masking thresholds for each of the sub-bands based on the determined spectrum; (d) computing scale factors for each sub-band based on its determined masking thresholds; and (e) adapting the computed scale factors for each sub-band to prevent energy loss due to coding in perceptually relevant sub-bands, in order to achieve high-quality, low bit rate coding.
- FIG. 1 illustrates an exemplary encoder suitable for full-band audio encoding.
- FIG. 2 illustrates an exemplary decoder suitable for full-band audio decoding.
- FIG. 3 illustrates a generic perceptual transform encoder.
- FIG. 4 illustrates a generic perceptual transform decoder.
- FIG. 5 illustrates a flow diagram of a method in a psychoacoustical model according to the present invention.
- FIG. 6 illustrates a further flow diagram of an embodiment of a method according to the present invention.
- FIG. 7 illustrates another flow diagram of an embodiment of a method according to the present invention.
- FIG. 8 illustrates an arrangement according to some embodiments for performing methods disclosed herein.
- The present invention is mainly concerned with transform coding, and specifically with sub-band coding.
- Signal processing in telecommunications sometimes uses companding as a method of improving the signal representation with limited dynamic range.
- The term is a combination of compressing and expanding: the dynamic range of a signal is compressed before transmission and expanded to the original value at the receiver. This allows signals with a large dynamic range to be transmitted over facilities with a smaller dynamic range capability.
- The codec is presented as a low-complexity transform-based audio codec, which preferably operates at a sampling rate of 48 kHz and offers full audio bandwidth from 20 Hz up to 20 kHz.
- The encoder processes 16-bit linear PCM input signals in frames of 20 ms, and the codec has an overall delay of 40 ms.
- The coding algorithm is preferably based on transform coding with adaptive time resolution, adaptive bit allocation and low-complexity lattice vector quantization.
- The decoder may replace non-coded spectrum components by either signal-adaptive noise fill or bandwidth extension.
- FIG. 1 is a block diagram of an exemplary encoder suitable for full-band audio encoding.
- The input signal sampled at 48 kHz is processed through a transient detector.
- Depending on the transient detector output, a high frequency resolution or a low frequency resolution (high time resolution) transform is applied to the input signal frame.
- The adaptive transform is preferably based on a Modified Discrete Cosine Transform (MDCT) in the case of stationary frames.
- Non-stationary frames preferably have a temporal resolution equivalent to 5 ms frames (although any arbitrary resolution can be selected).
- The norm of each band may be estimated, and the resulting spectral envelope, consisting of the norms of all bands, is quantized and encoded.
- The coefficients are then normalized by the quantized norms.
- The quantized norms are further adjusted based on adaptive spectral weighting and used as input for bit allocation.
- The normalized spectral coefficients are lattice vector quantized and encoded based on the bits allocated to each frequency band.
- The level of the non-coded spectral coefficients is estimated, coded and transmitted to the decoder. Huffman encoding is preferably applied to the quantization indices of both the coded spectral coefficients and the encoded norms.
- FIG. 2 is a block diagram of an exemplary decoder suitable for full-band audio decoding.
- The transient flag, which indicates the frame configuration (stationary or transient), is decoded first.
- The spectral envelope is decoded, and the same bit-exact norm adjustment and bit-allocation algorithms are used at the decoder to recompute the bit allocation, which is essential for decoding the quantization indices of the normalized transform coefficients.
- Low-frequency non-coded spectral coefficients are regenerated, preferably by using a spectral-fill codebook built from the received spectral coefficients (spectral coefficients with non-zero bit allocation).
- A noise-level adjustment index may be used to adjust the level of the regenerated coefficients.
- High-frequency non-coded spectral coefficients are preferably regenerated using bandwidth extension.
- The decoded spectral coefficients and the regenerated spectral coefficients are mixed, yielding a normalized spectrum.
- The decoded spectral envelope is applied, yielding the decoded full-band spectrum.
- Finally, the inverse transform is applied to recover the time-domain decoded signal. This is preferably done by applying either the Inverse Modified Discrete Cosine Transform (IMDCT) for stationary modes, or the inverse of the higher temporal resolution transform for the transient mode.
- The algorithm adapted for full-band extension is based on adaptive transform-coding technology. It operates on 20 ms frames of input and output audio. Because the transform window (basis function length) is 40 ms and a 50 percent overlap is used between successive input and output frames, the effective look-ahead buffer size is 20 ms. Hence, the overall algorithmic delay is 40 ms, which is the sum of the frame size and the look-ahead size. Any additional delay experienced when using a G.722.1 full-band codec (ITU-T G.719) is due to computation and/or network transmission.
- A general, typical coding scheme for a perceptual transform coder will be described with reference to FIG. 3.
- The corresponding decoding scheme will be presented with reference to FIG. 4.
- The first step of the coding scheme consists of time-domain processing, usually called windowing of the signal, which results in a time segmentation of the input audio signal.
- The time-to-frequency domain transform used by the codec could be, for example, the Discrete Fourier Transform (DFT), according to Equation 1,
- X[k] is the DFT of the windowed input signal x[n].
- N is the size of the window w[n].
- n is the time index and k the frequency bin index.
- X[k] is the MDCT of a windowed input signal x[n].
- N is the size of the window w[n].
- n is the time index.
- k is the frequency bin index.
- A perceptual audio codec aims at decomposing the spectrum, or an approximation of it, according to the critical bands of the auditory system, e.g. the so-called Bark scale, an approximation of the Bark scale, or some other frequency scale.
- The Bark scale is a standardized frequency scale in which each “Bark” (named after Barkhausen) constitutes one critical bandwidth.
- This step can be achieved by grouping the transform coefficients in frequency according to a perceptual scale established from the critical bands, see Equation 3.
- $X_b[k] = \{X[k]\},\ k \in [k_b, \ldots, k_{b+1}-1],\ b \in [1, \ldots, N_b]$ (3)
- $N_b$ is the number of frequency or psychoacoustical bands.
- k is the frequency bin index.
- b is a relative index.
- A perceptual transform codec relies on the estimation of the masking threshold MT[b] in order to derive a frequency shaping function, e.g. the scale factors SF[b], which is applied to the transform coefficients $X_b[k]$ in the psychoacoustical sub-band domain.
- The perceptual coder can then exploit the perceptually scaled spectrum for coding purposes.
- A quantization and coding process can then perform the redundancy reduction, using the scaled spectrum to focus on the most perceptually relevant coefficients of the original spectrum.
- At the decoder, the inverse operation is achieved by de-quantizing and decoding the received binary flux, e.g. the bitstream. This step is followed by the inverse transform (inverse MDCT (IMDCT), inverse DFT (IDFT), etc.) to bring the signal back to the time domain. Finally, the overlap-add method is used to generate the perceptually reconstructed audio signal; the coding is lossy, since only the perceptually relevant coefficients are decoded.
- The invention performs a suitable frequency processing which allows the transform coefficients to be scaled so that the coding does not modify the final perception.
- The present invention enables psychoacoustical modeling to meet the requirements of very low complexity applications. This is achieved by a straightforward, simplified computation of the scale factors. An adaptive companding/expanding of the scale factors then allows low bit rate full-band audio coding with high perceptual audio quality.
- The technique of the present invention optimizes the bit allocation of the quantizer perceptually, such that all perceptually relevant coefficients are quantized independently of the dynamic range of the original signal or spectrum.
- An audio signal, e.g. a speech signal, is provided for encoding. It is processed according to standard procedures, as described previously, resulting in a windowed, time-segmented input audio signal.
- Transform coefficients are first determined in step 210 for the time-segmented input audio signal.
- Perceptually grouped coefficients, or perceptual frequency sub-bands, are determined in step 212, e.g. according to the Bark scale or some other scale.
- A masking threshold is determined in step 214.
- Scale factors are computed for each sub-band or coefficient in step 216.
- The computed scale factors are adapted in step 218 to prevent energy loss due to encoding in the perceptually relevant sub-bands, i.e. the sub-bands that actually affect the listening experience of a receiving person or apparatus.
- This adaptation therefore maintains the energy of the relevant sub-bands and thereby maximizes the perceived quality of the decoded audio signal.
- With reference to FIG. 6, a further specific embodiment of a psychoacoustical model according to the present invention will be described.
- The embodiment enables the computation of scale factors SF[b] for each psychoacoustical sub-band b defined by the model.
- Although the embodiment is described with emphasis on the so-called Bark scale, it is, with only minor adjustments, equally applicable to any suitable perceptual scale. Without loss of generality, consider a high frequency resolution for the low frequencies (groups of few transform coefficients) and, inversely, a low frequency resolution for the high frequencies.
- The number of coefficients per sub-band can be defined by a perceptual scale, for example the Equivalent Rectangular Bandwidth (ERB), which is considered a good approximation of the so-called Bark scale, or by the frequency resolution of the quantizer used afterwards.
- An alternative solution can be to use a combination of the two depending on the coding scheme used.
- $N_b$ is the number of psychoacoustical sub-bands.
- k is the frequency bin index.
- b is a relative index.
- Based on the determination of the perceptual coefficients or critical sub-bands, e.g. the Bark spectrum, the psychoacoustical model according to the present invention performs the aforementioned low-complexity computation of the masking thresholds MT.
- The second step relies on the spreading effect of frequency masking described in [2].
- The psychoacoustical model presented here takes into account both forward and backward spreading within a single simplified equation.
- The ATH (absolute threshold of hearing) is commonly defined as the sound level at which a subject can detect a particular sound 50% of the time.
- The proposed low-complexity model of the present invention aims at computing the scale factors SF[b] for each psychoacoustical sub-band.
- The SF computation relies both on a normalization step and on an adaptive companding/expanding step.
- The accumulated energy in all sub-bands for the MT computation may be normalized after the spreading of masking has been applied.
- $SF[b] = \alpha \cdot \frac{SF[b] - \min(SF)}{\max(SF) - \min(SF)},\ b \in [1, \ldots, N_b]$ (11)
- The scale factors can then be adjusted so that no energy loss can occur in perceptually relevant sub-bands.
- Low SF values, i.e. lower than 6 dB
- Sub-band frequencies below 500 Hz
- Step 218 of adapting the scale factors further comprises a step 219 of adaptively companding the scale factors and a step 220 of adaptively smoothing the scale factors.
- The method according to the invention additionally performs a suitable mapping of the spectral information to the quantizer range used by the transform-domain codec.
- The dynamic range of the input spectral norms is adaptively mapped to the quantizer range in order to optimize the coding of the dominant parts of the signal. This is achieved by computing a weighting function, which can either compand or expand the original spectral norms to the quantizer range. This enables full-band audio coding with high audio quality at several data rates (medium and low rates) without modifying the final perception.
- A further strong advantage of the invention is the low-complexity computation of the weighting function, which meets the requirements of very low complexity (and low delay) applications.
- The signal to be mapped to the quantizer is the norm (root mean square) of the input signal in a transformed spectral domain (e.g. the frequency domain).
- The sub-band frequency decomposition (sub-band boundaries) of these norms has to map to the quantizer frequency resolution (sub-bands with index b).
- The norms are then level-adjusted, and a dominant norm is computed for each sub-band b from the neighboring norms (forward and backward smoothed) and an absolute minimum energy. The details of the operation are described in the following.
- The norms Spe(p) are mapped to the spectral domain by the following linear operation, see Equation 12.
- $B_{MAX}$ is the maximum number of sub-bands (20 for this specific implementation).
- The values of $H_b$, $T_b$ and $J_b$ are defined in Table 1, which is based on a quantizer using 44 spectral sub-bands.
- $J_b$ is a summation interval, which corresponds to the transformed-domain sub-band numbers.
- The mapped spectrum BSpe(b) is forward smoothed according to Equation 13.
- $BSpe(b) = \frac{a}{\max\{BSpe(b)\} - \min\{BSpe(b)\}}\left[BSpe(b) - \min\{BSpe(b)\}\right]$ (16)
- The weighting function is computed such that it compands the signal if its dynamic range exceeds the quantizer range, and expands the signal if its dynamic range does not cover the full range of the quantizer.
- The weighting function is applied to the original norms to generate the weighted norms, which are fed to the quantizer.
- The arrangement comprises an input/output unit I/O for transmitting and receiving audio signals, or representations of audio signals, for processing.
- The arrangement comprises transform determining means 310 adapted to determine transform coefficients representative of a time-to-frequency transformation of a received time-segmented input audio signal, or a representation of such an audio signal.
- The transform determining unit can be adapted to, or connected to, a norm unit 311 adapted to normalize the determined coefficients. This is indicated by the dotted line in FIG. 8.
- The arrangement comprises a unit 312 for determining a spectrum of perceptual sub-bands for the input audio signal, or a representation thereof, based on the determined transform coefficients or the normalized transform coefficients.
- A masking unit 314 is provided for determining masking thresholds MT for each sub-band based on the determined spectrum.
- The arrangement comprises a unit 316 for computing scale factors for each sub-band based on the determined masking thresholds.
- The unit 316 can be provided with, or connected to, adapting means 318 for adapting the computed scale factors for each sub-band to prevent energy loss in perceptually relevant sub-bands.
- The adapting unit 318 comprises a unit 319 for adaptively companding the determined scale factors and a unit 320 for adaptively smoothing the determined scale factors.
- The arrangement described above can be included in, or be connectable to, an encoder or encoder arrangement in a telecommunication system.
- Advantages of the present invention include: low-complexity computation with high-quality full-band audio; a flexible frequency resolution adapted to the quantizer; and adaptive companding/expanding of the scale factors.
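The FIG. 1 encoder steps listed earlier (a norm per band, quantization of the spectral envelope, and normalization of the coefficients by the quantized norms) can be illustrated with a minimal Python sketch. The norm is taken as the root mean square of the band, as stated for the norms Spe(p); the uniform log2-domain quantizer with a 0.5 step (about 3 dB) and the names `encode_band`, `decode_band` and `NORM_STEP` are hypothetical, and the lattice vector quantization, bit allocation and Huffman coding stages are not shown.

```python
import math
from typing import List, Sequence, Tuple

NORM_STEP = 0.5  # HYPOTHETICAL quantizer step in the log2 domain (about 3 dB)


def encode_band(coeffs: Sequence[float], step: float = NORM_STEP) -> Tuple[int, List[float]]:
    """Return (norm index, normalized coefficients) for one frequency band."""
    rms = math.sqrt(sum(c * c for c in coeffs) / len(coeffs))  # band norm (RMS)
    index = round(math.log2(max(rms, 1e-9)) / step)            # quantized envelope value
    quantized_norm = 2.0 ** (index * step)
    # Normalize by the *quantized* norm, which is what the decoder will see.
    return index, [c / quantized_norm for c in coeffs]


def decode_band(index: int, normalized: Sequence[float], step: float = NORM_STEP) -> List[float]:
    """Re-apply the decoded spectral envelope to the normalized coefficients."""
    quantized_norm = 2.0 ** (index * step)
    return [c * quantized_norm for c in normalized]


if __name__ == "__main__":
    idx, shape = encode_band([3.0, 4.0])
    print(idx, shape)               # 4 [0.75, 1.0]
    print(decode_band(idx, shape))  # [3.0, 4.0]
```

Dividing by the quantized rather than the exact norm keeps encoder and decoder in step, since the decoder only ever sees the quantized envelope.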
Abstract
Description
where X[k] is the DFT of the windowed input signal x[n], N is the size of the window w[n], n is the time index and k the frequency bin index; or the Discrete Cosine Transform (DCT) or Modified Discrete Cosine Transform (MDCT), according to Equation 2,
where X[k] is the MDCT of a windowed input signal x[n], N is the size of the window w[n], n is the time index and k the frequency bin index.
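The MDCT formula (Equation 2) is not reproduced in this text. As a sketch only, assuming the common convention $X[k] = \sum_{n=0}^{N-1} w[n]\,x[n] \cos\big[\tfrac{2\pi}{N}(n + \tfrac12 + \tfrac N4)(k + \tfrac12)\big]$ for $k = 0, \ldots, N/2-1$, and assuming a sine window (the window w[n] is not specified here), the Python code below evaluates the forward MDCT directly, applies the matching inverse, and overlap-adds 50% overlapped frames as described for the decoder. The direct O(N²) sums are for illustration only, and the function names are hypothetical.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    """ASSUMED sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block: List[float], window: List[float]) -> List[float]:
    """Windowed MDCT: N = len(block) input samples -> N/2 coefficients."""
    half = len(block) // 2
    return [
        sum(window[n] * block[n]
            * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for n in range(len(block)))
        for k in range(half)
    ]


def imdct(coeffs: List[float], window: List[float]) -> List[float]:
    """Inverse MDCT with 2/M scaling, followed by the synthesis window."""
    half = len(coeffs)
    return [
        window[n] * (2.0 / half) * sum(
            coeffs[k] * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for k in range(half))
        for n in range(2 * half)
    ]


def analysis_synthesis(x: List[float], hop: int) -> List[float]:
    """50% overlapped MDCT analysis, IMDCT synthesis and overlap-add."""
    window = sine_window(2 * hop)
    tail = (-len(x)) % hop  # pad so the signal fills whole hops
    padded = [0.0] * hop + list(x) + [0.0] * (hop + tail)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - hop, hop):
        frame = imdct(mdct(padded[start:start + 2 * hop], window), window)
        for n, value in enumerate(frame):
            out[start + n] += value  # overlap-add: time-domain aliasing cancels
    return out[hop:hop + len(x)]


if __name__ == "__main__":
    signal = [math.sin(0.3 * i) + 0.05 * i for i in range(37)]
    rebuilt = analysis_synthesis(signal, hop=8)
    print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # rounding-level error
```

The sine window satisfies the Princen-Bradley condition, which, together with the 2/M scaling of the inverse, lets the aliasing of neighboring frames cancel in the overlap-add so that every sample covered by two frames is reconstructed exactly (up to rounding).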
$X_b[k] = \{X[k]\},\ k \in [k_b, \ldots, k_{b+1}-1],\ b \in [1, \ldots, N_b]$ (3)
where $N_b$ is the number of frequency or psychoacoustical bands, k the frequency bin index, and b is a relative index.
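Equation 3 only requires contiguous bin ranges $[k_b, k_{b+1}-1]$. One way to choose the edges, sketched below under stated assumptions, is to start a new band whenever the integer Bark value changes, using Zwicker and Terhardt's approximation $z(f) = 13\arctan(0.00076 f) + 3.5\arctan\big((f/7500)^2\big)$; the patent text does not prescribe this formula. Bins are assumed to be uniformly spaced from 0 to half the sampling rate, band indices are 0-based rather than 1-based, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def hz_to_bark(freq_hz: float) -> float:
    """Zwicker & Terhardt approximation of the Bark scale (an ASSUMED choice)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def bark_band_edges(num_bins: int, sample_rate: float) -> List[int]:
    """Return the start bin k_b of every band plus a final sentinel equal to num_bins.

    A new band starts whenever the integer Bark value of a bin centre changes,
    so each band spans roughly one critical band.  Bins are ASSUMED to be
    uniformly spaced over 0 .. sample_rate / 2.
    """
    bin_width = sample_rate / (2 * num_bins)
    edges = [0]
    previous = math.floor(hz_to_bark(0.5 * bin_width))
    for k in range(1, num_bins):
        current = math.floor(hz_to_bark((k + 0.5) * bin_width))
        if current != previous:
            edges.append(k)
            previous = current
    edges.append(num_bins)
    return edges


def group_coefficients(coeffs: Sequence[float], edges: Sequence[int]) -> List[List[float]]:
    """Equation 3: X_b[k] = {X[k]}, k in [k_b, ..., k_{b+1} - 1]."""
    return [list(coeffs[edges[b]:edges[b + 1]]) for b in range(len(edges) - 1)]


if __name__ == "__main__":
    edges = bark_band_edges(960, 48000.0)  # e.g. 960 coefficients per frame
    bands = group_coefficients([0.0] * 960, edges)
    print(len(bands), [len(band) for band in bands[:5]])
```

With 960 coefficients at 48 kHz (the count assumed here for a 40 ms MDCT window with a 20 ms hop), this yields 25 bands of generally increasing width. An ERB-based rule could replace `hz_to_bark` without changing `group_coefficients`.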
$Xs_b[k] = X_b[k] \times MT[b],\ k \in [k_b, \ldots, k_{b+1}-1],\ b \in [1, \ldots, N_b]$ (4)
where $N_b$ is the number of psychoacoustical sub-bands, k the frequency bin index, and b is a relative index.
$MT[b] = BS[b] - 29,\ b \in [1, \ldots, N_b]$ (6).
$MT[b] = \max(ATH[b], MT[b]),\ b \in [1, \ldots, N_b]$ (8).
$MT_{norm}[b] = MT[b] - 10 \log_{10}(L[b]),\ b \in [1, \ldots, N_b]$ (9),
where $L[1], \ldots, L[N_b]$ are the lengths (numbers of transform coefficients) of the psychoacoustical sub-bands.
$SF[b] = -MT_{norm}[b],\ b \in [1, \ldots, N_b]$ (10).
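The masking-threshold and scale-factor equations above can be strung together in a short Python sketch. Several parts are assumptions rather than the patent's definitions: BS[b] is taken as the band energy in dB, $10\log_{10}\sum_k X_b[k]^2$ (the Bark-spectrum equation is not reproduced here); the spreading step (Equation 7) is omitted; the ATH value of each band is supplied by the caller; L[b] is the length of band b; and the final min-max rescaling stands in for the normalization step (Equation 11), with a hypothetical scale `alpha`.

```python
import math
from typing import List, Sequence


def scale_factors(bands: Sequence[Sequence[float]],
                  ath_db: Sequence[float],
                  alpha: float = 1.0) -> List[float]:
    """Sketch of Eqs. (6), (8), (9), (10) followed by a min-max normalization.

    bands  : transform coefficients grouped per sub-band (Equation 3)
    ath_db : absolute threshold of hearing per sub-band in dB (ASSUMED given)
    alpha  : HYPOTHETICAL output scale of the normalization step
    """
    sf = []
    for coeffs, ath in zip(bands, ath_db):
        energy = sum(c * c for c in coeffs) + 1e-12     # avoid log10(0)
        bs = 10.0 * math.log10(energy)                  # ASSUMED form of BS[b]
        mt = bs - 29.0                                  # Eq. (6); spreading omitted
        mt = max(ath, mt)                               # Eq. (8)
        mt_norm = mt - 10.0 * math.log10(len(coeffs))   # Eq. (9) with L[b]
        sf.append(-mt_norm)                             # Eq. (10)
    lo, hi = min(sf), max(sf)
    if hi == lo:                                        # flat case: nothing to rescale
        return [0.0 for _ in sf]
    return [alpha * (v - lo) / (hi - lo) for v in sf]   # rescale to [0, alpha]


if __name__ == "__main__":
    grouped = [[1.0] * 4, [10.0] * 4, [0.1] * 4]
    print(scale_factors(grouped, [-100.0] * 3))
```

Because of Equation 10, bands with a lower normalized masking threshold receive larger scale factors.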
where $B_{MAX}$ is the maximum number of sub-bands (20 for this specific implementation). The values of $H_b$, $T_b$ and $J_b$ are defined in Table 1, which is based on a quantizer using 44 spectral sub-bands. $J_b$ is a summation interval, which corresponds to the transformed-domain sub-band numbers.
TABLE 1: Spectrum mapping constants
b | Jb | Hb | Tb | A(b) |
---|---|---|---|---|
0 | 0 | 1 | 3 | 8 |
1 | 1 | 1 | 3 | 6 |
2 | 2 | 1 | 3 | 3 |
3 | 3 | 1 | 3 | 3 |
4 | 4 | 1 | 3 | 3 |
5 | 5 | 1 | 3 | 3 |
6 | 6 | 1 | 3 | 3 |
7 | 7 | 1 | 3 | 3 |
8 | 8 | 1 | 3 | 3 |
9 | 9 | 1 | 3 | 3 |
10 | 10, 11 | 2 | 4 | 3 |
11 | 12, 13 | 2 | 4 | 3 |
12 | 14, 15 | 2 | 4 | 3 |
13 | 16, 17 | 2 | 5 | 3 |
14 | 18, 19 | 2 | 5 | 3 |
15 | 20, 21, 22, 23 | 4 | 6 | 3 |
16 | 24, 25, 26 | 3 | 6 | 4 |
17 | 27, 28, 29 | 3 | 6 | 5 |
18 | 30, 31, 32, 33, 34 | 5 | 7 | 7 |
19 | 35, 36, 37, 38, 39, 40, 41, 42, 43 | 9 | 8 | 11 |
$BSpe(b) = \max(BSpe(b), BSpe(b-1) - 4),\ b = 1, \ldots, B_{MAX}-1$ (13)
and backward smoothed according to Equation 14 below:
$BSpe(b) = \max(BSpe(b), BSpe(b+1) - 4),\ b = B_{MAX}-2, \ldots, 0$ (14)
$BSpe(b) = T(b) - \max(BSpe(b), A(b)),\ b = 0, \ldots, B_{MAX}-1$ (15)
where A(b) is given by Table 1. The resulting function, Equation 16 below, is further adaptively companded or expanded depending on the dynamic range of the spectrum (a = 4 in this specific implementation).
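The adaptive spectral weighting (Equations 12 to 16 with Table 1) can be sketched in Python as follows. Equation 12 is not reproduced in this text, so the mapping of the 44 quantizer-band norms Spe(p) to the 20 sub-bands is assumed here to be the mean over $J_b$ (hence the $H_b$ terms); Equation 16 is assumed to rescale the dynamic range of BSpe(b) linearly to [0, a]. The norms are assumed to be on a logarithmic scale, consistent with the constant offset of 4 in the smoothing steps, and all names are hypothetical.

```python
from typing import List, Sequence

# Table 1 constants (b = 0 .. 19).  J_b is stored as a start index per
# sub-band plus a sentinel: J_b = range(J_START[b], J_START[b + 1]), H_b = len(J_b).
J_START = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 24, 27, 30, 35, 44]
T = [3] * 10 + [4, 4, 4, 5, 5, 6, 6, 6, 7, 8]
A = [8, 6] + [3] * 14 + [4, 5, 7, 11]
B_MAX = 20


def weighting_function(spe: Sequence[float], a: float = 4.0) -> List[float]:
    """Sketch of Eqs. (12)-(16) for the 44 quantizer-band norms Spe(p)."""
    if len(spe) != J_START[-1]:
        raise ValueError("expected 44 norms")
    # Eq. (12), ASSUMED here to be the mean of Spe(p) over J_b (H_b terms).
    bspe = []
    for b in range(B_MAX):
        j_b = range(J_START[b], J_START[b + 1])
        bspe.append(sum(spe[p] for p in j_b) / len(j_b))
    # Eq. (13): forward smoothing.
    for b in range(1, B_MAX):
        bspe[b] = max(bspe[b], bspe[b - 1] - 4)
    # Eq. (14): backward smoothing.
    for b in range(B_MAX - 2, -1, -1):
        bspe[b] = max(bspe[b], bspe[b + 1] - 4)
    # Eq. (15): level adjustment against the absolute minimum A(b).
    bspe = [T[b] - max(bspe[b], A[b]) for b in range(B_MAX)]
    # Eq. (16), ASSUMED: linear rescaling of the dynamic range to [0, a], which
    # compands a range wider than a and expands a narrower one.
    lo, hi = min(bspe), max(bspe)
    if hi == lo:
        return [0.0] * B_MAX
    return [a * (v - lo) / (hi - lo) for v in bspe]


if __name__ == "__main__":
    print(weighting_function([10.0] * 44))
```

With a flat input of 10 for every norm, for example, the smoothing leaves the values unchanged and the result is driven entirely by T(b) and A(b) from Table 1, giving weights that rise from 0 in the low sub-bands to 4 at the top.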
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/939,931 US9153240B2 (en) | 2007-08-27 | 2013-07-11 | Transform coding of speech and audio signals |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US96815907P | 2007-08-27 | 2007-08-27 | |
US4424808P | 2008-04-11 | 2008-04-11 | |
PCT/SE2008/050967 WO2009029035A1 (en) | 2007-08-27 | 2008-08-26 | Improved transform coding of speech and audio signals |
US67411710A | 2010-09-08 | 2010-09-08 | |
US13/939,931 US9153240B2 (en) | 2007-08-27 | 2013-07-11 | Transform coding of speech and audio signals |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/674,117 Continuation US20110035212A1 (en) | 2007-08-27 | 2008-08-26 | Transform coding of speech and audio signals |
PCT/SE2008/050967 Continuation WO2009029035A1 (en) | 2007-08-27 | 2008-08-26 | Improved transform coding of speech and audio signals |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140142956A1 US20140142956A1 (en) | 2014-05-22 |
US9153240B2 true US9153240B2 (en) | 2015-10-06 |
Family ID: 40387559
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/674,117 Abandoned US20110035212A1 (en) | 2007-08-27 | 2008-08-26 | Transform coding of speech and audio signals |
US13/939,931 Active 2028-08-27 US9153240B2 (en) | 2007-08-27 | 2013-07-11 | Transform coding of speech and audio signals |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/674,117 Abandoned US20110035212A1 (en) | 2007-08-27 | 2008-08-26 | Transform coding of speech and audio signals |
Country Status (8)
Country | Link |
---|---|
US (2) | US20110035212A1 (en) |
EP (1) | EP2186087B1 (en) |
JP (1) | JP5539203B2 (en) |
CN (1) | CN101790757B (en) |
AT (1) | ATE535904T1 (en) |
ES (1) | ES2375192T3 (en) |
HK (1) | HK1143237A1 (en) |
WO (1) | WO2009029035A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10311883B2 (en) * | 2007-08-27 | 2019-06-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Transient detection with hangover indicator for encoding an audio signal |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101790757B (en) * | 2007-08-27 | 2012-05-30 | 爱立信电话股份有限公司 | Improved transform coding of speech and audio signals |
US9245529B2 (en) * | 2009-06-18 | 2016-01-26 | Texas Instruments Incorporated | Adaptive encoding of a digital signal with one or more missing values |
US8498874B2 (en) | 2009-09-11 | 2013-07-30 | Sling Media Pvt Ltd | Audio signal encoding employing interchannel and temporal redundancy reduction |
KR101483179B1 (en) * | 2010-10-06 | 2015-01-19 | 에스케이 텔레콤주식회사 | Frequency Transform Block Coding Method and Apparatus and Image Encoding/Decoding Method and Apparatus Using Same |
GB2487399B (en) * | 2011-01-20 | 2014-06-11 | Canon Kk | Acoustical synthesis |
US9548057B2 (en) * | 2011-04-15 | 2017-01-17 | Telefonaktiebolaget Lm Ericsson (Publ) | Adaptive gain-shape rate sharing |
RU2648595C2 (en) | 2011-05-13 | 2018-03-26 | Самсунг Электроникс Ко., Лтд. | Bit distribution, audio encoding and decoding |
CN102800317B (en) * | 2011-05-25 | 2014-09-17 | 华为技术有限公司 | Signal classification method and equipment, and encoding and decoding methods and equipment |
CN102208188B (en) * | 2011-07-13 | 2013-04-17 | 华为技术有限公司 | Audio signal encoding-decoding method and device |
WO2014046916A1 (en) | 2012-09-21 | 2014-03-27 | Dolby Laboratories Licensing Corporation | Layered approach to spatial audio coding |
CN103778918B (en) * | 2012-10-26 | 2016-09-07 | 华为技术有限公司 | The method and apparatus of the bit distribution of audio signal |
CN103854653B (en) | 2012-12-06 | 2016-12-28 | 华为技术有限公司 | The method and apparatus of signal decoding |
RU2740690C2 (en) | 2013-04-05 | 2021-01-19 | Долби Интернешнл Аб | Audio encoding device and decoding device |
EP3014609B1 (en) | 2013-06-27 | 2017-09-27 | Dolby Laboratories Licensing Corporation | Bitstream syntax for spatial voice coding |
FR3017484A1 (en) * | 2014-02-07 | 2015-08-14 | Orange | ENHANCED FREQUENCY BAND EXTENSION IN AUDIO FREQUENCY SIGNAL DECODER |
CN105225671B (en) | 2014-06-26 | 2016-10-26 | 华为技术有限公司 | Decoding method, Apparatus and system |
US10146500B2 (en) * | 2016-08-31 | 2018-12-04 | Dts, Inc. | Transform-based audio codec and method with subband energy smoothing |
EP3483879A1 (en) | 2017-11-10 | 2019-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Analysis/synthesis windowing function for modulated lapped transformation |
EP3483882A1 (en) | 2017-11-10 | 2019-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Controlling bandwidth in encoders and/or decoders |
EP3483878A1 (en) | 2017-11-10 | 2019-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio decoder supporting a set of different loss concealment tools |
WO2019091573A1 (en) * | 2017-11-10 | 2019-05-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for encoding and decoding an audio signal using downsampling or interpolation of scale parameters |
EP3483884A1 (en) | 2017-11-10 | 2019-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Signal filtering |
EP3483883A1 (en) | 2017-11-10 | 2019-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio coding and decoding with selective postfiltering |
EP3483886A1 (en) | 2017-11-10 | 2019-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Selecting pitch lag |
EP3483880A1 (en) | 2017-11-10 | 2019-05-15 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Temporal noise shaping |
WO2019091576A1 (en) | 2017-11-10 | 2019-05-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoders, audio decoders, methods and computer programs adapting an encoding and decoding of least significant bits |
WO2019199995A1 (en) * | 2018-04-11 | 2019-10-17 | Dolby Laboratories Licensing Corporation | Perceptually-based loss functions for audio encoding and decoding based on machine learning |
US10455335B1 (en) * | 2018-07-20 | 2019-10-22 | Mimi Hearing Technologies GmbH | Systems and methods for modifying an audio signal using custom psychoacoustic models |
US10966033B2 (en) * | 2018-07-20 | 2021-03-30 | Mimi Hearing Technologies GmbH | Systems and methods for modifying an audio signal using custom psychoacoustic models |
EP3598440B1 (en) * | 2018-07-20 | 2022-04-20 | Mimi Hearing Technologies GmbH | Systems and methods for encoding an audio signal using custom psychoacoustic models |
EP3614380B1 (en) | 2018-08-22 | 2022-04-13 | Mimi Hearing Technologies GmbH | Systems and methods for sound enhancement in audio systems |
Citations (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0402973A1 (en) | 1989-06-02 | 1990-12-19 | Koninklijke Philips Electronics N.V. | Digital transmission system, transmitter and receiver for use in the transmission system, and record carrier obtained by means of the transmitter in the form of a recording device |
US5079547A (en) | 1990-02-28 | 1992-01-07 | Victor Company Of Japan, Ltd. | Method of orthogonal transform coding/decoding |
US5508949A (en) | 1993-12-29 | 1996-04-16 | Hewlett-Packard Company | Fast subband filtering in digital signal coding |
US5627938A (en) | 1992-03-02 | 1997-05-06 | Lucent Technologies Inc. | Rate loop processor for perceptual encoder/decoder |
US5734792A (en) | 1993-02-19 | 1998-03-31 | Matsushita Electric Industrial Co., Ltd. | Enhancement method for a coarse quantizer in the ATRAC |
US5752225A (en) * | 1989-01-27 | 1998-05-12 | Dolby Laboratories Licensing Corporation | Method and apparatus for split-band encoding and split-band decoding of audio information using adaptive bit allocation to adjacent subbands |
US5774842A (en) | 1995-04-20 | 1998-06-30 | Sony Corporation | Noise reduction method and apparatus utilizing filtering of a dithered signal |
EP0967593A1 (en) | 1998-06-26 | 1999-12-29 | Ricoh Company, Ltd. | Audio coding and quantization method |
EP1139336A2 (en) | 2000-03-30 | 2001-10-04 | Matsushita Electric Industrial Co., Ltd. | Determination of quantizaion coefficients for a subband audio encoder |
US6578162B1 (en) | 1999-01-20 | 2003-06-10 | Skyworks Solutions, Inc. | Error recovery method and apparatus for ADPCM encoded speech |
JP2003280695A (en) | 2002-03-19 | 2003-10-02 | Sanyo Electric Co Ltd | Method and apparatus for compressing audio |
US20030212551A1 (en) * | 2002-02-21 | 2003-11-13 | Kenneth Rose | Scalable compression of audio and other signals |
EP1367566A2 (en) | 1997-06-10 | 2003-12-03 | Coding Technologies Sweden AB | Source coding enhancement using spectral-band replication |
US6704705B1 (en) | 1998-09-04 | 2004-03-09 | Nortel Networks Limited | Perceptual audio coding |
US20040131204A1 (en) * | 2003-01-02 | 2004-07-08 | Vinton Mark Stuart | Reducing scale factor transmission cost for MPEG-2 advanced audio coding (AAC) using a lattice based post processing technique |
US6772111B2 (en) | 2000-05-30 | 2004-08-03 | Ricoh Company, Ltd. | Digital audio coding apparatus, method and computer readable medium |
JP2004341384A (en) | 2003-05-19 | 2004-12-02 | Sharp Corp | Digital signal recording/reproducing apparatus and its control program |
EP1517324A2 (en) | 1993-03-09 | 2005-03-23 | Sony Corporation | Methods and apparatus for recording reproducing, transmitting and/or receiving compressed data and recording medium therefor |
US20060004565A1 (en) * | 2004-07-01 | 2006-01-05 | Fujitsu Limited | Audio signal encoding device and storage medium for storing encoding program |
US20070016427A1 (en) | 2005-07-15 | 2007-01-18 | Microsoft Corporation | Coding and decoding scale factor information |
US20070033022A1 (en) * | 2005-08-03 | 2007-02-08 | He Ouyang | Method of bitrate control and adjustment for audio coding |
US20070162277A1 (en) * | 2006-01-12 | 2007-07-12 | Stmicroelectronics Asia Pacific Pte., Ltd. | System and method for low power stereo perceptual audio coding using adaptive masking threshold |
US20070233474A1 (en) | 2006-03-30 | 2007-10-04 | Samsung Electronics Co., Ltd. | Apparatus and method for quantization in digital communication system |
US7305346B2 (en) | 2002-03-19 | 2007-12-04 | Sanyo Electric Co., Ltd. | Audio processing method and audio processing apparatus |
USRE40280E1 (en) | 1988-12-30 | 2008-04-29 | Lucent Technologies Inc. | Rate loop processor for perceptual encoder/decoder |
US7454327B1 (en) * | 1999-10-05 | 2008-11-18 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandtren Forschung E.V. | Method and apparatus for introducing information into a data stream and method and apparatus for encoding an audio signal |
US7565296B2 (en) | 2003-12-27 | 2009-07-21 | Lg Electronics Inc. | Digital audio watermark inserting/detecting apparatus and method |
US7668715B1 (en) * | 2004-11-30 | 2010-02-23 | Cirrus Logic, Inc. | Methods for selecting an initial quantization step size in audio encoders and systems using the same |
US7873510B2 (en) | 2006-04-28 | 2011-01-18 | Stmicroelectronics Asia Pacific Pte. Ltd. | Adaptive rate control algorithm for low complexity AAC encoding |
US20110035212A1 (en) * | 2007-08-27 | 2011-02-10 | Telefonaktiebolaget L M Ericsson (Publ) | Transform coding of speech and audio signals |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3134363B2 (en) * | 1991-07-16 | 2001-02-13 | Sony Corporation | Quantization method |
CN1065400C (en) * | 1998-09-01 | 2001-05-02 | High Technology Research and Development Center of the State Science and Technology Commission | Compatible AC-3 and MPEG-2 audio-frequency code-decode device and its computing method |
JP2002268693A (en) * | 2001-03-12 | 2002-09-20 | Mitsubishi Electric Corp | Audio encoding device |
JP3881946B2 (en) * | 2002-09-12 | 2007-02-14 | Matsushita Electric Industrial Co., Ltd. | Acoustic encoding apparatus and acoustic encoding method |
WO2005004113A1 (en) * | 2003-06-30 | 2005-01-13 | Fujitsu Limited | Audio encoding device |
JP4350718B2 (en) * | 2006-03-22 | 2009-10-21 | Fujitsu Limited | Speech encoding device |
2008
- 2008-08-26 CN CN200880104834XA patent/CN101790757B/en active Active
- 2008-08-26 AT AT08828229T patent/ATE535904T1/en active
- 2008-08-26 EP EP08828229A patent/EP2186087B1/en active Active
- 2008-08-26 JP JP2010522867A patent/JP5539203B2/en active Active
- 2008-08-26 US US12/674,117 patent/US20110035212A1/en not_active Abandoned
- 2008-08-26 ES ES08828229T patent/ES2375192T3/en active Active
- 2008-08-26 WO PCT/SE2008/050967 patent/WO2009029035A1/en active Application Filing
2010
- 2010-10-07 HK HK10109570.7A patent/HK1143237A1/en unknown
2013
- 2013-07-11 US US13/939,931 patent/US9153240B2/en active Active
Patent Citations (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
USRE40280E1 (en) | 1988-12-30 | 2008-04-29 | Lucent Technologies Inc. | Rate loop processor for perceptual encoder/decoder |
US5752225A (en) * | 1989-01-27 | 1998-05-12 | Dolby Laboratories Licensing Corporation | Method and apparatus for split-band encoding and split-band decoding of audio information using adaptive bit allocation to adjacent subbands |
EP0402973A1 (en) | 1989-06-02 | 1990-12-19 | Koninklijke Philips Electronics N.V. | Digital transmission system, transmitter and receiver for use in the transmission system, and record carrier obtained by means of the transmitter in the form of a recording device |
US5079547A (en) | 1990-02-28 | 1992-01-07 | Victor Company Of Japan, Ltd. | Method of orthogonal transform coding/decoding |
US5627938A (en) | 1992-03-02 | 1997-05-06 | Lucent Technologies Inc. | Rate loop processor for perceptual encoder/decoder |
US5734792A (en) | 1993-02-19 | 1998-03-31 | Matsushita Electric Industrial Co., Ltd. | Enhancement method for a coarse quantizer in the ATRAC |
EP1517324A2 (en) | 1993-03-09 | 2005-03-23 | Sony Corporation | Methods and apparatus for recording reproducing, transmitting and/or receiving compressed data and recording medium therefor |
US5508949A (en) | 1993-12-29 | 1996-04-16 | Hewlett-Packard Company | Fast subband filtering in digital signal coding |
US5774842A (en) | 1995-04-20 | 1998-06-30 | Sony Corporation | Noise reduction method and apparatus utilizing filtering of a dithered signal |
EP1367566A2 (en) | 1997-06-10 | 2003-12-03 | Coding Technologies Sweden AB | Source coding enhancement using spectral-band replication |
EP0967593A1 (en) | 1998-06-26 | 1999-12-29 | Ricoh Company, Ltd. | Audio coding and quantization method |
US6704705B1 (en) | 1998-09-04 | 2004-03-09 | Nortel Networks Limited | Perceptual audio coding |
US6578162B1 (en) | 1999-01-20 | 2003-06-10 | Skyworks Solutions, Inc. | Error recovery method and apparatus for ADPCM encoded speech |
US7454327B1 (en) * | 1999-10-05 | 2008-11-18 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandtren Forschung E.V. | Method and apparatus for introducing information into a data stream and method and apparatus for encoding an audio signal |
EP1139336A2 (en) | 2000-03-30 | 2001-10-04 | Matsushita Electric Industrial Co., Ltd. | Determination of quantizaion coefficients for a subband audio encoder |
US6772111B2 (en) | 2000-05-30 | 2004-08-03 | Ricoh Company, Ltd. | Digital audio coding apparatus, method and computer readable medium |
US20030212551A1 (en) * | 2002-02-21 | 2003-11-13 | Kenneth Rose | Scalable compression of audio and other signals |
JP2003280695A (en) | 2002-03-19 | 2003-10-02 | Sanyo Electric Co Ltd | Method and apparatus for compressing audio |
US7305346B2 (en) | 2002-03-19 | 2007-12-04 | Sanyo Electric Co., Ltd. | Audio processing method and audio processing apparatus |
US20040131204A1 (en) * | 2003-01-02 | 2004-07-08 | Vinton Mark Stuart | Reducing scale factor transmission cost for MPEG-2 advanced audio coding (AAC) using a lattice based post processing technique |
US7272566B2 (en) | 2003-01-02 | 2007-09-18 | Dolby Laboratories Licensing Corporation | Reducing scale factor transmission cost for MPEG-2 advanced audio coding (AAC) using a lattice based post processing technique |
JP2004341384A (en) | 2003-05-19 | 2004-12-02 | Sharp Corp | Digital signal recording/reproducing apparatus and its control program |
US7565296B2 (en) | 2003-12-27 | 2009-07-21 | Lg Electronics Inc. | Digital audio watermark inserting/detecting apparatus and method |
US20060004565A1 (en) * | 2004-07-01 | 2006-01-05 | Fujitsu Limited | Audio signal encoding device and storage medium for storing encoding program |
US7668715B1 (en) * | 2004-11-30 | 2010-02-23 | Cirrus Logic, Inc. | Methods for selecting an initial quantization step size in audio encoders and systems using the same |
US20070016427A1 (en) | 2005-07-15 | 2007-01-18 | Microsoft Corporation | Coding and decoding scale factor information |
US20070033022A1 (en) * | 2005-08-03 | 2007-02-08 | He Ouyang | Method of bitrate control and adjustment for audio coding |
US20070162277A1 (en) * | 2006-01-12 | 2007-07-12 | Stmicroelectronics Asia Pacific Pte., Ltd. | System and method for low power stereo perceptual audio coding using adaptive masking threshold |
US20070233474A1 (en) | 2006-03-30 | 2007-10-04 | Samsung Electronics Co., Ltd. | Apparatus and method for quantization in digital communication system |
US7873510B2 (en) | 2006-04-28 | 2011-01-18 | Stmicroelectronics Asia Pacific Pte. Ltd. | Adaptive rate control algorithm for low complexity AAC encoding |
US20110035212A1 (en) * | 2007-08-27 | 2011-02-10 | Telefonaktiebolaget L M Ericsson (Publ) | Transform coding of speech and audio signals |
Non-Patent Citations (2)
Title |
---|
Office Action issued in corresponding Japanese patent application No. 2010-522867 on Dec. 17, 2012, 2 pages. |
Office Action issued in corresponding Japanese patent application No. 2010-522867 on Jul. 12, 2013, 2 pages. |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10311883B2 (en) * | 2007-08-27 | 2019-06-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Transient detection with hangover indicator for encoding an audio signal |
US11830506B2 (en) | 2007-08-27 | 2023-11-28 | Telefonaktiebolaget Lm Ericsson (Publ) | Transient detection with hangover indicator for encoding an audio signal |
Also Published As
Publication number | Publication date |
---|---|
JP2010538316A (en) | 2010-12-09 |
ES2375192T3 (en) | 2012-02-27 |
CN101790757B (en) | 2012-05-30 |
US20110035212A1 (en) | 2011-02-10 |
JP5539203B2 (en) | 2014-07-02 |
EP2186087B1 (en) | 2011-11-30 |
EP2186087A1 (en) | 2010-05-19 |
EP2186087A4 (en) | 2010-11-24 |
CN101790757A (en) | 2010-07-28 |
US20140142956A1 (en) | 2014-05-22 |
ATE535904T1 (en) | 2011-12-15 |
HK1143237A1 (en) | 2010-12-24 |
WO2009029035A1 (en) | 2009-03-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9153240B2 (en) | | Transform coding of speech and audio signals |
JP5219800B2 (en) | | Economical volume measurement of coded audio |
US8615391B2 (en) | | Method and apparatus to extract important spectral component from audio signal and low bit-rate audio signal coding and/or decoding method and apparatus using the same |
US9305558B2 (en) | | Multi-channel audio encoding/decoding with parametric compression/decompression and weight factors |
US7337118B2 (en) | | Audio coding system using characteristics of a decoded signal to adapt synthesized spectral components |
US8392202B2 (en) | | Low-complexity spectral analysis/synthesis using selectable time resolution |
US20040162720A1 (en) | | Audio data encoding apparatus and method |
US20080140405A1 (en) | | Audio coding system using characteristics of a decoded signal to adapt synthesized spectral components |
US20230206930A1 (en) | | Multi-channel signal generator, audio encoder and related methods relying on a mixing noise signal |
US10902860B2 (en) | | Signal encoding method and apparatus, and signal decoding method and apparatus |
KR100378796B1 (en) | | Digital audio encoder and decoding method |
US20230133513A1 (en) | | Audio decoder, audio encoder, and related methods using joint coding of scale parameters for channels of a multi-channel audio signal |
Trinkaus et al. | | An algorithm for compression of wideband diverse speech and audio signals |
Moya et al. | | Survey of Error Concealment Schemes for Real-Time Audio Transmission Systems |
Robles Moya | | Survey of error concealment schemes for real-time audio transmission systems |
Mandal et al. | | Digital Audio Compression |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL), SWEDEN; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BRIAND, MANUEL;TALEB, ANISSE;SIGNING DATES FROM 20090423 TO 20090424;REEL/FRAME:032354/0638 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
CC | Certificate of correction | |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |