CN101800050B - Audio fine scalable coding method and system based on perception self-adaption bit allocation

Info

Publication number: CN101800050B
Application number: CN201010107402A
Authority: CN
Inventors: 胡瑞敏; 杨玉红; 刘元元; 陈冰; 高丽; 项慨; 周超群; 杭波
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2010-02-03
Filing date: 2010-02-03
Publication date: 2012-10-10
Anticipated expiration: 2030-02-03
Also published as: CN101800050A

Abstract

The invention relates to the technical field of audio coding, and in particular to a fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation. The method comprises the following steps: preprocessing the input signal; dividing the frequency-domain signal into subbands; calculating the perceptual importance of each subband; sorting all subbands together in descending order of perceptual importance; extracting the subband with the largest perceptual importance and applying scalable vertical vector quantization to it; and then adaptively adjusting that vector-quantized subband. The system comprises a preprocessing module; a subband division module; a subband perceptual importance calculation, sorting and extraction module; a scalable quantization coding module; an adaptive adjustment module; and a scalable coding termination judgment module. The invention achieves efficient fine-grained scalable audio coding, better reconciles quantization precision with quantization efficiency, and meets the demand for high sound quality while improving coding efficiency.

Description

Fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation

Technical field

The present invention relates to the technical field of audio coding, and in particular to a fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation.

Background art

Scalable audio coding divides the bitstream into a core layer and several enhancement layers. The core layer guarantees the minimum reconstruction quality of the signal, while the enhancement layers progressively improve the reconstruction quality by raising the signal-to-noise ratio or extending the bandwidth: the more enhancement layers are received, the higher the decoded sound quality.

A scalable codec can adapt to fluctuations in network bandwidth simply by discarding enhancement-layer bitstream, and the finer the layering granularity, the more effectively it can follow those fluctuations. On the other hand, the objective criterion for evaluating scalable audio coding performance is the perceptual signal-to-noise ratio of each layer, and the subjective criterion is likewise the perceived quality of the decoded signal at each layer. A perceptually adaptive bit allocation scheme that makes the perceived quality rise steadily from layer to layer is therefore crucial to the performance of scalable coding.

Representative existing fine-grained scalable audio coding methods are the optimal bit allocation method adopted by MPEG-1 in 1994 and the frequency-domain subband scalable method of G.729EV, the new-generation speech codec standard proposed by ITU-T in 2006.

The optimal bit allocation method divides the frequency-domain signal uniformly into several subbands and sorts them by perceptual importance. It encodes the perceptually most important subband by bit-by-bit quantization, adjusts that subband's perceptual importance, and then feeds the result back to repeat the sorting and bit-by-bit quantization until the bit budget is exhausted or all subbands have been coded. The bit-by-bit quantization is scalar quantization, applied to the most important information of the subband. This method guarantees an improvement in coding quality, but because scalar quantization is inherently inefficient at compression, it limits quantization efficiency to some extent and is unsuitable for medium and low bit rates.

The G.729EV enhancement layer divides the signal into 32 subbands, sorts them by a perceptual importance measure, determines the bit allocation from the sorted order and the number of available bits, and applies split spherical vector quantization to the MDCT coefficients of each subband. The bit allocation used by the G.729EV enhancement layer is not optimal, however: the number of bits spent on each coded subband is wasteful, and when few bits are available the allocation lets the encoder quantize only a small number of subbands, so that the information in most subbands is lost entirely. Although this method has high quantization efficiency, its bit allocation is uneven and wasteful: some subbands receive far more bits than they need while others are severely starved, which ultimately limits the improvement in sound quality.

As the above shows, current fine-grained scalable audio coding sits at two extremes, and quantization efficiency and layering granularity cannot be reconciled: methods with high quantization efficiency have coarse granularity, while methods with fine granularity have low quantization efficiency.

Summary of the invention

The object of the present invention is to provide a fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation that combines a perceptually adaptive bit-block allocation scheme with efficient scalable vector quantization, thereby achieving efficient fine-grained scalable audio coding and better reconciling quantization precision with quantization efficiency.

To achieve the above object, the present invention adopts the following technical solution:

A fine-grained scalable audio coding method based on perceptually adaptive bit allocation comprises the following steps:

Step 1: preprocess the enhancement-layer input signal, where the preprocessing comprises perceptual weighting and a time-frequency transform of the enhancement-layer input signal, to obtain the frequency-domain representation of the signal;

Step 2: divide the frequency-domain signal obtained by the preprocessing into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

Step 3: calculate the perceptual importance of each subband, sort all subbands together in descending order of perceptual importance, and extract the subband with the largest perceptual importance;

Step 4: apply scalable vector quantization to the subband with the largest perceptual importance;

Step 4 further comprises the following substeps:

Define VQ_rank(k) as the quantization level of the k-th subband and initialize it as:

VQ_rank(0)=VQ_rank(1)=...=VQ_rank(N-1)=0

where k = 0, 1, ..., N-1 and N is the total number of subbands, N ≥ 1;

Apply level-VQ_rank(k) vector quantization to the subband k with the largest perceptual importance, allocating R bits to the spectral vector Y_k, to obtain the quantized vector \hat{Y}_k,

where the value of R is determined by the layering granularity S of the scalable encoder;

Step 5: adaptively adjust the vector-quantized subband with the largest perceptual importance, the scalable quantization count Q being initialized to Q = 1;

Step 5 further comprises the following substeps:

Let ip(k) be the perceptual importance of Y_k, and calculate the perceptual importance \hat{ip}(k) of \hat{Y}_k;

then adaptively update Y_k, VQ_rank(k) and ip(k) as follows:

Y_{k} = Y_{k} - {\hat{Y}}_{k}

VQ_rank(k)=VQ_rank(k)+1

ip(k) = ip(k) - \hat{ip}(k)

Q=Q+1

where 0 ≤ k ≤ N-1;

Step 6: determine whether the scalable quantization count over the whole quantization process has reached the maximum Q_Max; if it has not, return to step 3; if it has, end the scalable coding.

In step 3, if subband energy is used as the perceptual importance criterion for each subband, the spectral energy contained in each subband is calculated; if amplitude is used as the criterion, the spectral amplitude contained in each subband is calculated.
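The two criteria named in step 3 can be illustrated with a minimal sketch. It assumes that "spectral energy" means the sum of squared coefficients in a subband and "spectral amplitude" the sum of their absolute values; the text does not spell out either formula, and the function name `subband_importance` is hypothetical.

```python
def subband_importance(coeffs, criterion="energy"):
    """Perceptual importance of one subband (illustrative definitions only).

    criterion="energy":    sum of squared spectral coefficients (assumption)
    criterion="amplitude": sum of absolute spectral coefficients (assumption)
    """
    if criterion == "energy":
        return sum(c * c for c in coeffs)
    if criterion == "amplitude":
        return sum(abs(c) for c in coeffs)
    raise ValueError("criterion must be 'energy' or 'amplitude'")


band = [3.0, -4.0]                                   # toy subband
energy = subband_importance(band, "energy")          # 9 + 16
amplitude = subband_importance(band, "amplitude")    # 3 + 4
```

Whichever criterion is chosen, the same measure would have to be applied to every subband so that the sorting in step 3 compares like with like.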

A fine-grained scalable audio coding system based on perceptually adaptive bit allocation comprises:

a preprocessing module, for preprocessing the enhancement-layer input signal, where the preprocessing comprises perceptual weighting and a time-frequency transform of the enhancement-layer input signal, to obtain the frequency-domain representation of the signal;

a subband division module, for dividing the frequency-domain signal produced by the preprocessing module into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

a subband perceptual importance calculation, sorting and extraction module, for calculating the perceptual importance of each subband, sorting all subbands together in descending order of perceptual importance, and extracting the subband with the largest perceptual importance;

a scalable quantization coding module, for applying scalable vertical vector quantization to the subband with the largest perceptual importance;

an adaptive adjustment module, for adaptively adjusting the subband with the largest perceptual importance after it has been vector-quantized by the scalable quantization coding module;

a scalable coding termination judgment module, for determining whether the scalable quantization count over the whole quantization process has reached the maximum, and deciding whether to end the scalable coding.

The preprocessing module further comprises:

a perceptual weighting submodule, for applying perceptual weighting to the input signal;

a time-frequency transform submodule, for applying a time-frequency transform to the perceptually weighted signal.

The subband perceptual importance calculation, sorting and extraction module further comprises:

a subband perceptual importance calculation and sorting submodule, for calculating the perceptual importance of each subband and sorting all subbands together in descending order of perceptual importance;

a perceptual importance extraction submodule, for extracting the subband with the largest perceptual importance from the subbands sorted by the calculation and sorting submodule.

The present invention has the following advantages and beneficial effects:

1) it combines a perceptually adaptive bit-block allocation scheme with efficient scalable vector quantization, achieving efficient fine-grained scalable audio coding and better reconciling quantization precision with quantization efficiency;

2) starting from the perceptual characteristics of the human ear, it uses perceptual importance as the criterion for scalable vector quantization of the subbands, which improves layering precision and meets the demand for high sound quality while improving coding efficiency.

Description of the drawings

Fig. 1 is a flowchart of the fine-grained scalable audio coding method based on perceptually adaptive bit allocation provided by the invention.

Fig. 2 is a first schematic diagram of subband division in the fine-grained scalable audio coding method based on perceptually adaptive bit allocation provided by the invention.

Fig. 3 is a second schematic diagram of subband division in the fine-grained scalable audio coding method based on perceptually adaptive bit allocation provided by the invention.

Fig. 4 is a schematic diagram of an application of the fine-grained scalable audio coding system based on perceptually adaptive bit allocation provided by the invention.

Detailed description of the embodiments

The present invention proposes a fine-grained scalable audio coding method and system based on perceptually adaptive bit allocation, using the perceptual importance of each subband as the criterion.

The invention allocates a block of bits in one step to the subband with the highest perceptual importance, which improves layering precision and, compared with bit-by-bit allocation, improves coding efficiency. Starting from the perceptual characteristics of the human ear and using perceptual importance as the criterion, it applies scalable vector quantization to the subbands. The invention is described in detail below with reference to the drawings.

The fine-grained scalable audio coding method based on perceptually adaptive bit allocation provided by the invention comprises the following steps, as shown in Fig. 1:

Step 1: preprocess the input signal, where the preprocessing comprises perceptual weighting and a time-frequency transform of the input signal, to obtain the frequency-domain representation of the signal;

Step 2: divide the frequency-domain signal obtained by the preprocessing into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

Step 3: calculate the perceptual importance of each subband, sort all subbands together in descending order of perceptual importance, and extract the subband with the largest perceptual importance;

The perceptual importance criterion depends on the signal: if subband energy is used as the criterion for each subband, the spectral energy contained in each subband is calculated; if amplitude is used as the criterion, the spectral amplitude contained in each subband is calculated;

Define the perceptual importance of each subband as ip(k), k = 0, 1, ..., N-1. According to the calculated values, sort all subbands together by perceptual importance and extract the subband with the largest perceptual importance, ip(k) = E(k) = Max(ip(j)), where k = 0, 1, ..., N-1; j = 0, 1, 2, ..., N-1; and N is the total number of subbands;

Step 4: apply scalable vertical vector quantization to the subband with the largest perceptual importance; this step may further comprise the following substeps:

1. Define VQ_rank(k) as the quantization level of the k-th subband and initialize it as:

VQ_rank(0)=VQ_rank(1)=...=VQ_rank(N-1)=0

where k = 0, 1, ..., N-1 and N is the total number of subbands, N ≥ 1;

2. Apply level-VQ_rank(k) vector quantization to the subband k with the largest perceptual importance, allocating R bits to the spectral vector Y_k, to obtain the quantized vector \hat{Y}_k,

where the value of R is determined by the layering granularity S of the scalable encoder;

Step 5: adaptively adjust the vector-quantized subband with the largest perceptual importance; the specific operations are as follows:

Define Q_Max as the maximum number of scalable quantization passes in the signal quantization process, and initialize the count Q = 1. Calculate the perceptual importance \hat{ip}(k) of \hat{Y}_k, and update:

Y_{k} = Y_{k} - {\hat{Y}}_{k}

VQ_rank(k)=VQ_rank(k)+1

ip(k) = ip(k) - \hat{ip}(k)

Q=Q+1

where 0 ≤ k ≤ N-1;

Step 6: determine whether the scalable quantization count over the whole quantization process has reached the maximum; if it has not, return to step 3; if it has, end the scalable coding.
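The loop formed by steps 3 to 6 can be sketched as follows. This is an illustration under stated assumptions, not the quantizer of the invention: the text does not specify the vector quantizer, so `quantize` is a hypothetical stand-in that rounds coefficients to a grid whose step halves with each quantization level, and energy (sum of squares) is assumed as the importance measure. The updates of Y_k, VQ_rank(k), ip(k) and Q follow the formulas of step 5, and the loop stops once Q exceeds Q_Max.

```python
def energy(vec):
    """Sum of squared coefficients (assumed importance measure)."""
    return sum(v * v for v in vec)


def quantize(vec, level):
    """Hypothetical stand-in for the vector quantizer: round to a grid
    whose step halves with each quantization level."""
    step = 1.0 / (2 ** level)
    return [step * round(v / step) for v in vec]


def encode(subbands, q_max):
    y = [list(band) for band in subbands]      # Y_k, overwritten by residuals
    vq_rank = [0] * len(y)                     # VQ_rank(k) = 0 for all k
    ip = [energy(band) for band in y]          # ip(k)
    passes = []                                # (k, level, quantized vector)
    q = 1
    while q <= q_max:                          # step 6: stop once Q > Q_Max
        k = max(range(len(ip)), key=lambda j: ip[j])   # step 3: largest ip
        y_hat = quantize(y[k], vq_rank[k])             # step 4
        passes.append((k, vq_rank[k], y_hat))
        y[k] = [a - b for a, b in zip(y[k], y_hat)]    # Y_k = Y_k - Y_hat_k
        ip[k] -= energy(y_hat)                         # ip(k) = ip(k) - ip_hat(k)
        vq_rank[k] += 1
        q += 1
    return passes, y


# Two toy subbands and three passes.
passes, residual = encode([[4.3, -2.6], [1.0, 0.2]], q_max=3)
order = [k for k, _, _ in passes]              # subband chosen in each pass
levels = [level for _, level, _ in passes]     # quantization level used
```

In this toy run the first subband is selected twice, the second time at a finer level, which is the behaviour the method relies on: a subband keeps receiving bit blocks for as long as its remaining importance is the largest.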

The fine-grained scalable audio coding system based on perceptually adaptive bit allocation provided by the invention comprises the following modules:

1. a preprocessing module, for preprocessing the input signal, where the preprocessing comprises perceptual weighting and a time-frequency transform of the input signal, to obtain the frequency-domain representation of the signal;

The preprocessing module further comprises a perceptual weighting submodule and a time-frequency transform submodule:

the perceptual weighting submodule applies perceptual weighting to the input signal;

the time-frequency transform submodule applies a time-frequency transform to the perceptually weighted signal;

2. a subband division module, for dividing the frequency-domain signal produced by the preprocessing module into subbands, splitting the whole frequency range uniformly into N subbands, where N ≥ 1;

3. a subband perceptual importance calculation, sorting and extraction module, for calculating the perceptual importance of each subband, sorting all subbands together in descending order of perceptual importance, and extracting the subband with the largest perceptual importance;

This module further comprises a subband perceptual importance calculation and sorting submodule and a perceptual importance extraction submodule:

the calculation and sorting submodule calculates the perceptual importance of each subband and sorts all subbands together in descending order of perceptual importance;

the perceptual importance extraction submodule extracts the subband with the largest perceptual importance from the subbands sorted by the calculation and sorting submodule;

4. a scalable quantization coding module, for applying scalable vertical vector quantization to the subband with the largest perceptual importance;

5. an adaptive adjustment module, for adaptively adjusting the subband with the largest perceptual importance after it has been vector-quantized by the scalable quantization coding module;

6. a scalable coding termination judgment module, for determining whether the scalable quantization count over the whole quantization process has reached the maximum, and deciding whether to end the scalable coding.

The invention is further described below with a specific embodiment, with reference to the drawings:

Step 1: preprocess the input signal; the preprocessing comprises two processes, perceptual weighting and time-frequency transform;

1. The input signal is passed through the perceptual weighting filter W_LB(z), whose three coefficients γ₁', γ₂' and γ₃' (0 < γ₁', γ₂', γ₃' < 1) are adjusted accordingly to smooth the quantization noise spectrum:

W_{LB}(z) = \frac{\hat{A}(z / \gamma_{1}')}{\hat{A}(z / \gamma_{2}')} \left( 1 + \sum_{i=1}^{2} a_{i} {\gamma_{3}'}^{i} z^{-i} \right)

where γ₁', γ₂' and γ₃' are adjustment parameters, a_i are the linear prediction analysis coefficients, i is the linear prediction order index, and

\hat{A}(z) = \hat{a}_{0} + \hat{a}_{1} z^{-1} + \cdots + \hat{a}_{10} z^{-10}.

2. The time-frequency transform converts the time-domain signal to the frequency domain to obtain the spectral representation of the audio signal; this embodiment uses the MDCT.
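Two building blocks of step 1 can be sketched in code. The sketch relies on the identity that the coefficients of A(z/γ) are those of A(z) with the i-th coefficient scaled by γ^i, and it uses the textbook MDCT definition without windowing, overlap or scaling, none of which the text specifies. The function names and all sample values are hypothetical.

```python
import math


def bandwidth_expand(coeffs, gamma):
    """Coefficients of A(z/gamma), given the coefficients a_0..a_p of A(z).

    Substituting z/gamma for z scales the z^-i term by gamma**i.
    """
    return [a * gamma ** i for i, a in enumerate(coeffs)]


def mdct(frame):
    """Direct O(N^2) MDCT: 2N time samples in, N spectral coefficients out.

    No window is applied and no normalisation is used (both are assumptions).
    """
    if len(frame) % 2:
        raise ValueError("frame length must be even")
    n = len(frame) // 2
    return [
        sum(x * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t, x in enumerate(frame))
        for k in range(n)
    ]


# Hypothetical values, chosen only to exercise the two functions.
a_hat = [1.0, -0.5, 0.25]
numerator = bandwidth_expand(a_hat, 0.5)       # coefficients of A(z/0.5)
spectrum = mdct([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
```

A practical encoder would apply an analysis window and overlap successive frames before the transform; the direct form above is only meant to show that 2N time samples map to N spectral coefficients.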

Step 2: divide the spectrum of the time-frequency-transformed signal into subbands; here the whole spectrum is assumed to be divided uniformly into 64 subbands;

Fig. 2 is a schematic diagram of uniform division into 8 subbands. The horizontal axis represents the frequency range of the subbands and the vertical axis the frequency-domain energy amplitude. The low-frequency core-layer coding is the basis of the invention and is outside its scope. The subbands computed from the residual are labeled "1" to "8" in the figure: subbands 1 to 4 are low-frequency audio subbands, and subbands 5 to 8 are high-frequency audio subbands. Division into 64 subbands works in the same way as division into 8;
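Uniform division can be sketched as below. The frame size of 320 spectral coefficients is an assumption made only so that 64 subbands divide it evenly; the text does not state the number of coefficients per frame, and `split_uniform` is a hypothetical helper.

```python
def split_uniform(spectrum, num_subbands):
    """Split a list of spectral coefficients into equal-width subbands."""
    if num_subbands < 1:
        raise ValueError("N must be at least 1")
    if len(spectrum) % num_subbands:
        raise ValueError("spectrum length must be a multiple of N")
    width = len(spectrum) // num_subbands
    return [spectrum[i * width:(i + 1) * width] for i in range(num_subbands)]


spectrum = [float(i) for i in range(320)]   # hypothetical 320-coefficient frame
subbands = split_uniform(spectrum, 64)      # 64 subbands of 5 coefficients each
```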

Step 3: here the energy of each subband is taken as the measure of subband perceptual importance. Calculate the energy contained in each of the 64 subbands, sort the subbands in descending order of energy, and extract the subband with the largest perceptual importance, as follows:

1. Define ip(k) as the perceptual importance of the k-th subband and E(k) as the spectral energy contained in the k-th subband, and calculate the energy of each subband by:

ip(k) = E(k) = Y_{k}^{2}

where k = 0, 1, ..., 63 and Y_k is the vector of MDCT spectral coefficients contained in the k-th subband, so that Y_k^2 denotes the sum of their squares;

2. Using the subband energies calculated by the above formula as the measure of perceptual importance, sort all subbands together by perceptual importance, extract the subband with the largest perceptual importance, and pass it to step 4 for vector quantization, expressed as:

ip(k)=Max(ip(j))

where 0 ≤ k ≤ 63 and j = 0, ..., 63;
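A minimal sketch of step 3, assuming the subband energy is the sum of squared MDCT coefficients. The helper name and the three toy subbands are hypothetical; the embodiment itself uses 64 subbands.

```python
def rank_subbands(subbands):
    """Return the energies ip(j) and the subband indices in descending energy order."""
    ip = [sum(c * c for c in band) for band in subbands]
    order = sorted(range(len(ip)), key=lambda j: ip[j], reverse=True)
    return ip, order


toy = [[1.0, 1.0], [3.0, 0.0], [0.5, 2.0]]   # three toy subbands
ip, order = rank_subbands(toy)
k = order[0]                                 # index with ip(k) = Max(ip(j))
```

Because only the top entry is consumed in each pass, a full sort is not strictly required; taking the maximum directly would select the same subband.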

Step 4: apply vertical vector quantization to the subband with the largest perceptual importance obtained in step 3; here the k-th subband is assumed to be that subband. The specific implementation is:

1. Define VQ_rank(k) as the quantization level of the k-th subband and initialize it as:

VQ_rank(0)=VQ_rank(1)=...=VQ_rank(63)=0

where k = 0, 1, ..., 63 and N = 64 is the total number of subbands;

2. Apply vector quantization at level VQ_rank(k) = 0 to the subband k with the largest perceptual importance, allocating R bits to the vector Y_k of this subband. R is adjusted according to the required layering granularity, trading off quantization efficiency against granularity; for example, with a frame length of 20 ms and a granularity of 1 kbps, R is 20 bits. This yields the quantized vector \hat{Y}_k.
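The value R = 20 follows from multiplying the granularity by the frame duration: 1000 bit/s × 0.020 s = 20 bits. A small sketch of that arithmetic, with a hypothetical helper name and integer units (bit/s and ms) chosen to avoid rounding:

```python
def bits_per_pass(granularity_bps, frame_ms):
    """Bits available to one quantization pass: granularity times frame duration."""
    total = granularity_bps * frame_ms       # in bit * ms / s
    if total % 1000:
        raise ValueError("granularity and frame length do not give whole bits")
    return total // 1000


r = bits_per_pass(1000, 20)                  # 1 kbps granularity, 20 ms frame
```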

Step 5: adaptively adjust the subband k vector-quantized in step 4, as follows:

Assume Q_Max = 10 is the maximum number of scalable quantization passes in the signal quantization process, with the count initialized to Q = 1;

Calculate the perceptual importance \hat{ip}(k) of \hat{Y}_k,

and adaptively update Y_k, VQ_rank(k) and ip(k) as follows:

Y_{k} = Y_{k} - {\hat{Y}}_{k}

VQ_rank(k)=VQ_rank(k)+1

ip(k) = ip(k) - \hat{ip}(k)

Q=Q+1

where 0 ≤ k ≤ 63;
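The update of step 5 for a single pass can be sketched as follows, assuming energy as the importance measure. The function name and the sample vectors are hypothetical, and the quantized vector is simply given rather than produced by a real vector quantizer.

```python
def adaptive_update(y, y_hat, ip_k, vq_rank_k, q):
    """Apply the four step-5 updates to one subband and the pass counter."""
    ip_hat = sum(v * v for v in y_hat)               # importance of Y_hat_k
    residual = [a - b for a, b in zip(y, y_hat)]     # Y_k = Y_k - Y_hat_k
    return residual, vq_rank_k + 1, ip_k - ip_hat, q + 1


y = [4.0, -2.0]        # toy subband vector, energy 20
y_hat = [3.0, -2.0]    # toy quantized vector, energy 13
state = adaptive_update(y, y_hat, ip_k=20.0, vq_rank_k=0, q=1)
```

In this example the subtraction rule leaves ip(k) = 7.0, whereas the energy of the residual [1.0, 0.0] is 1.0, so under the energy assumption the updated importance is a bookkeeping value rather than the exact residual energy.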

Step 6: determine whether the scalable quantization count Q after step 5 is greater than Q_Max; if so, end the scalable coding; otherwise, continue from step 3.

Fig. 3 is a schematic diagram of bit allocation over 8 subbands. The horizontal axis represents the frequency range of the subbands and the vertical axis the frequency-domain energy amplitude. The low-frequency core-layer coding is the basis of the invention and is outside its scope. The enhancement layer is divided uniformly into 8 subbands. Comparing the energy amplitudes of the subbands shows that the 6th subband has the largest energy, so this subband is coded as vector block 1 and its energy is adjusted. Re-sorting the subband energy amplitudes then shows that the 1st subband has the largest energy, so it is coded as vector block 2. Proceeding in the same way, vector blocks 1 to 18 are coded in turn.

In Fig. 4, the input two-channel signal is processed by modules such as downmixing, preprocessing, and low-pass and high-pass filtering to obtain a low-band residual signal and a high-band signal. These two signals are the input to the scalable coding module, which applies the scalable quantization method provided by the invention to produce the output bitstream.

Fig. 4 shows the application of the invention within a complete audio coding framework. The scalable vector quantization block 30 is where the invention performs fine-grained scalable coding: applied to the scalable vector quantization stage of the coding framework, the invention guides the audio coding and improves quantization efficiency and quantization precision.

Claims

1. A fine-grained scalable audio coding method based on perceptually adaptive bit allocation, characterized in that it comprises the following steps:

step 4 further comprises the following substeps:

VQ_rank(0)=VQ_rank(1)=...=VQ_rank(N-1)=0

where k = 0, 1, ..., N-1 and N is the total number of subbands, N ≥ 1;

applying level-VQ_rank(k) vector quantization to the subband k with the largest perceptual importance, allocating R bits to the spectral vector Y_k, to obtain the quantized vector \hat{Y}_k, where the value of R is determined by the layering granularity S of the scalable encoder;

step 5 further comprises the following substeps:

letting ip(k) be the perceptual importance of Y_k, and calculating the perceptual importance \hat{ip}(k) of \hat{Y}_k;

Y_{k} = Y_{k} - {\hat{Y}}_{k}

VQ_rank(k)=VQ_rank(k)+1

ip(k) = ip(k) - \hat{ip}(k)

Q=Q+1

where 0 ≤ k ≤ N-1;

2. The fine-grained scalable audio coding method based on perceptually adaptive bit allocation according to claim 1, characterized in that: in step 3, if subband energy is used as the perceptual importance criterion for each subband, the spectral energy contained in each subband is calculated; if amplitude is used as the criterion, the spectral amplitude contained in each subband is calculated.

3. A system implementing the fine-grained scalable audio coding method based on perceptually adaptive bit allocation according to claim 1, characterized in that it comprises:

a scalable quantization coding module, for applying scalable vector quantization to the subband with the largest perceptual importance;

4. The fine-grained scalable audio coding system based on perceptually adaptive bit allocation according to claim 3, characterized in that the preprocessing module further comprises:

5. The fine-grained scalable audio coding system based on perceptually adaptive bit allocation according to claim 3 or 4, characterized in that the subband perceptual importance calculation, sorting and extraction module further comprises: