Continuous Fourier Equal to Constant Value

Short-time Fourier transform with variable resolution

In mathematics and signal processing, the constant-Q transform and variable-Q transform, known as CQT and VQT, transform a data series to the frequency domain. The CQT is related to the Fourier transform[1] and very closely related to the complex Morlet wavelet transform.[2] Its design is suited for musical representation.

Constant-Q transform applied to the waveform of a C major piano chord. The x-axis is frequency, mapped to standard musical pitches, from low (left) to high (right). The y-axis is time, starting from pressing the piano chord at the bottom, and releasing the piano chord at the top, 8 seconds later. Darker pixels correspond to higher values of the Constant-Q transform. The peaks correspond closely to the precise frequencies of the vibrating piano strings. Thus the peaks can be used to detect the notes played on the piano. The lowest 3 peaks are the fundamental frequencies of the C major chord (C, E, G). Each string also vibrates at multiples of the fundamental, known as overtones, which correspond to the remaining smaller peaks to the right of the fundamental pitches. The overtones are smaller in intensity than the fundamental pitch.

Audio of the C Major piano chord used to generate the Constant-Q transform above.

The waveform of this audio does not visually convey pitch information the way the constant-Q transform does.

The transform can be thought of as a series of filters f_k, logarithmically spaced in frequency, with the k-th filter having a spectral width δf_k equal to a multiple of the previous filter's width:

$$\delta f_k = 2^{1/n} \cdot \delta f_{k-1} = \left(2^{1/n}\right)^{k} \cdot \delta f_{\text{min}},$$

where δf_k is the bandwidth of the k-th filter, δf_min is the bandwidth of the lowest filter, whose central frequency is f_min, and n is the number of filters per octave.
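The geometric spacing can be sketched in a few lines of Python. This is an illustration only: the function name `filter_bank`, the example values, and the choice of the lowest bandwidth (taken here as the gap between the first two center frequencies) are assumptions, not part of the definition above.

```python
import numpy as np

def filter_bank(f_min, bins_per_octave, n_bins):
    """Return center frequencies and bandwidths of a geometrically spaced bank."""
    k = np.arange(n_bins)
    ratio = 2.0 ** (1.0 / bins_per_octave)   # 2^(1/n)
    centers = f_min * ratio ** k             # f_k = f_min * (2^(1/n))^k
    # Assumption: the lowest bandwidth equals the gap to the next center.
    bw_min = f_min * (ratio - 1.0)
    bandwidths = bw_min * ratio ** k         # delta f_k = (2^(1/n))^k * delta f_min
    return centers, bandwidths

# Two octaves of semitone-spaced filters starting near C1 (example values).
centers, bandwidths = filter_bank(f_min=32.70, bins_per_octave=12, n_bins=24)
```

Each bandwidth is 2^(1/n) times the previous one, so the ratio of center frequency to bandwidth is the same for every filter.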

Calculation

The short-time Fourier transform of x[n] for a frame shifted to sample m is calculated as follows:

$$X[k,m] = \sum_{n=0}^{N-1} W[n-m]\, x[n]\, e^{-j 2\pi k n / N}.$$

Given a data series at sampling frequency f_s = 1/T, T being the sampling period of the data, for each frequency bin we can define the following:

  • Filter width, δf_k.
  • Q, the "quality factor":
$$Q = \frac{f_k}{\delta f_k}.$$
This is shown below to be the integer number of cycles processed at a center frequency f_k. As such, this somewhat defines the time complexity of the transform.
  • Window length for the k-th bin:
$$N[k] = \frac{f_{\text{s}}}{\delta f_k} = \frac{f_{\text{s}}}{f_k} Q.$$
Since f_s/f_k is the number of samples processed per cycle at frequency f_k, Q is the number of integer cycles processed at this central frequency.
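As a rough numerical illustration of these definitions, the sketch below computes Q and the window length N[k] for a semitone-spaced bank. The helper name `window_lengths`, the example sampling rate, the choice δf_k = f_{k+1} − f_k, and rounding N[k] up to a whole sample are all assumptions made for the example.

```python
import math

def window_lengths(fs, f_min, bins_per_octave, n_bins):
    """Return Q and the per-bin window lengths N[k] = Q * fs / f_k."""
    ratio = 2.0 ** (1.0 / bins_per_octave)
    # Assumption: delta f_k = f_{k+1} - f_k, which gives Q = 1 / (2^(1/n) - 1).
    q = 1.0 / (ratio - 1.0)
    lengths = []
    for k in range(n_bins):
        f_k = f_min * ratio ** k
        # Assumption: round up so each window holds at least Q cycles.
        lengths.append(math.ceil(q * fs / f_k))
    return q, lengths

q, lengths = window_lengths(fs=44100, f_min=55.0, bins_per_octave=12, n_bins=48)
```

The window length halves with every octave, so low bins use long windows (fine frequency resolution) and high bins use short ones (fine time resolution).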

The equivalent transform kernel can be found by using the following substitutions:

  • The window length of each bin is now a function of the bin number:
$$N = N[k] = Q \frac{f_{\text{s}}}{f_k}.$$
  • The relative power of each bin will decrease at higher frequencies, as these sum over fewer terms. To compensate for this, we normalize by N[k].
  • Any windowing function will be a function of window length, and likewise a function of window number. For example, the equivalent Hamming window would be
$$W[k,n] = \alpha - (1-\alpha)\cos\frac{2\pi n}{N[k]-1}, \quad \alpha = 25/46, \quad 0 \leqslant n \leqslant N[k]-1.$$

After these modifications, we are left with

$$X[k] = \frac{1}{N[k]} \sum_{n=0}^{N[k]-1} W[k,n]\, x[n]\, e^{\frac{-j 2\pi Q n}{N[k]}}.$$
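The final expression can be evaluated directly, one bin at a time. The following NumPy sketch does so for a single frame starting at sample 0. The function names, the choice Q = 1/(2^(1/n) − 1), and rounding N[k] up to an integer are assumptions made for the example; it is not an optimized implementation.

```python
import numpy as np

def hamming(length):
    """Hamming window W[k, n] with alpha = 25/46 for a window of `length` samples."""
    alpha = 25.0 / 46.0
    n = np.arange(length)
    return alpha - (1.0 - alpha) * np.cos(2.0 * np.pi * n / (length - 1))

def cqt_frame(x, fs, f_min, bins_per_octave, n_bins):
    """Directly evaluate X[k] for one frame of x, starting at sample 0."""
    ratio = 2.0 ** (1.0 / bins_per_octave)
    q = 1.0 / (ratio - 1.0)                  # assumption: delta f_k = f_{k+1} - f_k
    out = np.zeros(n_bins, dtype=complex)
    for k in range(n_bins):
        f_k = f_min * ratio ** k
        n_k = int(np.ceil(q * fs / f_k))     # N[k], rounded up to whole samples
        if n_k > len(x):
            raise ValueError("signal is shorter than the longest window")
        n = np.arange(n_k)
        kernel = hamming(n_k) * np.exp(-2j * np.pi * q * n / n_k)
        out[k] = np.dot(x[:n_k], kernel) / n_k   # normalize by N[k]
    return out
```

For a pure tone, the magnitude of the result should peak at the bin whose center frequency is closest to the tone, with some leakage into neighboring bins from the window.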

Variable-Q bandwidth calculation

The variable-Q transform is the same as the constant-Q transform except that the filter Q is variable, hence the name. It is useful where time resolution at low frequencies is important. There are several ways to calculate the bandwidth of the VQT, one of which uses the equivalent rectangular bandwidth as the bandwidth of each VQT bin.[3]

The simplest way to implement a variable-Q transform is to add a bandwidth offset, called γ, as follows:[citation needed]

$$\delta f_k = \left(\frac{2}{f_k + \gamma}\right) Q.$$

Alternatively, the bandwidth δf_k for a variable-Q transform with an arbitrary frequency resolution scale[citation needed] can be evaluated using a nonlinear frequency scale, as follows:

  • Choose a frequency scale. The Mel scale is used as the example here:
$$\begin{aligned} f(x) &= 2595 \log_{10}\left(1 + \frac{x}{700}\right) \\ g(x) &= 700\left(10^{\frac{x}{2595}} - 1\right) \end{aligned}$$
where f(x) denotes the frequency scale and g(x) denotes its inverse.
  • Calculate slo and shi using the equations below:
$$\begin{aligned} \mathrm{slo} &= f\left(\text{tresAtHz} - \frac{500}{\text{tres}}\right) \\ \mathrm{shi} &= f\left(\text{tresAtHz} + \frac{500}{\text{tres}}\right) \end{aligned}$$
where tresAtHz is the center frequency at which the time resolution is specified and tres is the desired time resolution at that frequency.
  • Then calculate the lower and higher boundaries using the equations below:
$$\begin{aligned} \mathrm{lowerBound}(Hz) &= g\left(\left(\frac{f(Hz) - \mathrm{slo}}{\mathrm{shi} - \mathrm{slo}} - 0.5\right) \cdot \left(\mathrm{shi} - \mathrm{slo}\right) + \mathrm{slo}\right) \\ \mathrm{higherBound}(Hz) &= g\left(\left(\frac{f(Hz) - \mathrm{slo}}{\mathrm{shi} - \mathrm{slo}} + 0.5\right) \cdot \left(\mathrm{shi} - \mathrm{slo}\right) + \mathrm{slo}\right) \end{aligned}$$
  • Finally, calculate the bandwidth using the equation below:
$$\delta f_k = \mathrm{higherBound}(Hz) - \mathrm{lowerBound}(Hz)$$
where Hz is the center frequency of the VQT bin.
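The steps above translate almost line for line into code. The sketch below follows them with the Mel scale. The function names are hypothetical, and the example call treats tres as milliseconds, which is an assumption (the text does not state the unit) chosen because 1000/tres then comes out in Hz.

```python
import math

def hz_to_mel(x):
    """f(x): the Mel frequency scale."""
    return 2595.0 * math.log10(1.0 + x / 700.0)

def mel_to_hz(x):
    """g(x): inverse of the Mel scale."""
    return 700.0 * (10.0 ** (x / 2595.0) - 1.0)

def vqt_bandwidth(hz, tres_at_hz, tres):
    """Bandwidth of the bin centered at `hz`, following the steps above."""
    slo = hz_to_mel(tres_at_hz - 500.0 / tres)
    shi = hz_to_mel(tres_at_hz + 500.0 / tres)
    span = shi - slo
    pos = (hz_to_mel(hz) - slo) / span
    lower = mel_to_hz((pos - 0.5) * span + slo)   # lowerBound(Hz)
    higher = mel_to_hz((pos + 0.5) * span + slo)  # higherBound(Hz)
    return higher - lower

# Example values (assumed): tres = 10 at 1000 Hz.
bw_at_1k = vqt_bandwidth(1000.0, tres_at_hz=1000.0, tres=10.0)
```

With these example values the bandwidth at 1000 Hz is close to 1000/tres = 100 Hz, and it grows for higher center frequencies because the Mel scale compresses them.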

Another way to calculate the VQT bandwidth is to subtract the frequency band's lower boundary from its higher boundary:[citation needed]

$$\delta f_k = \mathrm{higherBound}(Hz) - \mathrm{lowerBound}(Hz)$$

This is especially useful when CQT/VQT bins are neither linearly nor logarithmically spaced, as with Mel, hyperbolic sine, and other nonlinear scales.[citation needed]

Fast calculation

The direct calculation of the constant-Q transform is slow when compared against the fast Fourier transform (FFT). However, the FFT can itself be employed, in conjunction with a kernel, to perform the equivalent calculation much faster.[4] An approximate inverse to such an implementation was proposed in 2006; it works by going back to the DFT, and is only suitable for pitched instruments.[5]
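The idea of combining the FFT with a kernel can be illustrated with Parseval's relation: the inner product of a frame with each temporal kernel equals, up to a factor 1/N, the inner product of their FFTs. The NumPy sketch below is written under that assumption and is not the cited authors' code. The function names are hypothetical, the windows are left-aligned, and the spectral kernel is kept dense, whereas a practical implementation would precompute it once and drop its near-zero entries to gain speed.

```python
import math
import numpy as np

def temporal_kernels(fs, f_min, bins_per_octave, n_bins):
    """Windowed complex exponentials, one row per bin, zero-padded to equal length."""
    ratio = 2.0 ** (1.0 / bins_per_octave)
    q = 1.0 / (ratio - 1.0)                  # assumption: delta f_k = f_{k+1} - f_k
    lengths = [math.ceil(q * fs / (f_min * ratio ** k)) for k in range(n_bins)]
    n_fft = max(lengths)                     # assumption: no power-of-two padding
    kernels = np.zeros((n_bins, n_fft), dtype=complex)
    for k, n_k in enumerate(lengths):
        n = np.arange(n_k)
        window = 25.0 / 46.0 - (21.0 / 46.0) * np.cos(2.0 * np.pi * n / (n_k - 1))
        kernels[k, :n_k] = window * np.exp(2j * np.pi * q * n / n_k) / n_k
    return kernels

def cqt_via_fft(x, kernels):
    """Evaluate all bins from a single FFT of the frame and the spectral kernel."""
    n_fft = kernels.shape[1]
    spectral = np.fft.fft(kernels, axis=1)   # can be precomputed once per setup
    spectrum = np.fft.fft(x[:n_fft], n_fft)
    return spectral.conj() @ spectrum / n_fft
```

Without any thresholding of the spectral kernel, the result matches the direct time-domain sum `kernels.conj() @ x` to floating-point precision.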

A development on this method with improved invertibility involves performing CQT (via FFT) octave-by-octave, using lowpass filtered and downsampled results for consecutively lower pitches.[6] Implementations of this method include the MATLAB implementation and LibROSA's Python implementation.[7] LibROSA combines the subsampled method with the direct FFT method (which it dubs "pseudo-CQT") by having the latter process higher frequencies as a whole.[7]

The sliding DFT can be used for faster calculation of the constant-Q transform, since the sliding DFT does not require linear frequency spacing or the same window size for every bin.[8]

Comparison with the Fourier transform

In general, the transform is well suited to musical data, and this can be seen in some of its advantages compared to the fast Fourier transform. As the output of the transform is effectively amplitude/phase against log frequency, fewer frequency bins are required to cover a given range effectively, and this proves useful where frequencies span several octaves. As the range of human hearing covers approximately ten octaves from 20 Hz to around 20 kHz, this reduction in output data is significant.
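A back-of-the-envelope count makes the reduction concrete. The sketch below compares the number of log-spaced bins over 20 Hz to 20 kHz with the number of linearly spaced bins needed to keep the same spacing everywhere as at the low end. The semitone resolution (12 bins per octave) and the function names are assumptions chosen for the example.

```python
import math

def log_bins(f_lo, f_hi, bins_per_octave):
    """Number of geometrically spaced bins covering [f_lo, f_hi]."""
    return math.ceil(bins_per_octave * math.log2(f_hi / f_lo))

def linear_bins(f_lo, f_hi, bins_per_octave):
    """Number of linearly spaced bins with the spacing required at f_lo."""
    # Assumption: the linear grid must resolve the narrowest log-spaced gap,
    # which is the one between the two lowest bins.
    step = f_lo * (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return math.ceil((f_hi - f_lo) / step)

n_log = log_bins(20.0, 20000.0, 12)
n_lin = linear_bins(20.0, 20000.0, 12)
```

Under these assumptions the log-spaced grid needs 120 bins, while the linear grid needs more than a hundred times as many.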

The transform exhibits a reduction in frequency resolution with higher frequency bins, which is desirable for auditory applications. The transform mirrors the human auditory system, whereby spectral resolution is better at lower frequencies, whereas temporal resolution improves at higher frequencies. At the bottom of the piano scale (about 30 Hz), a difference of 1 semitone is a difference of approximately 1.5 Hz, whereas at the top of the musical scale (about 5 kHz), a difference of 1 semitone is a difference of approximately 200 Hz.[9] So for musical data the exponential frequency resolution of the constant-Q transform is ideal.

In addition, the harmonics of musical notes form a pattern characteristic of the timbre of the instrument in this transform. Assuming the same relative strengths of each harmonic, as the fundamental frequency changes, the relative position of these harmonics remains constant. This can make identification of instruments much easier. The constant Q transform can also be used for automatic recognition of musical keys based on accumulated chroma content.[10]

Relative to the Fourier transform, this transform is trickier to implement. This is due to the varying number of samples used in the calculation of each frequency bin, which also affects the length of any windowing function implemented.[11]

Also note that because the frequency scale is logarithmic, there is no true zero-frequency (DC) term, which may be a drawback in applications that need the DC term. For applications that do not, such as audio, this is not a drawback.

References

  1. ^ Judith C. Brown, Calculation of a constant Q spectral transform, J. Acoust. Soc. Am., 89(1):425–434, 1991.
  2. ^ Continuous Wavelet Transform "When the mother wavelet can be interpreted as a windowed sinusoid (such as the Morlet wavelet), the wavelet transform can be interpreted as a constant-Q Fourier transform. Before the theory of wavelets, constant-Q Fourier transforms (such as obtained from a classic third-octave filter bank) were not easy to invert, because the basis signals were not orthogonal."
  3. ^ Cwitkowitz, Frank C. Jr (2019). "End-to-End Music Transcription Using Fine-Tuned Variable-Q Filterbanks" (PDF). Rochester Institute of Technology: 32–34. Retrieved 2022-08-21.
  4. ^ Judith C. Brown and Miller S. Puckette, An efficient algorithm for the calculation of a constant Q transform, J. Acoust. Soc. Am., 92(5):2698–2701, 1992.
  5. ^ FitzGerald, Derry; Cychowski, Marcin T.; Cranitch, Matt (1 May 2006). "Towards an Inverse Constant Q Transform". Audio Engineering Society Convention. Paris: Audio Engineering Society. 120.
  6. ^ Schörkhuber, Christian; Klapuri, Anssi (2010). "Constant-Q Transform Toolbox for Music Processing". 7th Sound and Music Computing Conference. Barcelona. Retrieved 12 December 2018.
  7. ^ a b McFee, Brian; Battenberg, Eric; Lostanlen, Vincent; Thomé, Carl (12 December 2018). "librosa: core/constantq.py at 8d26423". GitHub. librosa. Retrieved 12 December 2018.
  8. ^ Bradford, R, ffitch, J & Dobson, R 2008, Sliding with a constant-Q, in 11th International Conference on Digital Audio Effects (DAFx-08) Proceedings, September 1-4th, 2008, Espoo, Finland. DAFx, Espoo, Finland, pp. 363-369.
  9. ^ http://newt.phys.unsw.edu.au/jw/graphics/notes.GIF
  10. ^ Hendrik Purwins, Benjamin Blankertz and Klaus Obermayer, A New Method for Tracking Modulations in Tonal Music in Audio Data Format, International Joint Conference on Neural Network (IJCNN'00)., 6:270-275, 2000.
  11. ^ Benjamin Blankertz, The Constant Q Transform, 1999.

Source: https://en.wikipedia.org/wiki/Constant-Q_transform
