Which one is the most adequate definition of audio signal processing?
a. It is a field of engineering of relevance for audio signals.
b. It is a field concerned with the intentional alteration of sounds using electronic means.
c. It is a collection of methods that can be used for filtering sound and music signals.
d. It is a field dedicated to the processing of analog audio signals.
What is Audacity?
a. It is a free and open-source digital audio editor and recording application.
b. It is a software library that includes many algorithms for audio processing.
c. It is the collection of Python functions that we will use in class.
d. It is a commercial software application for audio signal processing.
Which one of these is the equation that expresses the convolution operation? a. b. c. d.
Which one of these is an equation for a discrete complex sinusoid?
a. x[n] = j A sin(ωnT + φ)
b. x[n] = A cos(ωnT + φ) + j A sin(ωnT + φ)
c. x[n] = A cos(ω) + j A sin(ω)
d. x(t) = A cos(ωT + φ) + j A sin(ωT + φ)
Which of these is the standard equation of the DFT? a. b. c. d.
How could you describe the DFT operation?
a. As the conversion of a real signal into its corresponding complex function.
b. As the convolution of two complex signals.
c. As the scalar product of a signal x[n] by a collection of complex sinusoids.
d. As the sum of the product of a signal x[n] by a collection of complex sinusoids.
Given a signal x[n] = exp(j2πk0n/N), for n = 0, ..., N-1, its DFT X[k] is
a. N for all k.
b. 0 for k = k0 and N for k ≠ k0.
c. 1 for k = k0 and 0 for k ≠ k0.
d. N for k = k0 and 0 for k ≠ k0.
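The question above can be checked numerically. The following is a minimal sketch that uses a direct O(N^2) DFT written with the standard library only; the helper `dft` and the values of `N` and `k0` are illustrative assumptions, not part of the course material.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


N = 8   # illustrative DFT size (assumption)
k0 = 3  # illustrative bin index (assumption)

# Complex sinusoid whose frequency falls exactly on bin k0.
x = [cmath.exp(2j * cmath.pi * k0 * n / N) for n in range(N)]
X = dft(x)

# Magnitudes rounded to suppress floating-point noise around zero.
mags = [round(abs(v), 6) for v in X]
print(mags)
```

Printing `mags` for other choices of `k0` shows where the energy of the spectrum ends up.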
The DFT of a signal x[n] = exp(j2πf0n), in which f0 is any frequency in Hz,
a. Has two magnitude values, one for f0 and another for -f0.
b. Has only a positive magnitude for the value of k closest to f0.
c. Has positive magnitude values in the whole spectrum.
d. Cannot be computed because f0 is not one of the sinusoids computed by the DFT.
The shift property of the DFT can be expressed by x[n-n0] <-> exp(-j2πkn0/N) X[k], and it means that
a. A frequency shift in the frequency domain corresponds to a multiplication by a complex exponential in the time domain.
b. Delaying a signal x corresponds to a frequency shift in the frequency domain.
c. The DFT is a time-invariant function.
d. Delaying a time-domain signal does not affect the corresponding phase spectrum of the signal.
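The shift relation stated in the question can be verified numerically. This is a small sketch, assuming a circular delay (the DFT treats the signal as periodic) and an arbitrary test signal; `dft`, `x`, and `n0` are illustrative names.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


x = [0.5, 1.0, -0.3, 0.8, 0.0, -1.2, 0.4, 0.1]  # arbitrary test signal (assumption)
N = len(x)
n0 = 2  # delay in samples (assumption)

# Circular delay: x_delayed[n] = x[(n - n0) mod N].
x_delayed = [x[(n - n0) % N] for n in range(N)]

X = dft(x)
lhs = dft(x_delayed)
rhs = [cmath.exp(-2j * cmath.pi * k * n0 / N) * X[k] for k in range(N)]

# Largest difference between the two sides of the shift relation.
max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
# Largest change in magnitude caused by the delay.
mag_err = max(abs(abs(a) - abs(b)) for a, b in zip(lhs, X))
print(max_err, mag_err)
```

Both printed errors should be at the level of floating-point rounding: the delay multiplies each bin by a unit-magnitude complex exponential.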
The spectrum of a real signal has the following symmetry:
a. Both the magnitude and the phase spectra have odd symmetry.
b. The magnitude spectrum is even and the phase spectrum is odd.
c. The magnitude spectrum is even and the phase spectrum is a multiple of pi.
d. The real part of the spectrum is odd and the imaginary part is even.
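The symmetry of the spectrum of a real signal can be inspected with a short experiment. The sketch below assumes an arbitrary real test signal and compares bins k and N-k; all names are illustrative.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


x = [0.5, 1.0, -0.3, 0.8, 0.0, -1.2, 0.4, 0.1]  # arbitrary real signal (assumption)
N = len(x)
X = dft(x)

# For real x, bin N-k is the complex conjugate of bin k.
conj_err = max(abs(X[(N - k) % N] - X[k].conjugate()) for k in range(N))

# Compare magnitude and phase of mirrored bins, skipping k = 0 and k = N/2.
mag_err = max(abs(abs(X[N - k]) - abs(X[k])) for k in range(1, N // 2))
phase_err = max(
    abs(cmath.phase(X[N - k]) + cmath.phase(X[k])) for k in range(1, N // 2)
)
print(conj_err, mag_err, phase_err)
```

`mag_err` compares |X[N-k]| with |X[k]|, and `phase_err` compares the phase at N-k with the negated phase at k.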
The property of zero-padding can be explained as
a. Interpolation and zero-padding are the same thing in the DFT context.
b. By adding zeros to the end of a signal x, the corresponding spectrum will be an interpolated version of X.
c. Zero-padding means to add zeros in between every sample of a signal.
d. Zero-padding is what we do when we compute the FFT and thus the spectrum is always interpolated.
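The effect of appending zeros can be observed directly. In this sketch, which assumes an arbitrary 4-sample signal padded to twice its length, the even bins of the longer DFT are compared with the bins of the original DFT; the odd bins are the additional samples in between.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


x = [1.0, 0.5, -0.25, 0.75]  # arbitrary short signal (assumption)
N = len(x)

# Append N zeros at the end of the signal, doubling the DFT size.
x_padded = x + [0.0] * N

X = dft(x)
X_padded = dft(x_padded)

# Every second bin of the padded spectrum coincides with the original spectrum.
even_bin_err = max(abs(X_padded[2 * k] - X[k]) for k in range(N))
print(len(X), len(X_padded), even_bin_err)
```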
Phase unwrapping of a phase spectrum is
a. A way to make sure that the phase does not go outside the -pi to pi range.
b. A process by which we make sure that the phase spectrum does not distort.
c. A function in Python that computes the phase spectrum.
d. A process in which we get rid of phase discontinuities, thus obtaining a smooth function.
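To make the operation concrete, here is a minimal unwrapping sketch. It assumes a linear phase that has been wrapped into the -pi to pi range; the function `unwrap` is a hypothetical helper written for this example, not a library call.

```python
import math


def unwrap(phases):
    """Add multiples of 2*pi so that consecutive values never jump by more than pi."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        delta = cur - prev
        if delta > math.pi:
            offset -= 2 * math.pi
        elif delta < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


# Linear phase with a slope of -0.9 rad per bin (illustrative assumption).
true_phase = [-0.9 * k for k in range(12)]

# Wrap it into (-pi, pi], which is what a phase computed with atan2 looks like.
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]

unwrapped = unwrap(wrapped)
max_err = max(abs(u - t) for u, t in zip(unwrapped, true_phase))
max_step = max(abs(b - a) for a, b in zip(unwrapped, unwrapped[1:]))
print(max_err, max_step)
```

The wrapped sequence contains jumps of nearly 2*pi, while the unwrapped one follows the original straight line.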
From the equation of the STFT, which of these statements are right (select all that are right)?
a. H is the number of samples to be advanced from frame to frame.
b. The output of the STFT is a single complex function that captures the variability of a whole sound.
c. w[n] is a function used to select the part of the signal x[n] to analyze.
d. l is the index that indicates the sample to be analyzed.
Which are the measures we use to describe an analysis window?
a. Main-lobe width and highest side-lobe level.
b. Width of first side-lobe and amplitude of second side-lobe.
c. Product of the width of the main lobe by the one of the second lobe.
d. Overall amplitude and frequency resolution.
Which of these statements are right?
a. To get a zero-phase window we should use an odd-size window.
b. When choosing the window size we have to consider the time-frequency trade-off.
c. The FFT size, N, has to be greater than or equal to the window size, M.
d. The analysis/synthesis of the STFT will always be an identity.
The hop size, H, of the STFT
a. Should be bigger than the window size, M.
b. Should be equal to the window size, M.
c. Should be equal to the FFT size, N.
d. Ideally should be chosen to get a constant when overlapping and adding the windows.
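The idea of overlapping and adding windows can be explored with a few lines of code. This sketch assumes a periodic Hann window of size M = 16 and compares two hop sizes; `overlap_add_sum` and `ripple` are hypothetical helpers written for this example.

```python
import math


def overlap_add_sum(w, H, num_frames):
    """Sum of num_frames copies of window w, each advanced by H samples."""
    M = len(w)
    total = [0.0] * (H * (num_frames - 1) + M)
    for l in range(num_frames):
        for n in range(M):
            total[l * H + n] += w[n]
    return total


def ripple(w, H, num_frames=8):
    """Max minus min of the overlap-add sum, ignoring M samples at each edge."""
    M = len(w)
    middle = overlap_add_sum(w, H, num_frames)[M:-M]
    return max(middle) - min(middle)


M = 16  # window size (assumption)
# Periodic Hann window.
w = [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]

ripple_half = ripple(w, M // 2)  # hop size H = M/2
ripple_full = ripple(w, M)       # hop size H = M
print(ripple_half, ripple_full)
```

A ripple close to zero means the overlapped windows add up to a constant; a large ripple means the window shape modulates the reconstructed signal.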
From the equation of the Sinusoidal Model, which of the following statements are true?
a. The amplitude and frequencies of the sinusoids change at every frame.
b. The amplitude and frequencies of the sinusoids change at every sample.
c. The number of sinusoids is equal to the DFT size.
d. The amplitude and frequencies of the sinusoids are constant functions.
The window size is determined by the following equation. Which of these statements are correct?
a. Bs is the highest side-lobe level of the window.
b. The window size is directly proportional to the frequencies of the sinusoids to be analyzed.
c. The size of the window has to be equal to the size of the DFT.
d. The size of the window has to allow separating the main lobes of the sinusoids we want to analyze.
A peak in the spectrum, pr = mX[k0], is defined by mX[k0 - 1] < mX[k0] > mX[k0 + 1]. Which of these statements are correct?
a. The location of the peak is a perfect measure of the frequency of the sinusoid.
b. A peak is a local maximum in the magnitude spectrum.
c. A peak can be computed from either the magnitude or the phase spectrum.
d. Zero-padding before computing the spectrum improves the measure of the peak values.
Once we have measured the sinusoidal values of a sound, we can reconstruct the sound using additive synthesis. Additive synthesis can be implemented by
a. Performing subtractive synthesis from the sinusoidal values obtained.
b. Using sinusoidal oscillators and summing all their outputs.
c. Generating a spectrum with the values of the sinusoids and performing an IFFT.
d. It is not possible to recover the original sound from the sinusoidal analysis performed.
Consider the harmonic model equation. Which of these statements are true?
a. The frequency values of the cosines are multiples of f0.
b. r is a real number expressing the frequency of the harmonics.
c. r is an integer number corresponding to the harmonic number.
d. The output signal yh will always be equal to the sound analyzed.
Which statements are true when talking about sinusoids/partials/harmonics?
a. A partial is a synonym of sinusoid.
b. A sinusoid is a mathematical function used to model partials and harmonics.
c. A harmonic is a partial which is a multiple of the fundamental frequency.
d. A harmonic sound is always periodic.
A fundamental frequency of a sound can be defined as
a. The common divisor of the harmonic series that best explains the spectral peaks of a sound.
b. The Least Common Multiple of the peaks of a sound.
c. The lowest frequency present in a harmonic sound.
d. The highest spectral peak in the magnitude spectrum of a harmonic sound.
The autocorrelation function can be used to
a. Convolve one sound with another.
b. Measure the period length of a harmonic sound.
c. Compute the spectral peaks of a sound.
d. Measure the values of the harmonics of a sound.
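A small experiment shows what the autocorrelation of a harmonic signal looks like. The sketch below assumes a synthetic signal with two harmonics and a period of 20 samples, and a hypothetical search range of lags (`min_lag`, `max_lag`) chosen to skip the trivial maximum around lag 0.

```python
import math


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[lag] = sum over n of x[n] * x[n + lag]."""
    N = len(x)
    return [
        sum(x[n] * x[n + lag] for n in range(N - lag))
        for lag in range(max_lag + 1)
    ]


period = 20  # true period in samples (assumption)
N = 200      # signal length in samples (assumption)

# Harmonic test signal: fundamental plus a weaker second harmonic.
x = [
    math.cos(2 * math.pi * n / period) + 0.5 * math.cos(4 * math.pi * n / period)
    for n in range(N)
]

min_lag, max_lag = 5, 50  # hypothetical search range for the period
r = autocorrelation(x, max_lag)

# The lag with the largest autocorrelation inside the search range.
estimated_period = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
print(estimated_period)
```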
The power spectral density of a signal is defined by the following equation. In the context of stochastic signals, which statements are correct?
a. The power spectral density is the same as computing the autocorrelation function of the input sound x[n].
b. X[k] is the spectrum computed from a windowed portion of the input sound x[n].
c. We have to compute the magnitude spectrum of an infinitely long sound.
d. The power spectral density has to be estimated from a single frame of a signal.
The stochastic model we use is defined by the following. Which of these statements are correct?
a. This expression is the filtering of a white noise signal.
b. u[n] is the input signal that we want to model.
c. h[n] is the frequency response of white noise.
d. This is the convolution between a white noise signal and the impulse response of a filter approximating the sound to be modeled.
The harmonic plus stochastic model (HpS) is defined by the equation. Which of these statements are correct?
a. yst[n] is the residual signal obtained after subtracting the harmonics of the sound.
b. yst[n] is the stochastic approximation of the residual signal.
c. The sinusoids of the model can have any frequency.
d. y[n] will never be identical to the signal to be modeled, x[n].
Which of the following statements are correct?
a. What is most important in modelling any sound is that the synthesized sound is identical to the input sound.
b. The stochastic approximation implemented in the sms-tools software uses the Linear Prediction Coding method.
c. In the analysis we have to compute two different FFTs, one for the sinusoidal analysis and another for obtaining the residual.
d. We will have to choose the model and the analysis parameters depending on the sound and the application.
How is information encoded in the speech waveform?
a. As the speech generation process advances from the text formulation to the acoustic waveform, the information rate is estimated to remain constant.
b. As the speech generation process advances from the text formulation to the acoustic waveform, the information rate is estimated to increase.
c. As the speech generation process advances from the text formulation to the acoustic waveform, the information rate is estimated to decrease.
Select the correct statement on phonation:
a. Pitch and f0 have a linear relationship.
b. The only factor that affects phonation is the overpressure of the air in the lungs.
c. The oscillation frequency or f0 is determined in the vocal tract.
d. Air coming from the lungs makes a pressure difference that opens the vocal folds.
Select the correct statement on articulators and the vocal tract:
a. The jaw and the lips are the only articulators.
b. Articulators modify the sound coming from the vocal cords.
c. Waves in the vocal tract will resonate if their wavelength does not correspond to the dimensions of the tube.
Select the correct statement on formants:
a. Formants are multiples of the F0 frequency.
b. The first and fourth formants are enough to determine vowels.
c. Formants are defined as the spectral peaks of the sound spectrum envelope.
d. Formants are created by the passage of the sound through the vocal cords.
Choose the correct statement:
a. In narrowband spectral analysis, short waveform sections are analyzed.
b. Wideband spectral analysis offers high frequency resolution.
c. In wideband spectral analysis, short waveform sections are analyzed.
d. Narrowband spectral analysis offers high time resolution.
Choose the correct sentence on human speech perception:
a. In frequency masking, different tones fire different cochlear filters along the basilar membrane.
b. Temporal masking always occurs simultaneously in time.
c. Perception of loudness is frequency-dependent.
d. Critical bands represent the uniform frequency analysis of the basilar membrane.
Choose the correct difference between speech and singing:
a. Singing voice has shorter breathing phases.
b. Singing voice may have problems with intelligibility due to the pitch range.
c. Speech has larger loudness variations.
d. Speech voice has a larger pitch range.
Choose the correct statement on LPC:
a. The more coefficients used in the LPC filter, the less its spectral envelope approximates the Fourier spectrum.
b. Linear Predictive Coding (LPC) models the speech at time n as a linear combination of the previous p samples.
c. In LPC, the system is excited by an impulse train for unvoiced speech, or a random noise sequence for voiced speech.
d. LPC is only applied to synthesis of speech from LPC parameters.
What are the steps for cepstrum computation?
a. IFFT + compute logarithm + FFT
b. IFFT + compute exponential + FFT
c. FFT + get spectrum amplitude + exponential + IFFT
d. FFT + get spectrum amplitude + log + IFFT
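As an illustration of a cepstrum computation, the following sketch implements one common definition of the real cepstrum (forward transform, log of the magnitude, inverse transform) with a direct DFT from the standard library. The test signal, an impulse followed by a single echo, and the names `real_cepstrum`, `delay`, and `gain` are assumptions made for this example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) DFT, or inverse DFT (with 1/N scaling) when inverse=True."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]
    return [v / N for v in out] if inverse else out


def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum of x."""
    spectrum = dft(x)
    log_mag = [math.log(abs(v)) for v in spectrum]  # assumes no bin is exactly zero
    return [v.real for v in dft(log_mag, inverse=True)]


N = 64      # signal length (assumption)
delay = 8   # echo delay in samples (assumption)
gain = 0.5  # echo gain (assumption)

# Impulse plus one attenuated echo.
x = [0.0] * N
x[0] = 1.0
x[delay] = gain

c = real_cepstrum(x)

# Largest cepstral value among the positive quefrencies.
peak_quefrency = max(range(1, N // 2), key=lambda n: c[n])
print(peak_quefrency, c[peak_quefrency])
```

For this signal the periodic ripple that the echo creates in the log-magnitude spectrum appears as a peak at the quefrency equal to the echo delay.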
Choose the correct sentence on the cepstrum properties:
a. The complex cepstrum operator transforms convolution into addition.
b. It has finite duration.
c. The complex spectrum decays at least as fast as |n|^2.
d. The complex cepstrum operator transforms convolution into product.
Choose the correct sentence(s) on cepstral analysis:
a. The filter parameters usually reside in the lower quefrencies.
b. The excitation parameters have higher quefrencies.
c. The filter parameters usually reside in the higher quefrencies.
d. The excitation parameters have lower quefrencies.
What kind of segment does the cepstrum in the picture represent?
a. Unvoiced segment.
b. Voiced segment.
In diphone synthesis:
a. Databases are small compared to unit selection.
b. Intelligibility is low.
c. Multiple speakers are recorded.
d. Naturalness is good.
What are the expected qualities of a good text-to-speech system?
a. Either good naturalness or good intelligibility.
b. Good naturalness.
c. Good intelligibility.
d. Good intelligibility and good naturalness.
Select the correct sentence on text-to-speech technologies:
a. The generation of prosody only deals with phone duration and loudness.
b. The main steps include text analysis, phonetic analysis, and prosodic analysis.
c. It is not possible to modify a phone's pitch without modifying its duration.
d. Text normalization consists of converting raw text to its phonetic transcription.
In unit selection-based synthesis:
a. A single target cost function is used.
b. Databases contain just a few minutes of a single speaker.
c. A single join cost function is used.
d. Selected units can be larger than diphones.
Vocaloid is a singing voice synthesizer developed at the MTG whose first version was based on
a. The Sinusoidal plus Stochastic model.
b. The Harmonic plus Residual model.
c. The Harmonic plus Stochastic model.
d. The Sinusoidal plus Residual model.
Sound morphing can be accomplished by
a. Using the Harmonic plus Stochastic Model and interpolating the analysis data of two different sounds.
b. Using the STFT and interpolating the magnitude spectra of two different sounds.
c. Using the Sinusoidal model and time stretching the input sound to be much longer.
d. Using the Harmonic model and transposing the frequencies of the sinusoids.
Essentia is a software library developed at the MTG whose main use is
a. The coding of audio signals in order to compress any sound or music signal.
b. The analysis of sounds in order to obtain audio features of use in sound and music description.
c. The implementation of source separation techniques for sound and music signals.
d. The synthesis of musical signals in order to implement real-time synthesizers.
AcousticBrainz is a project of the MTG whose major aim is to
a. Identify and recommend music recordings.
b. Crowdsource acoustic information from music recordings that are referenced in MusicBrainz.
c. Crowdsource recordings of commercial audio recordings.
d. Collect the sounds of Freesound that are musical songs.