LPC Speech analysis and synthesis

This example shows how to implement a speech compression method using the EngeeDSP library. It implements linear predictive coding (LPC), a technique used mainly in audio and speech processing to represent the spectral envelope of a digital speech signal in compressed form using the parameters of a linear prediction model. The model performs both LPC analysis and LPC synthesis of the speech signal.

In the analysis section, the reflection coefficients are extracted from the signal and used to calculate the residual signal.

In the synthesis section, the signal is reconstructed from the residual signal and the reflection coefficients.

The residual signal and reflection coefficients require fewer bits to encode than the original speech signal.
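As a rough illustration of the saving, the per-frame bit budget can be estimated with simple arithmetic. This is only a sketch: the frame length, the model order and the quantizer word lengths are taken from the names and parameters used later in this example, while the 16-bit PCM source format is an assumption that the example does not state.

```julia
# Back-of-the-envelope bit budget for one frame (a sketch, not a measurement).
frame_len     = 80   # samples per frame (the `sample` value used in this example)
order         = 10   # reflection coefficients per frame (tenth-order model)
residual_bits = 6    # assumed from the name of the "6-bit quantizer" block
coef_bits     = 5    # assumed from the name of the "5-bit quantizer" block
pcm_bits      = 16   # ASSUMPTION: the source audio is 16-bit PCM

raw_bits   = frame_len * pcm_bits                            # 1280 bits
coded_bits = frame_len * residual_bits + order * coef_bits   # 530 bits
ratio      = raw_bits / coded_bits                           # compression ratio
```

Under these assumptions most of the coded bits are still spent on the residual, so the saving depends mainly on how coarsely the residual can be quantized.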

Connecting libraries, declaring input data

Connecting libraries

Pkg.add(["WAV"])

using .EngeeDSP;
using  Plots;
plotlyjs();
using WAV;
using Base64;

Declaring variables

sample = 80;
var_load = load_audio();
step = EngeeDSP.step;
In1 = step( var_load, "$(@__DIR__)/check_signal.wav", sample );

Declaring LPC data structures and functions

In this model, the speech signal is divided into frames of 160 samples with an overlap of 80 samples, which matches the Buffer(160,80,0) block and the input step of 80 samples used below.

Each frame is processed by a Hamming window.
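The framing and windowing step can be sketched in plain Julia without EngeeDSP. The helper names `hamming` and `windowed_frames` are illustrative and are not part of the library; the sketch assumes a symmetric Hamming window and zero initial conditions for the buffer, which is how the block parameters in this example read.

```julia
# Split a signal into overlapping frames and apply a symmetric Hamming window.
# frame_len = 160 and hop = 80 mirror Buffer(160, 80, 0); names are illustrative.

# Symmetric Hamming window of length n (n >= 2).
hamming(n) = [0.54 - 0.46 * cos(2π * k / (n - 1)) for k in 0:n-1]

# Return a frame_len × nframes matrix; each column is one windowed frame.
# As with a buffer that has zero initial conditions, the first frame starts
# with (frame_len - hop) zeros.
function windowed_frames(x::AbstractVector{<:Real}, frame_len::Int, hop::Int)
    overlap = frame_len - hop
    padded = vcat(zeros(overlap), float.(x))
    nframes = (length(padded) - frame_len) ÷ hop + 1
    w = hamming(frame_len)
    frames = zeros(frame_len, nframes)
    for j in 1:nframes
        start = (j - 1) * hop + 1
        frames[:, j] = padded[start:start+frame_len-1] .* w
    end
    return frames
end

x = sin.(2π * 200 .* (0:799) ./ 8000)   # 0.1 s test tone sampled at 8 kHz
frames = windowed_frames(x, 160, 80)
size(frames)                            # frame length × number of frames
```

Each new frame reuses the last 80 samples of the previous one, so every input sample is analysed twice, once in each half of the window.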

Tenth-order autocorrelation coefficients are computed for each frame, and the reflection coefficients are then calculated from them using the Levinson-Durbin algorithm. The original speech signal is passed through an analysis filter, which is an all-zero filter whose coefficients are the reflection coefficients obtained above. The output of this filter is the residual signal. The residual signal is then passed through the synthesis filter, which is the inverse of the analysis filter.

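The output of the synthesis filter is the reconstructed signal. It approximates the original signal rather than matching it exactly, because the model quantizes the residual and the reflection coefficients between analysis and synthesis.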
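The analysis and synthesis chain described above can be sketched in plain Julia. This is a minimal textbook version, not the EngeeDSP implementation: the function names are illustrative, the filters process the whole signal with one fixed set of coefficients instead of updating them per frame, and no quantization is applied, so here the synthesis filter inverts the analysis filter exactly.

```julia
using Random

# Biased autocorrelation estimate for lags 0..p.
autocorr_biased(x, p) = [sum(x[1:end-l] .* x[1+l:end]) / length(x) for l in 0:p]

# Levinson-Durbin recursion. Returns the prediction polynomial
# a = [1, a_1, …, a_p] and the reflection coefficients k_1 … k_p.
function levinson_durbin(r, p)
    a = zeros(p + 1); a[1] = 1.0
    k = zeros(p)
    err = r[1]                          # prediction error power
    for m in 1:p
        acc = r[m + 1]
        for i in 1:m-1
            acc += a[i + 1] * r[m - i + 1]
        end
        k[m] = -acc / err
        prev = copy(a)
        for i in 1:m-1
            a[i + 1] = prev[i + 1] + k[m] * prev[m - i + 1]
        end
        a[m + 1] = k[m]
        err *= 1 - k[m]^2
    end
    return a, k
end

# All-zero (MA) lattice: signal in, residual out.
function lattice_analysis(x, k)
    p = length(k)
    bprev = zeros(p)                    # b_0[n-1] … b_{p-1}[n-1]
    e = zeros(length(x))
    for n in eachindex(x)
        f = x[n]; b = x[n]              # f_0[n] and b_0[n]
        for m in 1:p
            fnew = f + k[m] * bprev[m]  # f_m[n]
            bnew = k[m] * f + bprev[m]  # b_m[n]
            bprev[m] = b                # keep b_{m-1}[n] for the next sample
            f, b = fnew, bnew
        end
        e[n] = f
    end
    return e
end

# All-pole (AR) lattice: residual in, signal out (inverse of the above).
function lattice_synthesis(e, k)
    p = length(k)
    bprev = zeros(p)
    y = zeros(length(e))
    for n in eachindex(e)
        f = e[n]                        # f_p[n]
        for m in p:-1:1
            f -= k[m] * bprev[m]        # f_{m-1}[n]
            if m < p
                bprev[m + 1] = k[m] * f + bprev[m]   # b_m[n]
            end
        end
        bprev[1] = f                    # b_0[n] = y[n]
        y[n] = f
    end
    return y
end

# Test signal: a simple AR(2) process driven by white noise.
rng = MersenneTwister(1)
w = randn(rng, 400)
x = zeros(400)
for n in 1:400
    x[n] = w[n] + (n > 1 ? 1.3 * x[n-1] : 0.0) - (n > 2 ? 0.6 * x[n-2] : 0.0)
end

a, k = levinson_durbin(autocorr_biased(x, 10), 10)
residual = lattice_analysis(x, k)
restored = lattice_synthesis(residual, k)
maximum(abs.(restored .- x))            # close to zero: synthesis inverts analysis
```

The residual has much lower energy than the input for a predictable signal like this one, which is what makes it cheaper to encode.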

mutable struct LPCAnalysisAndSynthesisOfSpeech
    obj_Pre_Emphasis
    obj_Overlap_Analysis_Windows
    obj_Window
    obj_Autocorrelation
    obj_Levinson_Durbin
    obj_Time_Varying_Analysis_Filter
    obj_Pad
    obj_FFT
    obj_MathFunction1
    obj_RC_To_InvSine
    obj_5_bit_Quantizer
    obj_6_bit_Quantizer
    obj_Inv_Sine_to_RC
    obj_Time_Varying_Synthesis_Filter
    obj_De_emphasis_Filter
    function LPCAnalysisAndSynthesisOfSpeech()
        new(
        DescretFIRFilter("Dialog parameters","Direct form",[1, -.95],"Columns as channels",0,false,"None"),
        Buffer(160,80,0),
        WindowFunction("Apply window to input","Hamming","Symmetric"),
        Autocorrelation("Biased",10),
        LevinsonDurbin("A and K",false,true),
        DescretFIRFilter("Input port","Lattice MA","Columns as channels",0,false,"None"),
        EngeeDSP.Pad("Columns","Specify via dialog",0,"User-specified",256,"End","None"),
        EngeeFFT("Auto",false,false,true),
        MathFunction("reciprocal","Exact","auto"),
        TrigonometricFunction("asin","auto",false),
        Quantizer(0.1,false),
        Quantizer(0.03125,false),
        TrigonometricFunction("sin","None","auto"),
        AllpoleFilter("Input port","Lattice AR","Columns as channels",0),
        AllpoleFilter("Dialog parameters","Direct form",[1 , -.95],"Columns as channels",0)
        )
    end
end
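The first and last blocks in this structure form an inverse pair: a pre-emphasis FIR filter with coefficients [1, -0.95] and a de-emphasis all-pole filter with the same coefficients in the denominator. A minimal sketch of that pair in plain Julia follows; the function names are illustrative and zero initial conditions are assumed.

```julia
# Pre-emphasis: y[n] = x[n] - alpha * x[n-1]   (FIR, numerator [1, -alpha])
function pre_emphasis(x; alpha = 0.95)
    y = zeros(length(x))
    prev = 0.0                      # x[n-1], zero initial condition
    for n in eachindex(x)
        y[n] = x[n] - alpha * prev
        prev = x[n]
    end
    return y
end

# De-emphasis: z[n] = y[n] + alpha * z[n-1]   (all-pole, denominator [1, -alpha])
function de_emphasis(y; alpha = 0.95)
    z = zeros(length(y))
    prev = 0.0                      # z[n-1], zero initial condition
    for n in eachindex(y)
        z[n] = y[n] + alpha * prev
        prev = z[n]
    end
    return z
end

x = [1.0, 0.5, -0.25, 0.0, 0.75]
emphasized = pre_emphasis(x)
recovered = de_emphasis(emphasized)   # matches x up to rounding error
```

Pre-emphasis boosts the high frequencies before analysis, and de-emphasis undoes that boost after synthesis.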

Setting up (initializing) the blocks of the LPC model.

function setup(obj::LPCAnalysisAndSynthesisOfSpeech,In1)
    Pre_Emphasis_set = EngeeDSP.setup(obj.obj_Pre_Emphasis,In1) 
    Overlap_Analysis_Windows_set = EngeeDSP.setup(obj.obj_Overlap_Analysis_Windows,Pre_Emphasis_set);
    Window_set = EngeeDSP.setup(obj.obj_Window,Overlap_Analysis_Windows_set);
    Autocorrelation_set = EngeeDSP.setup(obj.obj_Autocorrelation,Window_set);
    Levinson_Durbin_set = EngeeDSP.setup(obj.obj_Levinson_Durbin,Autocorrelation_set);
    Time_Varying_Analysis_Filter_set = EngeeDSP.setup(obj.obj_Time_Varying_Analysis_Filter,Pre_Emphasis_set,Levinson_Durbin_set[2]);
    RC_To_InvSine_set = EngeeDSP.setup(obj.obj_RC_To_InvSine,Levinson_Durbin_set[2]);
    bit_Quantizer_5_set = EngeeDSP.setup(obj.obj_5_bit_Quantizer,RC_To_InvSine_set);
    bit_Quantizer_6_set = EngeeDSP.setup(obj.obj_6_bit_Quantizer,Time_Varying_Analysis_Filter_set);
    Inv_Sine_to_RC_set = EngeeDSP.setup(obj.obj_Inv_Sine_to_RC,bit_Quantizer_5_set);
    Time_Varying_Synthesis_Filter_set = EngeeDSP.setup(obj.obj_Time_Varying_Synthesis_Filter,bit_Quantizer_6_set,Inv_Sine_to_RC_set);
    De_emphasis_Filter_set = EngeeDSP.setup(obj.obj_De_emphasis_Filter,Time_Varying_Synthesis_Filter_set);
end

setup (generic function with 1 method)

Defining the function that performs one processing step, that is, processes a single frame.

function step1(obj::LPCAnalysisAndSynthesisOfSpeech,In1)
    Pre_Emphasis_out = EngeeDSP.step(obj.obj_Pre_Emphasis,In1); 
    Overlap_Analysis_Windows_out = EngeeDSP.step(obj.obj_Overlap_Analysis_Windows,Pre_Emphasis_out); 
    Window_out = EngeeDSP.step(obj.obj_Window,Overlap_Analysis_Windows_out); 
    Autocorrelation_out = EngeeDSP.step(obj.obj_Autocorrelation,Window_out); 
    Levinson_Durbin_out = EngeeDSP.step(obj.obj_Levinson_Durbin,Autocorrelation_out);
    Time_Varying_Analysis_Filter_out = EngeeDSP.step(obj.obj_Time_Varying_Analysis_Filter,Pre_Emphasis_out,Levinson_Durbin_out[2]);
    Pad_out = EngeeDSP.step(obj.obj_Pad,Levinson_Durbin_out[1]);
    FFT_out = EngeeDSP.step(obj.obj_FFT,Pad_out);
    MathFunction1_out = EngeeDSP.step(obj.obj_MathFunction1,FFT_out);
    RC_To_InvSine_out = EngeeDSP.step(obj.obj_RC_To_InvSine,Levinson_Durbin_out[2]);
    bit_Quantizer_5_out = EngeeDSP.step(obj.obj_5_bit_Quantizer,RC_To_InvSine_out);
    bit_Quantizer_6_out = EngeeDSP.step(obj.obj_6_bit_Quantizer,Time_Varying_Analysis_Filter_out); 
    Inv_Sine_to_RC_out = EngeeDSP.step(obj.obj_Inv_Sine_to_RC,bit_Quantizer_5_out);
    Time_Varying_Synthesis_Filter_out = EngeeDSP.step(obj.obj_Time_Varying_Synthesis_Filter,bit_Quantizer_6_out,Inv_Sine_to_RC_out);
    De_emphasis_Filter_out = EngeeDSP.step(obj.obj_De_emphasis_Filter,Time_Varying_Synthesis_Filter_out);
    De_emphasis_Filter_out,MathFunction1_out;
end

step1 (generic function with 1 method)
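In the step function above, the reflection coefficients are not quantized directly: they are mapped through asin, quantized uniformly, and mapped back through sin. A minimal sketch of that chain in plain Julia is shown below. The `quantize` helper is illustrative, and its rounding rule (nearest multiple of the step, ties away from zero) is an assumption about how the Quantizer block behaves.

```julia
# Uniform quantizer with step q.
# ASSUMPTION: rounds to the nearest multiple of q, ties away from zero.
quantize(u, q) = q * round(u / q, RoundNearestTiesAway)

k = [-0.95, -0.5, 0.0, 0.3, 0.9]            # example reflection coefficients

# asin -> quantize with step 0.1 -> sin, as in the step function above
k_hat = sin.(quantize.(asin.(k), 0.1))

maximum(abs.(k_hat .- k))                   # bounded by half the step (0.05)
```

Quantizing in the arcsine domain places the levels more densely near ±1, where the filter response is most sensitive to changes in a reflection coefficient. A step of 0.1 over the arcsine range of about ±1.57 gives on the order of 32 levels, which is presumably where the "5-bit" name of the block comes from.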

Implementation of the LPC algorithm

Calling the functions and collecting the outputs.

obj = LPCAnalysisAndSynthesisOfSpeech() 
setup(obj,In1[1]);
Out_a = zeros(size(vcat(In1...)))
Out_p = In1.*1im
size(In1,1)

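for i = 0:size(In1,1) - 2
    # Call step1 once per frame: the blocks (buffer, filters) appear to keep
    # internal state, so a second call with the same frame would advance it twice.
    output_a, output_p = step1(obj, In1[i+1])
    Out_a[sample*i + 1 : sample*(i + 1)] = output_a   # reconstructed audio frame
    Out_p[i+1] = output_p                             # spectral envelope of the frame
end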

Processing and analysing the results

Setting up the player.

In2 = vcat(In1...);
function audioplayer(s, fs)
    buf = IOBuffer()
    wavwrite(s, buf; Fs=fs)              # write the samples as WAV data into memory
    data = base64encode(take!(buf))      # encode the raw WAV bytes as Base64
    markup = """<audio controls="controls">
                <source src="data:audio/wav;base64,$data" type="audio/wav" />
                Your browser does not support the audio element.
                </audio>"""
    display("text/html", markup)
end

audioplayer (generic function with 1 method)

Playing back the original audio file

audioplayer(In2, 8000)

Playing back the encoded and reconstructed audio signal.

audioplayer(Out_a, 8000)

Visual comparison of the original and reconstructed signals.

  plot(In2)
  plot!(Out_a)

Plotting the spectrum analyser output for the first two frames.

u1 = Out_p[1]; u2 = Out_p[2];
uSubset1 = u1[1:floor(Int,length(u1)/2+1)];
uSubset2 = u2[1:floor(Int,length(u2)/2+1)];
y1 = 6.02059991327962 .* log2.(abs.(uSubset1)+hypot.(eps.(real(uSubset1)),eps.(imag(uSubset1))));
y2 = 6.02059991327962 .* log2.(abs.(uSubset2)+hypot.(eps.(real(uSubset2)),eps.(imag(uSubset2))));

plot([1:129], y1[:], fillrange = -20, fillalpha = 0.35, c = 1, ylabel = "Mag^2", legend = false)
a = plot!([1:129],y1[:], msw = 0, ms = 2.5, xlabel = "Frequency")
plot([1:129], y2[:], fillrange = -20, fillalpha = 0.35, c = 1, ylabel = "Mag^2", legend = false)
b = plot!([1:129],y2[:], msw = 0, ms = 2.5, xlabel = "Frequency")
plot!(a,b)
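The quantity plotted here is the LPC spectral envelope: the prediction polynomial is zero-padded to 256 points, transformed with an FFT, and inverted, and the constant 6.02059991327962 multiplied by log2 converts the magnitude to decibels (it equals 20·log10(2)). The sketch below reproduces that computation in plain Julia with a naive DFT, so it needs no FFT package. The function names are illustrative, and the polynomial [1, -0.9] is an arbitrary first-order example rather than a value from the model.

```julia
# Naive DFT, O(N^2); adequate for N = 256 in a sketch.
dft(x) = [sum(x[n + 1] * cis(-2π * k * n / length(x)) for n in 0:length(x)-1)
          for k in 0:length(x)-1]

# Spectral envelope 1/|A(e^{jω})| in dB on the first nfft/2 + 1 frequency bins.
function lpc_envelope_db(a::AbstractVector{<:Real}; nfft::Int = 256)
    padded = vcat(float.(a), zeros(nfft - length(a)))   # zero-pad to nfft points
    H = 1 ./ dft(padded)                                # reciprocal of A
    half = H[1:nfft ÷ 2 + 1]                            # keep bins 0 … nfft/2
    return 20 .* log10.(abs.(half) .+ eps())
end

a = [1.0, -0.9]              # hypothetical first-order prediction polynomial
env = lpc_envelope_db(a)

# The constant used in the plotting code is the dB conversion factor for log2:
6.02059991327962 * log2(3.7) ≈ 20 * log10(3.7)
```

For this example the envelope peaks at zero frequency, where A equals 0.1 and the gain is 20 dB, and falls off towards the highest bin.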

Conclusion

In this demonstration we learned how to work with functions from the EngeeDSP library and showed that interactive blocks can be created inside scripts. The example illustrates how the LPC coding method works and lets us assess its efficiency. LPC is one of the most powerful speech analysis methods and one of the most useful for encoding good-quality speech at low bit rates, and it provides highly accurate estimates of speech parameters.