corrcoef

Correlation coefficients.

Library

EngeeDSP

Syntax

Function call

R,P,RL,RU = corrcoef(A) returns the matrix R of correlation coefficients for the matrix A, where the columns of A represent random variables and the rows represent measurements.

It also returns the matrix P of p-values for testing the hypothesis that there is no relationship between the observed phenomena (the null hypothesis). If an off-diagonal element of P is less than the significance level (0.05 by default), the corresponding correlation in R is considered significant.

It also returns the matrices RL and RU, which contain the lower and upper bounds of the 95% confidence interval for each coefficient.

The outputs P, RL, and RU are returned only if R does not contain complex elements.

_ = corrcoef(A,B) returns the correlation coefficients between two random variables A and B.

_ = corrcoef(_, Name,Value) specifies additional options for any of the previous syntaxes using one or more Name,Value arguments.

Arguments

Input arguments

# A — input array

+ matrix

Details

The input array, specified as a matrix.

If A is a scalar, then corrcoef(A) returns NaN.
If A is a vector, then corrcoef(A) returns 1.

Data types	`Float32`, `Float64`
Support for complex numbers	Yes

# B — additional input array

+ vector | matrix | multidimensional array

Details

An additional input array specified as a vector, matrix, or multidimensional array.

A and B must be the same size.
If A and B are scalars, then corrcoef(A,B) returns 1. However, if A and B are equal, then corrcoef(A,B) returns NaN.
If A and B are matrices or multidimensional arrays, then corrcoef(A,B) converts each input argument into its vector representation.
If A and B are 0×0 empty arrays, then corrcoef(A,B) returns a 2×2 matrix of NaN values.
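The vector conversion for matrix inputs can be illustrated with a minimal sketch. It uses `cor` from Julia's standard `Statistics` library as a stand-in rather than `corrcoef` itself, and it assumes the flattening is column by column, as Julia's `vec` does.

```julia
using Statistics

A = [1.0 2.0; 3.0 5.0]
B = [2.0 1.0; 4.0 7.0]

# Flatten both matrices column by column (assumed order) into vectors
a = vec(A)   # [1.0, 3.0, 2.0, 5.0]
b = vec(B)   # [2.0, 4.0, 1.0, 7.0]

# One coefficient for the pair, arranged as a 2×2 matrix with ones on the diagonal
ρ = cor(a, b)
R = [1.0 ρ; ρ 1.0]
```

Because both inputs are flattened in the same order, element pairs stay aligned, which is why A and B must be the same size.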

Data types	`Float32`, `Float64`
Support for complex numbers	Yes

Name-value input arguments

Specify optional pairs of arguments in the format Name, Value, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after the other arguments, but the order of the pairs does not matter.

Use commas to separate the name and the value, and enclose Name in quotation marks.

Example: R = corrcoef(A, "Alpha", 0.01).

# Alpha — significance level

+ 0.05 (by default) | scalar in the range (0, 1)

Details

The significance level, specified as a number between 0 and 1. The value of Alpha defines the confidence level as a percentage, 100*(1−Alpha)%, for the correlation coefficients, which determines the bounds in RL and RU.

Data types	`Float32`, `Float64`

# Rows — handling of NaN values

+ "all" (by default) | "complete" | "pairwise"

Details

Handling of NaN values, specified as one of the following:

"all": include all NaN values in the input data before calculating the correlation coefficients.
"complete": omit all rows of the input data that contain NaN values before calculating the correlation coefficients. This option always returns a positive semidefinite matrix.
"pairwise": omit rows that contain NaN only on a pairwise basis, separately for each two-column correlation coefficient calculation. This option can return a matrix that is not positive semidefinite.
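The difference between "complete" and "pairwise" can be sketched with Julia's standard `Statistics` library. This is an illustration of the row-filtering idea only, not the EngeeDSP implementation. The helper `pairwise_cor` is hypothetical, and the data is the matrix from the NaN values example on this page.

```julia
using Statistics

A = [ 0.194551    1.40891   NaN
      0.279785   -0.534099  -0.792337
      0.0512203   NaN       -0.952975
     -0.774466   -0.176248   0.353905
      0.786782   -0.24375    1.59703 ]

# "complete": drop every row that has a NaN in any column, then correlate
keep = [!any(isnan, row) for row in eachrow(A)]
R_complete = cor(A[keep, :])

# "pairwise" (hypothetical helper): for columns i and j, drop only the rows
# that have a NaN in column i or column j
function pairwise_cor(A)
    n = size(A, 2)
    R = Matrix{Float64}(undef, n, n)
    for i in 1:n, j in 1:n
        ok = [!isnan(a) && !isnan(b) for (a, b) in zip(A[:, i], A[:, j])]
        R[i, j] = cor(A[ok, i], A[ok, j])
    end
    return R
end

R_pairwise = pairwise_cor(A)
```

With "complete", every coefficient is computed from the same three rows. With "pairwise", columns 1 and 2 use four rows, so their coefficient differs between the two results, while columns 2 and 3 share the same three valid rows in both cases.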

Output arguments

# R — correlation coefficients

+ matrix

Details

Correlation coefficients, returned as a matrix.

For a single input matrix, R has size [size(A,2) size(A,2)], based on the number of random variables (columns) in A. The diagonal elements are equal to one by convention, and the off-diagonal elements are the correlation coefficients of pairs of variables. Coefficient values range from −1 to 1, where −1 means a direct negative correlation, 0 means no correlation, and 1 means a direct positive correlation. R is symmetric.
For two input arguments, R is a 2×2 matrix with ones on the diagonal and the correlation coefficients off the diagonal.
If any random variable is constant, its correlation with all other variables is undefined, and the corresponding row and column values are NaN.
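These properties can be checked with a short sketch. It uses `cor` from Julia's standard `Statistics` library as a stand-in for `corrcoef`, so it illustrates the properties of a correlation matrix in general rather than the EngeeDSP output. The sample data is made up for illustration.

```julia
using Statistics, LinearAlgebra

# Four measurements (rows) of three variables (columns)
A = [1.0 2.0 0.5
     2.0 1.0 1.5
     3.0 4.0 1.0
     4.0 3.0 3.0]

R = cor(A)   # column-wise correlation matrix

is_square    = size(R) == (size(A, 2), size(A, 2))  # one row and column per variable
is_symmetric = R ≈ R'
unit_diag    = all(diag(R) .≈ 1)
in_range     = all(abs.(R) .<= 1)

# A constant variable has zero variance, so its correlation with another
# variable is undefined
r_const = cor([1.0, 1.0, 1.0, 1.0], A[:, 1])
```

Here `r_const` is NaN because the standard deviation of the constant vector is zero and the coefficient reduces to 0/0.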

# P — p-values

+ matrix

Details

P-values, returned as a matrix. P is symmetric and has the same size as R. All diagonal elements are ones, and the off-diagonal elements are the p-values for each pair of variables. P-values range from 0 to 1, where values close to 0 correspond to a significant correlation in R and a low probability of observing the null hypothesis.
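A minimal sketch of how P might be used to pick out significant pairs, assuming a significance level of 0.05. The matrix is copied from the p-value example on this page, and the variable names `alpha`, `significant`, and `pairs` are illustrative.

```julia
P = [1.0          0.894805     0.698269     0.000140925
     0.894805     1.0          0.0770341    0.000240879
     0.698269     0.0770341    1.0          0.000145133
     0.000140925  0.000240879  0.000145133  1.0]

alpha = 0.05

# Elementwise comparison; the diagonal holds 1.0, so it is never flagged
significant = P .< alpha

# P is symmetric, so scanning the upper triangle lists each pair once
n = size(P, 1)
pairs = [(i, j) for i in 1:n for j in i+1:n if significant[i, j]]
```

For this matrix, only the pairs that involve the fourth variable fall below the threshold.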

# RL — lower bound for the correlation coefficient

+ matrix

Details

The lower bound for the correlation coefficient, returned as a matrix. RL is symmetric and has the same size as R. All diagonal elements are ones, and the off-diagonal elements are the lower bounds of the 95% confidence interval for the corresponding coefficients in R. RL is not returned if R contains complex values.

# RU — upper bound for the correlation coefficient

+ matrix

Details

The upper bound for the correlation coefficient, returned as a matrix. RU is symmetric and has the same size as R. All diagonal elements are ones, and the off-diagonal elements are the upper bounds of the 95% confidence interval for the corresponding coefficients in R. RU is not returned if R contains complex values.
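One common way to obtain confidence bounds for a correlation coefficient is the Fisher z-transform. The sketch below assumes that approach; this page does not state how EngeeDSP computes RL and RU. The helper `fisher_bounds` is hypothetical, and its `zcrit` parameter is the standard normal quantile at 1 − Alpha/2, hard-coded here for Alpha = 0.05.

```julia
# Hypothetical helper: Fisher z-transform confidence bounds for a coefficient r
# estimated from n measurements. zcrit = 1.959964 corresponds to a 95% interval.
function fisher_bounds(r, n; zcrit = 1.959964)
    z  = atanh(r)            # map r to an approximately normal scale
    se = 1 / sqrt(n - 3)     # standard error on that scale
    return tanh(z - zcrit * se), tanh(z + zcrit * se)
end

# Element (1,4) of R in the p-value example on this page, with 50 measurements
lo, hi = fisher_bounds(0.512678, 50)
```

For these inputs the sketch gives bounds of roughly 0.2733 and 0.6922, in line with the corresponding elements of RL and RU in that example.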

Examples

Random columns of the matrix

Details

Let’s calculate the correlation coefficients for a matrix with two normally distributed random columns and one column defined in terms of another. Because the third column of A is a linear function of the second (2*y + 3), these two variables are directly correlated, so the correlation coefficients in elements (2,3) and (3,2) of R equal 1.

import EngeeDSP.Functions: corrcoef, randn

x = randn(6,1)
y = randn(6,1)
A = [x y 2*y .+ 3]
R = corrcoef(A)[1]

3×3 Matrix{Float64}:
  1.0       -0.322277  -0.322277
 -0.322277   1.0        1.0
 -0.322277   1.0        1.0

Two random variables

Details

Let’s calculate the matrix of correlation coefficients between two normally distributed random vectors, each of which contains 10 measurements.

import EngeeDSP.Functions: corrcoef, randn

A = randn(10,1)
B = randn(10,1)
R = corrcoef(A,B)[1]

2×2 Matrix{Float64}:
 1.0       0.193892
 0.193892  1.0

Matrices of p-values, upper and lower bounds of the confidence interval

Details

Let’s calculate the correlation coefficients and p-values of a normally distributed random matrix with an added fourth column equal to the sum of the other three columns. Since the last column of A is a linear combination of the others, there is a correlation between the fourth variable and each of the other three variables. Hence, the fourth row and fourth column of P contain very small p-values, which indicates significant correlations.

import EngeeDSP.Functions: corrcoef, randn

A = randn(50,3)
A = hcat(A, sum(A, dims=2))
R, P, RL, RU = corrcoef(A)

Output the matrix R.

print("R:")
R

R:
4×4 Matrix{Float64}:
  1.0        -0.0191831  -0.0562013  0.512678
 -0.0191831   1.0        -0.252374   0.497031
 -0.0562013  -0.252374    1.0        0.511839
  0.512678    0.497031    0.511839   1.0

Output the matrix P of p-values.

print("P:")
P

P:
4×4 Matrix{Float64}:
 1.0          0.894805     0.698269     0.000140925
 0.894805     1.0          0.0770341    0.000240879
 0.698269     0.0770341    1.0          0.000145133
 0.000140925  0.000240879  0.000145133  1.0

Let’s output the matrices RL and RU of lower and upper bounds for the coefficients.

print("RL:")
RL

RL:
4×4 Matrix{Float64}:
  1.0       -0.295951  -0.329396  0.273336
 -0.295951   1.0       -0.495887  0.253795
 -0.329396  -0.495887   1.0       0.272283
  0.273336   0.253795   0.272283  1.0

print("RU:")
RU

RU:
4×4 Matrix{Float64}:
 1.0       0.260556   0.225677   0.692241
 0.260556  1.0        0.0279365  0.681144
 0.225677  0.0279365  1.0        0.691648
 0.692241  0.681144   0.691648   1.0

NaN values

Details

Let’s create a normally distributed random matrix containing NaN values and calculate the matrix of correlation coefficients, excluding all rows that contain NaN.

import EngeeDSP.Functions: corrcoef, randn

A = randn(5, 3)
A[1, 3] = NaN
A[3, 2] = NaN
A

5×3 Matrix{Float64}:
  0.194551     1.40891   NaN
  0.279785    -0.534099   -0.792337
  0.0512203  NaN          -0.952975
 -0.774466    -0.176248    0.353905
  0.786782    -0.24375     1.59703

R = corrcoef(A,"Rows","complete")[1]

3×3 Matrix{Float64}:
  1.0       -0.369186  0.340384
 -0.369186   1.0       0.748195
  0.340384   0.748195  1.0

Use the "all" option to include all NaN values in the calculation.

R = corrcoef(A,"Rows","all")[1]

3×3 Matrix{Float64}:
   1.0  NaN  NaN
 NaN    NaN  NaN
 NaN    NaN  NaN

Use the "pairwise" option to calculate each two-column correlation coefficient on a pairwise basis. If either of the two columns contains NaN in a row, that row is omitted.

R = corrcoef(A,"Rows","pairwise")[1]

3×3 Matrix{Float64}:
 1.0         0.00819072  0.300542
 0.00819072  1.0         0.748195
 0.300542    0.748195    1.0

Additional Info

Correlation coefficient

Details

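The correlation coefficient of two random variables is a measure of their linear relationship. If each variable has $N$ scalar measurements, the Pearson correlation coefficient is defined as

$$\rho(A,B) = \frac{1}{N-1} \sum_{i=1}^{N} \left(\frac{A_i - \mu_A}{\sigma_A}\right) \left(\frac{B_i - \mu_B}{\sigma_B}\right),$$

where

$\mu_A$ and $\sigma_A$ are the mean and standard deviation of $A$;
$\mu_B$ and $\sigma_B$ are the mean and standard deviation of $B$.

Alternatively, the correlation coefficient can be defined in terms of the covariance of $A$ and $B$:

$$\rho(A,B) = \frac{\operatorname{cov}(A,B)}{\sigma_A \sigma_B}.$$

The matrix of correlation coefficients of two random variables is the matrix of correlation coefficients for each pairwise combination of the variables:

$$R = \begin{pmatrix} \rho(A,A) & \rho(A,B) \\ \rho(B,A) & \rho(B,B) \end{pmatrix}.$$

Because $A$ and $B$ are always directly correlated with themselves, the diagonal elements equal 1, that is,

$$R = \begin{pmatrix} 1 & \rho(A,B) \\ \rho(B,A) & 1 \end{pmatrix}.$$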
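The definition can be sketched in a few lines, assuming the standard sample form of the Pearson coefficient: the sum of products of standardized deviations, divided by N − 1. The function `pearson` is a hypothetical helper written for illustration; `mean`, `std`, `cov`, and `cor` come from Julia's standard `Statistics` library, and the data is made up.

```julia
using Statistics

# Hypothetical helper: Pearson coefficient from the sample definition
function pearson(a, b)
    n = length(a)
    μa, μb = mean(a), mean(b)
    σa, σb = std(a), std(b)      # sample standard deviations (N - 1 normalization)
    return sum(((a .- μa) ./ σa) .* ((b .- μb) ./ σb)) / (n - 1)
end

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0]

ρ_def = pearson(a, b)                    # from the definition
ρ_cov = cov(a, b) / (std(a) * std(b))    # covariance form

# Each variable correlates perfectly with itself, so the diagonal is 1
R = [1.0 ρ_def; ρ_def 1.0]
```

Both forms agree with each other and with `cor(a, b)` up to floating-point rounding.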

Literature

Fisher R.A., Statistical Methods for Research Workers, 13th Ed., Hafner, 1958.
Kendall M.G., The Advanced Theory of Statistics, 4th Ed., Macmillan, 1979.
Press W.H., Teukolsky S.A., Vetterling W.T., and Flannery B.P., Numerical Recipes in C, 2nd Ed., Cambridge University Press, 1992.