corrcoef
Correlation coefficients.
| Library | EngeeDSP |
Syntax
Function call
- R,P,RL,RU = corrcoef(A) — returns the matrix R of correlation coefficients for the matrix A, where the columns of A are random variables and the rows are observations. It also returns the matrix P of p-values for testing the hypothesis that there is no relationship between the observed phenomena (the null hypothesis), and the matrices RL and RU of lower and upper confidence bounds for the coefficients. If an off-diagonal element of P is smaller than the significance level (0.05 by default), the corresponding correlation in R is considered significant.
- R,P,RL,RU = corrcoef(A,B) — returns the coefficients between the two random variables A and B.
- _ = corrcoef(_, Name,Value) — sets additional options for any of the previous syntaxes using one or more Name,Value arguments.
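The decision rule above (an off-diagonal p-value below the significance level marks a significant correlation) can be sketched in plain Julia. The helper `significant_pairs` is hypothetical and not part of EngeeDSP; the P values are copied from the p-value example in the Examples section.

```julia
# Hypothetical helper (not part of EngeeDSP): list the variable pairs (i, j), i < j,
# whose p-value is below the significance level alpha.
function significant_pairs(P::AbstractMatrix{<:Real}; alpha::Real = 0.05)
    n = size(P, 1)
    pairs = Tuple{Int,Int}[]
    for j in 2:n, i in 1:j-1          # upper triangle only, since P is symmetric
        if P[i, j] < alpha
            push!(pairs, (i, j))
        end
    end
    return pairs
end

# p-value matrix copied from the 50-observation example
P = [1.0         0.894805    0.698269    0.000140925;
     0.894805    1.0         0.0770341   0.000240879;
     0.698269    0.0770341   1.0         0.000145133;
     0.000140925 0.000240879 0.000145133 1.0]

println(significant_pairs(P))              # default significance level 0.05
println(significant_pairs(P; alpha = 0.1)) # looser threshold also flags (2, 3)
```

Only the upper triangle is scanned because P is symmetric and its diagonal carries no test result.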
Arguments
Input arguments
# A — input array
matrix
The input array, specified as a matrix.
- If A is a scalar, then corrcoef(A) returns NaN.
- If A is a vector, then corrcoef(A) returns 1.
| Data types | |
| Support for complex numbers | Yes |
# B — additional input array
vector | matrix | multidimensional array
An additional input array specified as a vector, matrix, or multidimensional array.
- A and B must be the same size.
- If A and B are scalars, then corrcoef(A,B) returns 1. However, if A and B are equal, then corrcoef(A,B) returns NaN.
- If A and B are matrices or multidimensional arrays, then corrcoef(A,B) converts each input argument into its vector representation.
- If A and B are 0×0 empty arrays, then corrcoef(A,B) returns a 2×2 matrix of NaN values.
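As a rough illustration of the vector-representation rule, the following sketch assumes it means flattening A and B column by column, as `vec` does in Julia, and correlating the two resulting vectors. It uses `cor` from the Julia Statistics standard library in place of EngeeDSP's `corrcoef`, and the input values are arbitrary.

```julia
using Statistics  # standard library: provides cor

# Two matrices of the same size; values chosen only for illustration
A = [1.0 2.0; 3.0 4.0]
B = [2.0 1.0; 5.0 7.0]

# Flatten both inputs column by column (Julia arrays are column-major),
# then correlate the two resulting vectors.
r = cor(vec(A), vec(B))

# Assemble the 2×2 matrix: ones on the diagonal, r off the diagonal
R = [1.0 r; r 1.0]
println(R)
```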
| Data types | |
| Support for complex numbers | Yes |
Name-value input arguments
Specify optional pairs of arguments as Name, Value, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after the other arguments, but the order of the pairs does not matter.
Separate the name and the value with a comma, and enclose Name in quotation marks.
Example: R = corrcoef(A, "Alpha", 0.01).
# Alpha — significance level
0.05 (default) | scalar in the range (0, 1)
# Rows — handling of NaN values
"all" (default) | "complete" | "pairwise"
How NaN values in the input are handled, specified as one of the following values:
- "all": include all NaN values in the input before computing the correlation coefficients.
- "complete": omit every input row that contains a NaN value before computing the correlation coefficients. This option always returns a positive semidefinite matrix.
- "pairwise": omit rows containing NaN only on a pairwise basis, separately for each two-column correlation coefficient. This option can return a matrix that is not positive semidefinite.
Output arguments
# R — correlation coefficients
matrix
Correlation coefficients returned as a matrix.
- For a single input matrix A, R has size [size(A,2) size(A,2)], determined by the number of random variables (columns) in A. The diagonal elements are 1 by convention, and the off-diagonal elements are the correlation coefficients of variable pairs. Coefficient values range from −1 to 1, where −1 indicates a perfect negative correlation, 0 indicates no correlation, and 1 indicates a perfect positive correlation. R is symmetric.
- For two inputs A and B, R is a 2×2 matrix with ones on the diagonal and the correlation coefficient of A and B off the diagonal.
- If any random variable is constant, its correlation with all other variables is undefined, and the corresponding row and column of R contain NaN.
# P — p-values
matrix
P-values, returned as a matrix. P is symmetric and has the same size as R. The diagonal elements are all 1, and the off-diagonal elements are the p-values for each pair of variables. P-values range from 0 to 1; values close to 0 correspond to a significant correlation in R, that is, a low probability of observing such a correlation if the null hypothesis were true.
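The text does not state how P is computed. A common choice, assumed here, is the two-sided t-test with t = r·√((n−2)/(1−r²)) and n−2 degrees of freedom; under that assumption the sketch agrees closely with the p-values printed in the 50-observation example. To stay package-free, the Student t probability uses the classical closed-form series for integer degrees of freedom; in practice a statistics package would normally supply the t distribution. The function names are hypothetical.

```julia
# P(|T| < t) for Student's t with integer ν ≥ 1 degrees of freedom,
# using the closed-form finite series in θ = atan(t / √ν).
function t_central_prob(t::Real, ν::Integer)
    θ = atan(abs(t) / sqrt(ν))
    s, c = sin(θ), cos(θ)
    if isodd(ν)
        ν == 1 && return 2θ / π          # Cauchy case
        term = c                          # first term of the series: cos θ
        acc = term
        for k in 3:2:ν-2
            term *= (k - 1) / k * c^2     # next term: multiply by (k-1)/k · cos²θ
            acc += term
        end
        return 2 / π * (θ + s * acc)
    else
        term = 1.0                        # first term of the series: 1
        acc = term
        for k in 2:2:ν-2
            term *= (k - 1) / k * c^2
            acc += term
        end
        return s * acc
    end
end

# Two-sided p-value for H0 "no correlation", assuming the t-test described above
function corr_pvalue(r::Real, n::Integer)
    ν = n - 2
    t = r * sqrt(ν / (1 - r^2))
    return 1 - t_central_prob(t, ν)
end

# Coefficients taken from the 50-observation example
p12 = corr_pvalue(-0.0191831, 50)   # weak correlation: large p-value
p14 = corr_pvalue(0.512678, 50)     # strong correlation: very small p-value
```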
# RL — lower bound for the correlation coefficient
matrix
Lower bound for the correlation coefficient, returned as a matrix. RL is symmetric and has the same size as R. The diagonal elements are all 1, and the off-diagonal elements are the lower bounds of the 95% confidence interval for the corresponding coefficients in R. RL is not returned if R contains complex values.
# RU — upper bound for the correlation coefficient
matrix
Upper bound for the correlation coefficient, returned as a matrix. RU is symmetric and has the same size as R. The diagonal elements are all 1, and the off-diagonal elements are the upper bounds of the 95% confidence interval for the corresponding coefficients in R. RU is not returned if R contains complex values.
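The text states only that RL and RU bound a 95% confidence interval. A standard construction, assumed here, is the Fisher z-transform: map r to atanh(r), add and subtract z₀.₉₇₅/√(n−3), and map back with tanh. For the coefficient 0.512678 from the 50-observation example, this reproduces the printed RL and RU entries. `corr_bounds` is a hypothetical helper.

```julia
# 97.5% quantile of the standard normal distribution (two-sided 95% interval)
const Z975 = 1.959963984540054

# Confidence bounds for a Pearson coefficient r from n observations,
# assuming the Fisher z-transform with standard error 1/√(n−3).
function corr_bounds(r::Real, n::Integer; z::Real = Z975)
    zr = atanh(r)                 # Fisher transform of r
    half = z / sqrt(n - 3)        # half-width of the interval on the z scale
    return tanh(zr - half), tanh(zr + half)  # map both ends back to the r scale
end

# Coefficient between columns 1 and 4 in the 50-observation example
rl, ru = corr_bounds(0.512678, 50)
```

Working on the z scale keeps the interval inside (−1, 1) and makes it asymmetric around r, as in the printed matrices.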
Examples
Random matrix columns
Compute the correlation coefficients for a matrix with two normally distributed random columns and a third column defined in terms of the second. Because the third column of A is a linear function of the second (2*y .+ 3), these two variables are perfectly positively correlated, so the correlation coefficient in elements (2,3) and (3,2) of R is 1.
import EngeeDSP.Functions: corrcoef, randn
x = randn(6,1)
y = randn(6,1)
A = [x y 2*y .+ 3]
R = corrcoef(A)[1]
3×3 Matrix{Float64}:
1.0 -0.322277 -0.322277
-0.322277 1.0 1.0
-0.322277 1.0 1.0
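The same effect can be reproduced with `cor` from Julia's Statistics standard library, used here instead of EngeeDSP's `corrcoef`, with a fixed random seed so the run is repeatable: a column that is a positive affine function of another has correlation 1 with it, and scaling by a negative factor flips the sign to −1.

```julia
using Statistics, Random

Random.seed!(1)                  # fixed seed so the run is repeatable
x = randn(6)
y = randn(6)
A = [x y (2 .* y .+ 3)]          # third column is an affine function of the second

R = cor(A)                       # Statistics.cor: columns are variables
println(R[2, 3])                 # 1.0 up to rounding
println(cor(y, -2 .* y))         # -1.0 up to rounding
```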
Two random variables
Compute the correlation coefficient matrix for two normally distributed random vectors, each containing 10 observations.
import EngeeDSP.Functions: corrcoef, randn
A = randn(10,1)
B = randn(10,1)
R = corrcoef(A,B)[1]
2×2 Matrix{Float64}:
1.0 0.193892
0.193892 1.0
P-values and confidence interval bounds
Compute the correlation coefficients and p-values for a normally distributed random matrix with an added fourth column equal to the sum of the other three columns. Because the last column of A is a linear combination of the others, the fourth variable is correlated with each of the other three. As a result, the fourth row and fourth column of P contain very small p-values, indicating significant correlations.
import EngeeDSP.Functions: corrcoef, randn
A = randn(50,3)
A = hcat(A, sum(A, dims=2))
R, P, RL, RU = corrcoef(A)
Display the matrix R.
print("R:")
R
R:
4×4 Matrix{Float64}:
1.0 -0.0191831 -0.0562013 0.512678
-0.0191831 1.0 -0.252374 0.497031
-0.0562013 -0.252374 1.0 0.511839
0.512678 0.497031 0.511839 1.0
Display the p-value matrix P.
print("P:")
P
P:
4×4 Matrix{Float64}:
1.0 0.894805 0.698269 0.000140925
0.894805 1.0 0.0770341 0.000240879
0.698269 0.0770341 1.0 0.000145133
0.000140925 0.000240879 0.000145133 1.0
Display the matrices RL and RU, the lower and upper bounds of the confidence intervals for the coefficients.
print("RL:")
RL
RL:
4×4 Matrix{Float64}:
1.0 -0.295951 -0.329396 0.273336
-0.295951 1.0 -0.495887 0.253795
-0.329396 -0.495887 1.0 0.272283
0.273336 0.253795 0.272283 1.0
print("RU:")
RU
RU:
4×4 Matrix{Float64}:
1.0 0.260556 0.225677 0.692241
0.260556 1.0 0.0279365 0.681144
0.225677 0.0279365 1.0 0.691648
0.692241 0.681144 0.691648 1.0
NaN values
Create a normally distributed matrix containing NaN values, and compute the correlation coefficient matrix, excluding every row that contains NaN.
import EngeeDSP.Functions: corrcoef, randn
A = randn(5, 3)
A[1, 3] = NaN
A[3, 2] = NaN
A
5×3 Matrix{Float64}:
0.194551 1.40891 NaN
0.279785 -0.534099 -0.792337
0.0512203 NaN -0.952975
-0.774466 -0.176248 0.353905
0.786782 -0.24375 1.59703
R = corrcoef(A,"Rows","complete")[1]
3×3 Matrix{Float64}:
1.0 -0.369186 0.340384
-0.369186 1.0 0.748195
0.340384 0.748195 1.0
Use the "all" option to include all NaN values in the calculation.
R = corrcoef(A,"Rows","all")[1]
3×3 Matrix{Float64}:
1.0 NaN NaN
NaN NaN NaN
NaN NaN NaN
Use the "pairwise" option to compute each two-column correlation coefficient separately. A row is skipped for a given pair of columns only if either of those two columns contains NaN in that row.
R = corrcoef(A,"Rows","pairwise")[1]
3×3 Matrix{Float64}:
1.0 0.00819072 0.300542
0.00819072 1.0 0.748195
0.300542 0.748195 1.0
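The row-selection logic behind "complete" and "pairwise" can be sketched with the Statistics standard library. The helpers `cor_complete` and `cor_pairwise` are hypothetical illustrations, not the EngeeDSP implementation, and the data matrix is made up. As in the example above, the pair of columns 2 and 3 loses the same rows under both rules, so its coefficient matches.

```julia
using Statistics

# "complete": drop every row that contains a NaN, then correlate the remaining rows.
function cor_complete(X::AbstractMatrix{<:Real})
    keep = [!any(isnan, X[i, :]) for i in axes(X, 1)]
    return cor(X[keep, :])
end

# "pairwise": for each column pair, drop only the rows where either column is NaN.
function cor_pairwise(X::AbstractMatrix{<:Real})
    n = size(X, 2)
    R = ones(n, n)                          # diagonal stays 1
    for j in 2:n, i in 1:j-1
        keep = .!(isnan.(X[:, i]) .| isnan.(X[:, j]))
        R[i, j] = R[j, i] = cor(X[keep, i], X[keep, j])
    end
    return R
end

X = [1.0  2.0  NaN;
     2.0  1.0  4.0;
     3.0  NaN  5.0;
     4.0  5.0  3.0;
     5.0  4.0  8.0]

Rc = cor_complete(X)   # uses rows 2, 4, 5 only
Rp = cor_pairwise(X)   # each column pair keeps its own set of rows
```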
Additional Info
Correlation coefficient
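The correlation coefficient of two random variables is a measure of their linear dependence. If each variable has N scalar observations, the Pearson correlation coefficient is defined as
$$\rho(A,B) = \frac{1}{N-1}\sum_{i=1}^{N}\left(\frac{A_i-\mu_A}{\sigma_A}\right)\left(\frac{B_i-\mu_B}{\sigma_B}\right),$$
where
- $\mu_A$ and $\sigma_A$ are the mean and standard deviation of $A$;
- $\mu_B$ and $\sigma_B$ are the mean and standard deviation of $B$.
Alternatively, the correlation coefficient can be defined in terms of the covariance of $A$ and $B$:
$$\rho(A,B) = \frac{\operatorname{cov}(A,B)}{\sigma_A\sigma_B}.$$
The correlation coefficient matrix of two random variables is the matrix of correlation coefficients for each pairwise combination of the variables:
$$R = \begin{pmatrix}\rho(A,A) & \rho(A,B)\\ \rho(B,A) & \rho(B,B)\end{pmatrix}.$$
Because $A$ and $B$ are always directly correlated with themselves, the diagonal elements are 1, that is,
$$R = \begin{pmatrix}1 & \rho(A,B)\\ \rho(B,A) & 1\end{pmatrix}.$$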
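The Pearson definition (the sum of products of standardized deviations divided by N − 1) and its covariance form cov(A,B)/(σ_A σ_B) can be checked against each other numerically. This is a minimal sketch using the Statistics standard library; `pearson` is a hypothetical helper, `std` and `cov` use the N − 1 normalization by default, and the data are arbitrary.

```julia
using Statistics

# Pearson coefficient written out as the normalized sum of products of
# standardized deviations (sample mean and sample standard deviation).
function pearson(a::AbstractVector{<:Real}, b::AbstractVector{<:Real})
    N = length(a)
    length(b) == N || throw(DimensionMismatch("a and b must have the same length"))
    μa, σa = mean(a), std(a)
    μb, σb = mean(b), std(b)
    return sum(((a .- μa) ./ σa) .* ((b .- μb) ./ σb)) / (N - 1)
end

a = [1.0, 3.0, 2.0, 4.0]
b = [2.0, 5.0, 1.0, 7.0]

ρ = pearson(a, b)
ρ_cov = cov(a, b) / (std(a) * std(b))   # covariance form of the same quantity
R = [1.0 ρ; ρ 1.0]                       # 2×2 coefficient matrix with unit diagonal
```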