Документация Engee

Nonparametric tests

Страница в процессе перевода.

Anderson-Darling test

Available are both one-sample and -sample tests.

OneSampleADTest(x::AbstractVector{<:Real}, d::UnivariateDistribution)

Perform a one-sample Anderson—​Darling test of the null hypothesis that the data in vector x come from the distribution d against the alternative hypothesis that the sample is not drawn from d.

Implements: pvalue

KSampleADTest(xs::AbstractVector{<:Real}...; modified = true, nsim = 0)

Perform a -sample Anderson—​Darling test of the null hypothesis that the data in the vectors xs come from the same distribution against the alternative hypothesis that the samples come from different distributions.

modified parameter enables a modified test calculation for samples whose observations do not all coincide.

If nsim is equal to 0 (the default) the asymptotic calculation of p-value is used. If it is greater than 0, an estimation of p-values is used by generating nsim random splits of the pooled data on samples, evaluating the AD statistics for each split, and computing the proportion of simulated values which are greater or equal to observed. This proportion is reported as p-value estimate.

Implements: pvalue

References

  • F. W. Scholz and M. A. Stephens, K-Sample Anderson-Darling Tests, Journal of the American Statistical Association, Vol. 82, No. 399. (Sep., 1987), pp. 918-924.

Binomial test

BinomialTest(x::Integer, n::Integer, p::Real = 0.5)
BinomialTest(x::AbstractVector{Bool}, p::Real = 0.5)

Perform a binomial test of the null hypothesis that the distribution from which x successes were encountered in n draws (or alternatively from which the vector x was drawn) has success probability p against the alternative hypothesis that the success probability is not equal to p.

Computed confidence intervals by default are Clopper-Pearson intervals. See the confint(::BinomialTest) documentation for a list of supported methods to compute confidence intervals.

confint(test::BinomialTest; level = 0.95, tail = :both, method = :clopper_pearson)

Compute a confidence interval with coverage level for a binomial proportion using one of the following methods. Possible values for method are:

  • :clopper_pearson (default): Clopper-Pearson interval is based on the binomial distribution. The empirical coverage is never less than the nominal coverage of level; it is usually too conservative.

  • :wald: Wald (or normal approximation) interval relies on the standard approximation of the actual binomial distribution by a normal distribution. Coverage can be erratically poor for success probabilities close to zero or one.

  • :waldcc: Wald interval with a continuity correction that extends the interval by 1/2n on both ends.

  • :wilson: Wilson score interval relies on a normal approximation. In contrast to :wald, the standard deviation is not approximated by an empirical estimate, resulting in good empirical coverages even for small numbers of draws and extreme success probabilities.

  • :jeffrey: Jeffreys interval is a Bayesian credible interval obtained by using a non-informative Jeffreys prior. The interval is very similar to the Wilson interval.

  • :agresti_coull: Agresti-Coull interval is a simplified version of the Wilson interval; both are centered around the same value. The Agresti Coull interval has higher or equal coverage.

  • :arcsine: Confidence interval computed using the arcsine transformation to make independent of the probability .

References

  • Brown, L.D., Cai, T.T., and DasGupta, A. Interval estimation for a binomial proportion. Statistical Science, 16(2):101—​117, 2001.

  • Pires, Ana & Amado, Conceição. (2008). Interval Estimators for a Binomial Proportion: Comparison of Twenty Methods. REVSTAT. 6. 10.57805/revstat.v6i2.63.

External links

Fisher exact test

FisherExactTest(a::Integer, b::Integer, c::Integer, d::Integer)

Perform Fisher’s exact test of the null hypothesis that the success probabilities and are equal, that is the odds ratio is one, against the alternative hypothesis that they are not equal.

See pvalue(::FisherExactTest) and confint(::FisherExactTest) for details about the computation of the default p-value and confidence interval, respectively.

The contingency table is structured as:

- X1 X2

Y1

a

b

Y2

c

d

The show function output contains the conditional maximum likelihood estimate of the odds ratio rather than the sample odds ratio; it maximizes the likelihood given by Fisher’s non-central hypergeometric distribution.

References

  • Fay, M.P., Supplementary material to "Confidence intervals that match Fisher’s exact or Blaker’s exact tests". Biostatistics, Volume 11, Issue 2, 1 April 2010, Pages 373—​374, link

confint(x::FisherExactTest; level::Float64=0.95, tail=:both, method=:central)

Compute a confidence interval with coverage level. One-sided intervals are based on Fisher’s non-central hypergeometric distribution. For tail = :both, the only method implemented yet is the central interval (:central).

Since the p-value is not necessarily unimodal, the corresponding confidence region might not be an interval.

References

  • Gibbons, J.D, Pratt, J.W. P-values: Interpretation and Methodology, American Statistican, 29(1):20-25, 1975.

  • Fay, M.P., Supplementary material to "Confidence intervals that match Fisher’s exact or Blaker’s exact tests". Biostatistics, Volume 11, Issue 2, 1 April 2010, Pages 373—​374, link

pvalue(x::FisherExactTest; tail = :both, method = :central)

Compute the p-value for a given Fisher exact test.

The one-sided p-values are based on Fisher’s non-central hypergeometric distribution with odds ratio :

For tail = :both, possible values for method are:

  • :central (default): Central interval, i.e. the p-value is two times the minimum of the one-sided p-values.

  • :minlike: Minimum likelihood interval, i.e. the p-value is computed by summing all tables with the same marginals that are equally or less probable:

References

  • Gibbons, J.D., Pratt, J.W., P-values: Interpretation and Methodology, American Statistican, 29(1):20-25, 1975.

  • Fay, M.P., Supplementary material to "Confidence intervals that match Fisher’s exact or Blaker’s exact tests". Biostatistics, Volume 11, Issue 2, 1 April 2010, Pages 373—​374, link

Kolmogorov-Smirnov test

Available are an exact one-sample test and approximate (i.e. asymptotic) one- and two-sample tests.

ExactOneSampleKSTest(x::AbstractVector{<:Real}, d::UnivariateDistribution)

Perform a one-sample exact Kolmogorov—​Smirnov test of the null hypothesis that the data in vector x comes from the distribution d against the alternative hypothesis that the sample is not drawn from d.

Implements: pvalue

ApproximateOneSampleKSTest(x::AbstractVector{<:Real}, d::UnivariateDistribution)

Perform an asymptotic one-sample Kolmogorov—​Smirnov test of the null hypothesis that the data in vector x comes from the distribution d against the alternative hypothesis that the sample is not drawn from d.

Implements: pvalue

ApproximateTwoSampleKSTest(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})

Perform an asymptotic two-sample Kolmogorov—​Smirnov-test of the null hypothesis that x and y are drawn from the same distribution against the alternative hypothesis that they come from different distributions.

Implements: pvalue

External links

Kruskal-Wallis rank sum test

KruskalWallisTest(groups::AbstractVector{<:Real}...)

Perform Kruskal-Wallis rank sum test of the null hypothesis that the groups come from the same distribution against the alternative hypothesis that that at least one group stochastically dominates one other group.

The Kruskal-Wallis test is an extension of the Mann-Whitney U test to more than two groups.

The p-value is computed using a approximation to the distribution of the test statistic :

where is the set of the counts of tied values at each tied position, is the total number of observations across all groups, and and are the number of observations and the rank sum in group , respectively. See references for further details.

Implements: pvalue

References

  • Meyer, J.P, Seaman, M.A., Expanded tables of critical values for the Kruskal-Wallis H statistic. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, April 2006.

External links

Mann-Whitney U test

MannWhitneyUTest(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})

Perform a Mann-Whitney U test of the null hypothesis that the probability that an observation drawn from the same population as x is greater than an observation drawn from the same population as y is equal to the probability that an observation drawn from the same population as y is greater than an observation drawn from the same population as x against the alternative hypothesis that these probabilities are not equal.

The Mann-Whitney U test is sometimes known as the Wilcoxon rank-sum test.

When there are no tied ranks and ≤50 samples, or tied ranks and ≤10 samples, MannWhitneyUTest performs an exact Mann-Whitney U test. In all other cases, MannWhitneyUTest performs an approximate Mann-Whitney U test. Behavior may be further controlled by using ExactMannWhitneyUTest or ApproximateMannWhitneyUTest directly.

Implements: pvalue

ExactMannWhitneyUTest(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})

Perform an exact Mann-Whitney U test of the null hypothesis that the probability that an observation drawn from the same population as x is greater than an observation drawn from the same population as y is equal to the probability that an observation drawn from the same population as y is greater than an observation drawn from the same population as x against the alternative hypothesis that these probabilities are not equal.

When there are no tied ranks, the exact p-value is computed using the pwilcox function from the Rmath package. In the presence of tied ranks, a p-value is computed by exhaustive enumeration of permutations, which can be very slow for even moderately sized data sets.

Implements: pvalue

ApproximateMannWhitneyUTest(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})

Perform an approximate Mann-Whitney U test of the null hypothesis that the probability that an observation drawn from the same population as x is greater than an observation drawn from the same population as y is equal to the probability that an observation drawn from the same population as y is greater than an observation drawn from the same population as x against the alternative hypothesis that these probabilities are not equal.

The p-value is computed using a normal approximation to the distribution of the Mann-Whitney U statistic:

where is the set of the counts of tied values at each tied position.

Implements: pvalue

Sign test

SignTest(x::AbstractVector{T<:Real}, median::Real = 0)
SignTest(x::AbstractVector{T<:Real}, y::AbstractVector{T<:Real}, median::Real = 0)

Perform a sign test of the null hypothesis that the distribution from which x (or x - y if y is provided) was drawn has median median against the alternative hypothesis that the median is not equal to median.

Implements: pvalue, confint

Wald-Wolfowitz independence test

WaldWolfowitzTest(x::AbstractVector{Bool})
WaldWolfowitzTest(x::AbstractVector{<:Real})

Perform the Wald-Wolfowitz (or Runs) test of the null hypothesis that the given data is random, or independently sampled. The data can come as many-valued or two-valued (Boolean). If many-valued, the sample is transformed by labelling each element as above or below the median.

Implements: pvalue

Wilcoxon signed rank test

SignedRankTest(x::AbstractVector{<:Real})
SignedRankTest(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})

Perform a Wilcoxon signed rank test of the null hypothesis that the distribution of x (or the difference x - y if y is provided) has zero median against the alternative hypothesis that the median is non-zero.

When there are no tied ranks and ≤50 samples, or tied ranks and ≤15 samples, SignedRankTest performs an exact signed rank test. In all other cases, SignedRankTest performs an approximate signed rank test. Behavior may be further controlled by using ExactSignedRankTest or ApproximateSignedRankTest directly.

Implements: pvalue, confint

ExactSignedRankTest(x::AbstractVector{<:Real}[, y::AbstractVector{<:Real}])

Perform a Wilcoxon exact signed rank U test of the null hypothesis that the distribution of x (or the difference x - y if y is provided) has zero median against the alternative hypothesis that the median is non-zero.

When there are no tied ranks, the exact p-value is computed using the psignrank function from the Rmath package. In the presence of tied ranks, a p-value is computed by exhaustive enumeration of permutations, which can be very slow for even moderately sized data sets.

Implements: pvalue, confint

ApproximateSignedRankTest(x::AbstractVector{<:Real}[, y::AbstractVector{<:Real}])

Perform a Wilcoxon approximate signed rank U test of the null hypothesis that the distribution of x (or the difference x - y if y is provided) has zero median against the alternative hypothesis that the median is non-zero.

The p-value is computed using a normal approximation to the distribution of the signed rank statistic:

where is the set of the counts of tied values at each tied position.

Implements: pvalue, confint

Permutation test

ExactPermutationTest(x::Vector, y::Vector, f::Function)

Perform a permutation test (a.k.a. randomization test) of the null hypothesis that f(x) is equal to f(y). All possible permutations are sampled.

ApproximatePermutationTest([rng::AbstractRNG,] x::Vector, y::Vector, f::Function, n::Int)

Perform a permutation test (a.k.a. randomization test) of the null hypothesis that f(x) is equal to f(y). n of the factorial(length(x)+length(y)) permutations are sampled at random. A random number generator can optionally be passed as the first argument. The default generator is Random.default_rng().

Fligner-Killeen test

FlignerKilleenTest(groups::AbstractVector{<:Real}...)

Perform Fligner-Killeen median test of the null hypothesis that the groups have equal variances, a test for homogeneity of variances.

This test is most robust against departures from normality, see references. It is a -sample simple linear rank method that uses the ranks of the absolute values of the centered samples and weights

The version implemented here uses median centering in each of the samples.

Implements: pvalue

References

  • Conover, W. J., Johnson, M. E., Johnson, M. M., A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23, 351—​361, 1980

External links