Kendall's Rank Correlation

From NesselroadeSTATSwiki

Description

Kendall’s rank correlation, denoted as 𝜏 (tau), is a nonparametric statistical measure of the strength and direction of the association between the ranks of two ordinal variables (Kendall, 1938). The coefficient measures monotonic association rather than a linear relationship between the two variables. A relationship is monotonic when one variable moves in only one direction as the other increases: it either never decreases or never increases. It may stay constant over some interval, but it never changes direction.

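Kendall’s tau describes only monotonic association. Because it is based on whether pairs of observations change in the same or opposite directions, concordant pairs from the increasing part of a non-monotonic relationship can be offset by discordant pairs from the decreasing part, so applying the coefficient to such a relationship may produce misleading results (Lieberson, 1964).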
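A minimal sketch in base R (the language of the Example section), using made-up data rather than anything from this page: a U-shaped relationship gives 𝜏 = 0 even though y is completely determined by x, while a curved but always-increasing relationship gives 𝜏 = 1.

```r
# Made-up data for illustration only
x <- -5:5
y_u    <- x^2     # non-monotonic: decreases, then increases
y_mono <- exp(x)  # nonlinear but monotonic: always increasing

# For the symmetric U shape, concordant and discordant pairs cancel out
tau_u <- cor(x, y_u, method = "kendall")
# For the monotonic curve, every pair is concordant
tau_mono <- cor(x, y_mono, method = "kendall")

tau_u     # 0
tau_mono  # 1
```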

The graphs below show examples of both monotonic and non-monotonic relationships between two variables.

[Figures: examples of monotonic and non-monotonic relationships (KendallML.png, KendallMNL.png, KendallNM.png)]

Common Alternative Names

  • Kendall’s rank correlation coefficient
  • Kendall's tau

Reasons to use Kendall's Tau

  • The data consist of two ordinal variables, or of two continuous variables that can be converted to ranks.
  • Kendall’s tau may be preferred when the assumptions of Pearson’s product-moment correlation (r) are violated, and over Spearman’s rho when the sample size is small or there are many tied ranks.

Correlation Analysis

A scatterplot can be used to check visually whether a monotonic relationship is present. The value of 𝜏 does not depend on which variable is plotted on the x-axis and which on the y-axis. In general, larger samples give more reliable estimates; with a small sample (n < 20) the estimate may be unreliable (Arndt et al., 1999).

The value of 𝜏 ranges from -1 to 1. A positive value indicates a positive monotonic relationship between the two variables, and a negative value indicates a negative monotonic relationship. If |𝜏| is close to 1, the association is strong, and |𝜏| = 1 indicates a perfect monotonic association. If 𝜏 is close to 0, there is little evidence of a monotonic relationship, although a non-monotonic relationship may still be present.
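The symmetry and sign properties described above can be checked in base R. This is a small sketch on arbitrary made-up vectors, not data from this page.

```r
# Arbitrary made-up data
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
y <- c(2, 7, 1, 8, 2, 8, 1, 8)

tau_xy  <- cor(x, y, method = "kendall")
tau_yx  <- cor(y, x, method = "kendall")   # axes swapped
tau_neg <- cor(x, -y, method = "kendall")  # direction of y reversed

all.equal(tau_xy, tau_yx)    # TRUE: swapping the variables leaves tau unchanged
all.equal(tau_neg, -tau_xy)  # TRUE: reversing one variable flips the sign
tau_xy >= -1 && tau_xy <= 1  # TRUE: tau always lies between -1 and 1
```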

When ranking data, the correlation is unaffected by whether the lowest or highest value is ranked 1, as long as both variables are ranked in the same direction. Each variable must be ranked separately. When two or more values within a variable are equal, each is assigned the average of the ranks they would otherwise occupy. For example:

Sample Data
Student | Math Score (X) | English Score (Y) | Rank (X) | Rank (Y)
A | 84 | 90 | 2.5 | 1
B | 79 | 85 | 4 | 2
C | 62 | 76 | 5 | 3
D | 84 | 64 | 2.5 | 4
E | 96 | 59 | 1 | 5

The values in this dataset are ranked from highest (1) to lowest (5). The first and fourth Math Scores (both 84) would occupy ranks 2 and 3, so each is assigned the average rank, 2.5. This is called a “tie” or “tied ranks”.
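The ranks in the table can be reproduced with base R's rank(), whose ties.method = "average" option assigns tied values the average of the ranks they would occupy. Because rank() ranks from lowest to highest, this sketch negates the scores to rank from highest (1) to lowest; the vector names math and english are illustrative.

```r
# Scores from the Sample Data table (students A to E)
math    <- c(84, 79, 62, 84, 96)  # Math Score (X)
english <- c(90, 85, 76, 64, 59)  # English Score (Y)

# Negating ranks from highest (1) to lowest; ties get the average rank
rank_x <- rank(-math, ties.method = "average")
rank_y <- rank(-english, ties.method = "average")

rank_x  # 2.5 4.0 5.0 2.5 1.0
rank_y  # 1 2 3 4 5

# Ranking both variables from lowest to highest instead gives the same tau
all.equal(cor(rank_x, rank_y, method = "kendall"),
          cor(rank(math), rank(english), method = "kendall"))  # TRUE
```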

In this case, an observation is a pair consisting of a rank for Math Score (X) and a rank for English Score (Y); each row in the table above is one observation (one student). To count the concordant and discordant pairs, observations are compared two at a time as (X1, Y1) and (X2, Y2). Each observation must be compared with every other observation exactly once (e.g., if Student A has been compared with Student B, Student B is not then compared with Student A), so n observations give n(n - 1)/2 pairs, or 10 pairs for the five students here.

A pair is concordant if both variables change in the same direction (X1 > X2 and Y1 > Y2, or X1 < X2 and Y1 < Y2). A pair is discordant if the variables change in opposite directions (e.g., X1 < X2 and Y1 > Y2). A pair that is tied on X or on Y (X1 = X2 or Y1 = Y2) is neither concordant nor discordant; tau-b and tau-c adjust for such ties. The numbers of concordant pairs (denoted nc) and discordant pairs (denoted nd) are used to calculate every version of Kendall’s tau. Using the data from above, nc and nd are calculated as follows:

Students being compared | X1 | Y1 | X2 | Y2 | Concordant, Discordant, or Tied?
A & B | 2.5 | 1 | 4 | 2 | Concordant
A & C | 2.5 | 1 | 5 | 3 | Concordant
A & D | 2.5 | 1 | 2.5 | 4 | Tied on X (neither)
A & E | 2.5 | 1 | 1 | 5 | Discordant
B & C | 4 | 2 | 5 | 3 | Concordant
B & D | 4 | 2 | 2.5 | 4 | Discordant
B & E | 4 | 2 | 1 | 5 | Discordant
C & D | 5 | 3 | 2.5 | 4 | Discordant
C & E | 5 | 3 | 1 | 5 | Discordant
D & E | 2.5 | 4 | 1 | 5 | Discordant
nc = 3, nd = 6 (plus 1 pair tied on X)
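The pair comparisons can be automated in base R. This sketch assumes the usual convention that a pair tied on X or Y (here A & D, tied on X) counts as neither concordant nor discordant; combn() lists every unordered pair exactly once, in the same order as the table.

```r
# Ranks from the Sample Data table (students A to E)
rank_x   <- c(2.5, 4, 5, 2.5, 1)
rank_y   <- c(1, 2, 3, 4, 5)
students <- c("A", "B", "C", "D", "E")

# All n(n - 1)/2 unordered pairs, each compared exactly once
pairs <- combn(5, 2)

# Sign of (X1 - X2) * (Y1 - Y2): +1 concordant, -1 discordant, 0 tied
s <- sign(rank_x[pairs[1, ]] - rank_x[pairs[2, ]]) *
     sign(rank_y[pairs[1, ]] - rank_y[pairs[2, ]])

result <- data.frame(
  pair = paste(students[pairs[1, ]], students[pairs[2, ]], sep = " & "),
  type = c("Discordant", "Tied", "Concordant")[s + 2]
)
print(result)

n_c    <- sum(s > 0)   # 3
n_d    <- sum(s < 0)   # 6
n_tied <- sum(s == 0)  # 1 (A & D)
```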

Best Practices

Visualizing Data

When doing Kendall rank correlation analyses, it is important to visualize the data with scatterplots, because a scatterplot may reveal patterns that the correlation coefficient does not capture (e.g., a non-monotonic parabolic relationship).

Tau Types

Tau-a does not adjust for tied ranks, so it should not be used for data with ties. Tau-b and tau-c both adjust for tied ranks. Tau-b is suited to square contingency tables, while tau-c should be used when the contingency table is rectangular (i.e., the number of rows and the number of columns differ), because tau-b cannot reach ±1 in a non-square table. The two coincide only in special cases, such as a square table whose rows and columns all have equal totals; otherwise they generally differ.
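As a sketch of why tau-c is preferred for rectangular tables, the made-up data below have a perfect monotonic association in a 2 x 3 table. Base R's cor(..., method = "kendall") returns the tie-adjusted tau-b; tau-c is computed by hand from the standard Stuart formula, 2m(nc - nd) / (n^2 (m - 1)), since base R has no tau-c option.

```r
# Made-up data: a perfect monotonic association in a 2 x 3 contingency table
x <- c(1, 1, 2, 2)  # 2 distinct values (rows)
y <- c(1, 2, 3, 3)  # 3 distinct values (columns)

# Tau-b as computed by base R's cor()
tau_b <- cor(x, y, method = "kendall")

# Tau-c computed directly: 2m(nc - nd) / (n^2 (m - 1))
pairs <- combn(length(x), 2)
s <- sign(x[pairs[1, ]] - x[pairs[2, ]]) * sign(y[pairs[1, ]] - y[pairs[2, ]])
n <- length(x)
m <- min(length(unique(x)), length(unique(y)))
tau_c <- 2 * m * (sum(s > 0) - sum(s < 0)) / (n^2 * (m - 1))

tau_b  # 0.8944272, below 1 despite the perfect association
tau_c  # 1
```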

Disadvantage of Ranked Data

A disadvantage of using a rank-order correlation is that much of the information in the data is lost. Ranking treats the gap between adjacent ranks as constant: rank 1 is exactly one rank above rank 2, and so on, regardless of how far apart the underlying values are (Spearman, 1906). In the example above, consecutive English Scores (Y) are treated as exactly one rank apart even though the differences between the actual scores vary (from 5 to 12 points).
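A small base R sketch of this information loss, using the Sample Data scores plus one hypothetical change: making the top English Score far larger leaves the ranks, and therefore 𝜏, unchanged, while Pearson's r (computed from the raw values) changes.

```r
math      <- c(84, 79, 62, 84, 96)
english   <- c(90, 85, 76, 64, 59)
# Hypothetical change: the top English Score becomes much larger
english_2 <- c(900, 85, 76, 64, 59)

rank(english) == rank(english_2)            # all TRUE: same ranks
cor(math, english, method = "kendall") ==
  cor(math, english_2, method = "kendall")  # TRUE: tau is unchanged
cor(math, english) == cor(math, english_2)  # FALSE: Pearson's r changes
```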

Outliers

Although Spearman’s 𝜌 and Kendall’s 𝜏 are both sensitive to outliers, Kendall’s 𝜏 is the more robust of the two (Croux & Dehon, 2010; Sinsomboonthong, 2016).

Common Misconceptions

Misconception: Kendall's tau assumes a linear relationship between the variables.

  • Kendall’s tau does not assume that the relationship between the variables is linear and does not require the data to be at the interval level, unlike Pearson’s r.

Misconception: Correlation implies causation.

  • While 𝜏 describes the association between two variables, it does not by itself support conclusions about causation: correlation does not imply causation.

Model Formulas

Variable notation:

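  • n0 = n(n - 1)/2 = total number of pairs of observations
  • n1 = Σi ti(ti - 1)/2 = number of pairs tied on the first quantity
  • n2 = Σj uj(uj - 1)/2 = number of pairs tied on the second quantity
  • nc = number of concordant pairs
  • nd = number of discordant pairs
  • n = number of observations
  • r = number of rows in the contingency table (distinct values of the first quantity)
  • c = number of columns in the contingency table (distinct values of the second quantity)
  • m = min(r, c)
  • ti = number of tied values in the ith group of ties for the first quantity
  • uj = number of tied values in the jth group of ties for the second quantity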

Tau-a

(not adjusted for tied ranks)

$$\tau_a = \frac{n_c - n_d}{n_0}$$

Tau-b

(adjusted for tied ranks)

$$\tau_b = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}$$

Tau-c

(also referred to as Stuart-Kendall tau-c, adjusted for tied ranks)

$$\tau_c = \frac{2m(n_c - n_d)}{n^2(m - 1)}$$
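A minimal sketch of the three formulas in base R. kendall_taus() is a hypothetical helper written for illustration, not a library function; it counts pairs directly (O(n^2) time, fine for small samples) and does not handle missing values or a variable with only one distinct value (m = 1). Applied to the Sample Data scores, its tau-b should match base R's cor(..., method = "kendall").

```r
# Hypothetical helper implementing the tau-a, tau-b, and tau-c formulas
kendall_taus <- function(x, y) {
  n <- length(x)
  pairs <- combn(n, 2)                        # every pair counted once
  dx <- sign(x[pairs[1, ]] - x[pairs[2, ]])
  dy <- sign(y[pairs[1, ]] - y[pairs[2, ]])
  n_c <- sum(dx * dy > 0)                     # concordant pairs
  n_d <- sum(dx * dy < 0)                     # discordant pairs
  n0 <- n * (n - 1) / 2                       # total number of pairs
  n1 <- sum(dx == 0)                          # pairs tied on x: sum of ti(ti - 1)/2
  n2 <- sum(dy == 0)                          # pairs tied on y: sum of uj(uj - 1)/2
  m <- min(length(unique(x)), length(unique(y)))  # min(rows, columns)
  c(
    tau_a = (n_c - n_d) / n0,
    tau_b = (n_c - n_d) / sqrt((n0 - n1) * (n0 - n2)),
    tau_c = 2 * m * (n_c - n_d) / (n^2 * (m - 1))
  )
}

# Usage with the Sample Data scores
math    <- c(84, 79, 62, 84, 96)
english <- c(90, 85, 76, 64, 59)
taus <- kendall_taus(math, english)
taus  # tau_a = -0.3, tau_b = -3/sqrt(90) (about -0.316), tau_c = -0.32
all.equal(unname(taus["tau_b"]), cor(math, english, method = "kendall"))  # TRUE
```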

Example

R Code

# Reading in tutorial data
myData <- read.csv("myData.csv")
# Assigning variables
x <- myData[, 1]
y <- myData[, 2]
# Making a scatterplot to check for a monotonic relationship
plot(x, y)
# Calculating Kendall's tau (cor() returns tau-b, which adjusts for ties)
cor(x, y, method = "kendall")
# Correlation test: reports the estimate of tau and a p-value
# Null hypothesis: true tau is equal to 0
cor.test(x, y, method = "kendall")
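Since myData.csv is not included here, this is a self-contained sketch of the same steps with the Sample Data scores typed in as a stand-in. Per the cor.test() documentation, an exact p-value for Kendall's test is computed only when there are no ties; the Math Scores contain a tie, so exact = FALSE simply makes the normal approximation explicit.

```r
# Sample Data from the Correlation Analysis section (stand-in for myData.csv)
x <- c(84, 79, 62, 84, 96)  # Math Score
y <- c(90, 85, 76, 64, 59)  # English Score

# Kendall's tau (tau-b)
tau <- cor(x, y, method = "kendall")

# Test of H0: tau = 0, using the normal approximation because of the tie
res <- cor.test(x, y, method = "kendall", exact = FALSE)
res$estimate  # same value as tau
res$p.value
```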

Model Extensions and Alternative Approaches

Linear Relationship Between Variables

If a linear relationship between two variables is thought to exist, Pearson’s r may be used.

Spearman's Rho

Spearman’s rank-order correlation may also be used to calculate the correlation between two variables with a monotonic relationship.
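The three coefficients can be compared side by side in base R on the Sample Data scores. Pearson's r uses the raw values, while Spearman's rho (Pearson's r computed on the ranks) and Kendall's tau use only the ordering; this is a sketch for comparison, not a recommendation of one method.

```r
# Sample Data from the Correlation Analysis section
x <- c(84, 79, 62, 84, 96)  # Math Score
y <- c(90, 85, 76, 64, 59)  # English Score

# One coefficient per method, named by method
coefs <- sapply(c("pearson", "spearman", "kendall"),
                function(method) cor(x, y, method = method))
round(coefs, 3)
```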

References

Arndt, S., Turvey, C., & Andreasen, N. C. (1999). Correlating and predicting psychiatric symptom ratings: Spearman's r versus Kendall's tau correlation. Journal of Psychiatric Research, 33(2), 97–104. https://doi.org/10.1016/S0022-3956(98)90046-2

Croux, C., & Dehon, C. (2010). Influence Functions of the Spearman and Kendall Correlation Measures. Statistical Methods and Applications, 19(4), 497–515. https://doi.org/10.1007/s10260-010-0142-z

Kendall, M. G. (1938). A New Measure of Rank Correlation. Biometrika, 30(1/2), 81–93. https://doi.org/10.2307/2332226

Lieberson, S. (1964). Limitations in the Application of Non-Parametric Coefficients of Correlation. American Sociological Review, 29(5), 744–746.

Sinsomboonthong, J. (2016). Robust Estimators for the Correlation Measure to Resist Outliers in Data. Journal of Mathematical and Fundamental Sciences, 48(3), 263–275. https://doi.org/10.5614/j.math.fund.sci.2016.48.3.7

Spearman, C. (1906). Footrule for measuring correlation. British Journal of Psychology, 2(1), 89. https://www.proquest.com/docview/1293617929