Point-Biserial Correlation
The point-biserial correlation is a statistical measure used to assess the strength and direction of the association between one dichotomous (binary) variable and one continuous variable. It is a specific case of the Pearson correlation and is typically used when one of the variables has only two categories (e.g., yes/no, male/female, success/failure) and the other variable is continuous (e.g., height, weight, test scores).
Formula for Point-Biserial Correlation:
The formula for calculating the point-biserial correlation coefficient ($r_{pb}$) is:

$$r_{pb} = \frac{M_1 - M_2}{s} \cdot \sqrt{\frac{n_1 n_2}{n(n-1)}}$$
Where:
- $M_1$ = mean of the continuous variable for group 1 (coded as 1).
- $M_2$ = mean of the continuous variable for group 2 (coded as 0).
- $s$ = sample standard deviation of the continuous variable, computed over all $n$ observations with an $n - 1$ denominator.
- $n_1$ = number of observations in group 1.
- $n_2$ = number of observations in group 2.
- $n$ = total number of observations in both groups ($n = n_1 + n_2$).
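The formula can be checked numerically with a short sketch. The scores and group codes below are made-up illustrative data, and `point_biserial` and `pearson` are hypothetical helper names, not functions from any library. The second helper computes an ordinary Pearson correlation on the same data to show that the two values agree.

```python
import math
from statistics import mean, stdev


def point_biserial(scores, groups):
    """Point-biserial r from the group-means formula; groups are coded 0/1."""
    g1 = [x for x, g in zip(scores, groups) if g == 1]
    g0 = [x for x, g in zip(scores, groups) if g == 0]
    n1, n0, n = len(g1), len(g0), len(scores)
    s = stdev(scores)  # sample SD (n - 1 denominator) over all observations
    return (mean(g1) - mean(g0)) / s * math.sqrt(n1 * n0 / (n * (n - 1)))


def pearson(x, y):
    """Plain Pearson correlation, used here only as a cross-check."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only: eight scores, four per group.
scores = [52, 61, 58, 70, 75, 68, 80, 66]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

r_pb = point_biserial(scores, groups)
r_pearson = pearson(scores, groups)
print(round(r_pb, 4), round(r_pearson, 4))
```

Both printed values should match, which illustrates that the point-biserial coefficient is the Pearson correlation applied to a 0/1 variable.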
Interpretation:
- $r_{pb}$ ranges from -1 to +1:
  - A positive value means the “1” group (e.g., success) has the higher mean on the continuous variable, so higher scores go with membership in that group.
  - A negative value means the “0” group (e.g., failure) has the higher mean on the continuous variable.
  - A value close to 0 means the two group means are nearly equal, i.e., little or no linear relationship between the continuous variable and the binary variable.
Example:
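You might use point-biserial correlation to study whether there is a relationship between gender (coded as 0 = female, 1 = male) and test scores (a continuous variable). The coefficient describes how strongly, and in which direction, mean test scores differ between males and females. Whether that difference is statistically significant is a separate question, answered by testing $r_{pb}$, which is equivalent to an independent-samples t-test on the two group means.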
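The link between the coefficient and a two-group comparison of means can be sketched as follows. The scores are invented for illustration, and `pooled_t`, `r_from_t`, and `t_from_r` are hypothetical helper names. The sketch assumes the standard pooled-variance t statistic with $n - 2$ degrees of freedom and the identity $t = r\sqrt{(n-2)/(1-r^2)}$.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group0):
    """Independent-samples t statistic using the pooled variance."""
    n1, n0 = len(group1), len(group0)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n0 - 1) * variance(group0)) / (n1 + n0 - 2)
    return (mean(group1) - mean(group0)) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))


def r_from_t(t, n):
    """Correlation implied by a t statistic with n - 2 degrees of freedom."""
    return t / math.sqrt(t ** 2 + n - 2)


def t_from_r(r, n):
    """t statistic implied by a correlation r on n observations."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Illustrative test scores only (1 = male, 0 = female).
males = [75, 68, 80, 66]
females = [52, 61, 58, 70]
n = len(males) + len(females)

t_stat = pooled_t(males, females)
r_pb = r_from_t(t_stat, n)
print(round(t_stat, 3), round(r_pb, 3))
```

The t statistic would then be compared against a t distribution with $n - 2$ degrees of freedom to obtain a p-value; that lookup is left out here to keep the sketch free of external libraries.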
Phi Coefficient
The phi coefficient is a measure of association for binary (dichotomous) variables. It is used to assess the strength of the relationship between two binary variables, each with two categories (e.g., success/failure, yes/no). The phi coefficient is a specific case of the Pearson correlation when both variables are binary.
Formula for Phi Coefficient:
The formula for the phi coefficient ($\phi$) is:

$$\phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(A+C)(B+D)}}$$
Where:
- $A$ = number of cases where both variables are 1 (e.g., both variables are “yes”).
- $B$ = number of cases where the first variable is 1 and the second is 0 (e.g., first variable is “yes”, second is “no”).
- $C$ = number of cases where the first variable is 0 and the second is 1 (e.g., first variable is “no”, second is “yes”).
- $D$ = number of cases where both variables are 0 (e.g., both variables are “no”).
This formula is derived from the contingency table, which organizes the binary data.
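A minimal sketch of the calculation from the four cell counts is shown here. The counts are made up for illustration, and `phi_coefficient` is a hypothetical helper name. The guard for a zero denominator is an added assumption about how to handle a table with an empty row or column, where phi is undefined.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi from a 2x2 table: a = (1,1), b = (1,0), c = (0,1), d = (0,0)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        # An empty row or column makes the coefficient undefined.
        raise ValueError("phi is undefined when a row or column total is zero")
    return numerator / denominator


# Illustrative counts only.
phi = phi_coefficient(30, 10, 15, 45)
print(round(phi, 3))

# Perfect agreement and perfect disagreement give the two extremes.
phi_max = phi_coefficient(10, 0, 0, 10)
phi_min = phi_coefficient(0, 10, 10, 0)
```

The last two calls show the boundary cases: all cases on the main diagonal give +1, and all cases off the diagonal give -1.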
Interpretation:
- $\phi$ ranges from -1 to +1, although the extremes are reachable only when the two variables have matching marginal proportions:
  - A positive value indicates that cases scoring 1 on one variable tend to score 1 on the other as well (a positive association).
  - A negative value indicates that cases scoring 1 on one variable tend to score 0 on the other (a negative association).
  - A value of 0 indicates no association between the two binary variables.
- The magnitude of the phi coefficient reflects the strength of the association:
  - $\phi$ near +1 or -1 indicates a strong association (either positive or negative).
  - $\phi$ near 0 indicates a weak or no association.
Example:
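You might use the phi coefficient to examine the relationship between whether individuals smoke (yes/no) and whether they have lung disease (yes/no). The phi coefficient would tell you how strong the association between smoking and lung disease is and in which direction it runs. Statistical significance is assessed separately, usually with a chi-square test on the same 2×2 table, for which $\chi^2 = n\phi^2$.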
Comparison: Point-Biserial Correlation vs. Phi Coefficient
Aspect | Point-Biserial Correlation | Phi Coefficient |
---|---|---|
Used for | One dichotomous and one continuous variable. | Two dichotomous (binary) variables. |
Type of Data | One continuous and one binary variable. | Two binary variables. |
Range | -1 to +1 | -1 to +1 |
Interpretation | Measures the strength and direction of the relationship between a continuous variable and a binary variable. | Measures the strength and direction of the association between two binary variables. |
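Assumptions | For significance testing, assumes the continuous variable is approximately normally distributed within each group, with similar variances. | Assumes both variables are genuinely dichotomous and that observations are independent of one another. It does not assume the two variables are independent of each other, since their association is what is being measured. |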
Conclusion
- Point-Biserial Correlation is used when there is a continuous variable and a binary variable, and it measures the relationship between the two.
- Phi Coefficient is used when both variables are binary, and it assesses the strength and direction of their relationship.
Both measures are useful tools for analyzing binary data, but they apply to different types of research questions depending on the nature of the variables involved.