Discuss divergence in normality with the help of a suitable diagram, describe the factors causing divergence in the normal distribution, and discuss how divergence in normality is measured.

Divergence in Normality


In statistical analysis, normality refers to the assumption that data follow a normal distribution, often represented as a bell-shaped curve. However, in real-world datasets, the assumption of normality may not always hold. Divergence in normality occurs when the data do not conform to the ideal normal distribution. This can manifest in several ways, such as skewness, kurtosis, or the presence of outliers.

Characteristics of a Normal Distribution

A normal distribution is characterized by:

  • Symmetry: The left and right sides of the distribution are mirror images.
  • Bell Shape: The peak occurs at the mean, and the tails approach, but never touch, the horizontal axis.
  • Mean = Median = Mode: All three measures of central tendency coincide at the center.
  • 68-95-99.7 Rule: In a normal distribution, 68% of the data lie within one standard deviation, 95% within two standard deviations, and 99.7% within three standard deviations of the mean.
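The 68-95-99.7 percentages can be recovered directly from the standard normal cumulative distribution function. A minimal sketch, assuming SciPy is available:

```python
# Sketch: recovering the 68-95-99.7 rule from the standard normal CDF.
# norm.cdf(k) is P(Z <= k) for a standard normal variable Z.
from scipy.stats import norm

# P(mean - k*SD < X < mean + k*SD) = CDF(k) - CDF(-k) for any normal X
within = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973
```

Because every normal distribution is a shifted and scaled standard normal, these proportions hold regardless of the mean and standard deviation.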

Types of Divergence in Normality

When a dataset deviates from normality, it can show one or more of the following characteristics:

  1. Skewness
    • Definition: Skewness refers to the asymmetry of a distribution. A distribution is said to be skewed if it is not symmetrical, with the tail extending farther toward one side.
      • Positive Skew (Right Skew): The right tail is longer or fatter than the left, pulling the mean above the median.
      • Negative Skew (Left Skew): The left tail is longer or fatter than the right, pulling the mean below the median.
    • Diagram:
    ```
    Positive (right) skew          Negative (left) skew
      *                                            *
      * *                                        * *
      *   *                                    *   *
      *      *     *                  *     *      *
    -------------------            -------------------
    ```
  2. Kurtosis
    • Definition: Kurtosis refers to the “tailedness” or the peak of the distribution. It describes how much of the data lie in the tails of the distribution compared to a normal distribution.
      • Leptokurtic: A high, sharp peak and heavy tails (more extreme values), with kurtosis > 3.
      • Platykurtic: A lower, wider peak with lighter tails (fewer extreme values), with kurtosis < 3.
      • Mesokurtic: A normal distribution, with kurtosis = 3.
    • Diagram:
    ```
    Leptokurtic                    Platykurtic
    (high peak, heavy tails)       (low peak, light tails)
           *
          * *                          *    *    *
          * *                       *             *
        *     *                   *                 *
    -------------------            -------------------
    ```

3. Outliers

  • Definition: Outliers are extreme values that fall far away from the mean. Outliers can distort the normality of a distribution by increasing the skewness or kurtosis.
  • Outliers may be caused by errors in data collection or may represent rare events.
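All three kinds of divergence above can be observed numerically. The following sketch (illustrative only; assumes NumPy and SciPy are installed) draws samples from a symmetric, a right-skewed, and a heavy-tailed distribution and computes their sample skewness and kurtosis:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
symmetric    = rng.normal(size=10_000)       # ~ normal: skewness near 0
right_skewed = rng.exponential(size=10_000)  # long right tail: positive skew
heavy_tailed = rng.laplace(size=10_000)      # leptokurtic: kurtosis above 3

print("skewness:", skew(symmetric), skew(right_skewed))
# fisher=False reports raw kurtosis, so a normal distribution gives ~3
print("kurtosis:", kurtosis(symmetric, fisher=False),
      kurtosis(heavy_tailed, fisher=False))
```

The exponential sample shows strongly positive skewness, while the Laplace sample shows kurtosis well above 3, matching the leptokurtic pattern described above.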

Factors Causing Divergence in Normality

Several factors can cause data to diverge from a normal distribution:

  1. Sample Size: Small sample sizes are more likely to show deviations from normality. As sample size increases, the distribution tends to become closer to normal due to the Central Limit Theorem.
  2. Skewed Data: If the underlying population is inherently skewed (e.g., income distribution, age at marriage), the data will exhibit skewness.
  3. Presence of Outliers: Outliers, which are extreme values that differ greatly from the rest of the data, can distort the shape of the distribution, causing it to deviate from normality.
  4. Measurement Errors: Errors in data collection or reporting can lead to a skewed or non-normal distribution. For example, rounding errors or incorrect data entry can introduce biases.
  5. Non-random Sampling: If the data are collected in a non-random manner, such as from a biased or specific group, the data may not follow a normal distribution.
  6. Truncated Data: When data are cut off or censored at certain limits (e.g., only measuring incomes below a certain threshold), this can lead to a non-normal distribution.
  7. Natural Distributions: Many natural phenomena (such as biological processes) do not follow normal distributions, especially when the processes are bounded (e.g., test scores or income distributions).

Measuring Divergence from Normality

There are several statistical methods to quantify how much a dataset diverges from normality:

  1. Skewness:
    • Skewness measures the asymmetry of the distribution. A skewness value of 0 indicates perfect symmetry, positive values indicate right skewness, and negative values indicate left skewness.
    • Formula: Skewness = [n / ((n − 1)(n − 2))] · Σ((xᵢ − x̄)/s)³, where n is the sample size, xᵢ are the data points, x̄ is the sample mean, and s is the sample standard deviation.
  2. Kurtosis:
    • Kurtosis measures the “tailedness” of the distribution. A kurtosis value of 3 indicates a normal distribution. Values greater than 3 indicate heavy tails (leptokurtic), and values less than 3 indicate light tails (platykurtic).
    • Formula: Kurtosis = [n(n + 1) / ((n − 1)(n − 2)(n − 3))] · Σ((xᵢ − x̄)/s)⁴ − 3(n − 1)² / ((n − 2)(n − 3)), where n is the sample size, xᵢ are the data points, x̄ is the sample mean, and s is the sample standard deviation.
  3. Normality Tests:
    • Shapiro-Wilk Test: A widely used statistical test to check for normality. A p-value less than a threshold (typically 0.05) indicates a significant departure from normality.
    • Kolmogorov-Smirnov Test: Compares the empirical distribution of data with a specified distribution (in this case, normal distribution).
    • Anderson-Darling Test: Another test for normality, focusing more on the tails of the distribution.
  4. Q-Q (Quantile-Quantile) Plots:
    • A Q-Q plot is a graphical tool used to compare the quantiles of the dataset with the quantiles of a normal distribution. If the points lie close to a straight line, the data follow a normal distribution; systematic deviations from the line indicate a departure from normality.
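A minimal sketch of two of these checks, the Shapiro-Wilk test and the Q-Q plot's straight-line criterion, assuming SciPy is available (`scipy.stats.probplot` returns the quantile pairs plus a least-squares line fit):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_data = rng.normal(loc=10, scale=2, size=500)
skewed_data = rng.exponential(scale=2, size=500)

# Shapiro-Wilk: a p-value below 0.05 signals a departure from normality
_, p_normal = stats.shapiro(normal_data)
_, p_skewed = stats.shapiro(skewed_data)
print(f"normal p = {p_normal:.3f}, skewed p = {p_skewed:.3g}")

# Q-Q plot data: probplot fits a line through the quantile pairs;
# an r value near 1 means the points fall close to a straight line
(osm, osr), (slope, intercept, r) = stats.probplot(normal_data)
print(f"Q-Q correlation for normal data: {r:.4f}")
```

The genuinely normal sample yields a large Shapiro-Wilk p-value and a Q-Q correlation near 1, while the exponential sample is rejected decisively.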

Conclusion

Divergence in normality refers to the deviations from the ideal normal distribution, often caused by factors like skewness, kurtosis, and outliers. These factors can distort the shape of the distribution, and various statistical methods—such as skewness, kurtosis, normality tests, and Q-Q plots—can be used to measure and assess the degree of divergence. Recognizing and addressing such divergences is important, as many statistical tests assume normality in the data.
