This document was inspired by this post by Arthur Charpentier.

The repository with the code for creating this post is here. Just some plots so far. We may try something more elaborate or formal (e.g., explicit comparisons of Colombia with other countries) later on.

The data file is available here.


Entropy

One possible measure of inequality is entropy, the classical notion developed by Shannon for information theory. Here I use a generalized version of it, with parameter \(\alpha=1\).
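A minimal Python sketch of such an index follows. It assumes that the generalized entropy meant here is the generalized entropy index from inequality measurement, \(GE(\alpha)=\frac{1}{n\alpha(\alpha-1)}\sum_{i=1}^{n}\left[(x_i/\mu)^\alpha-1\right]\), where \(\mu\) is the mean score. Its \(\alpha=1\) limit is the Theil index \(\frac{1}{n}\sum_i (x_i/\mu)\ln(x_i/\mu)\). The function name is hypothetical, and this is not necessarily the code behind the plots.

```python
import math

def generalized_entropy(scores, alpha=1.0):
    """Generalized entropy index GE(alpha) of a list of positive scores.

    alpha=1 (Theil index) and alpha=0 (mean log deviation) are limits of
    the general formula, so they are computed with their own expressions.
    """
    n = len(scores)
    mu = sum(scores) / n
    if alpha == 1:
        return sum((x / mu) * math.log(x / mu) for x in scores) / n
    if alpha == 0:
        return -sum(math.log(x / mu) for x in scores) / n
    return sum((x / mu) ** alpha - 1 for x in scores) / (n * alpha * (alpha - 1))

# Hypothetical score samples: the wider spread gets the larger index.
narrow = [450, 550]
wide = [400, 600]
print(generalized_entropy(narrow), generalized_entropy(wide))
```

A perfectly equal set of scores gives 0, and the index grows as scores spread out relative to their mean.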

In the plot, country labels are blue for OECD countries and red for non-OECD Latin American countries.

plot of chunk unnamed-chunk-1

Although entropy is not necessarily a measure of variance, collections of test scores with higher entropy tend to have lower averages than those with lower entropy:

plot of chunk unnamed-chunk-2

By the way, why is the entropy of female scores (almost) consistently lower than that of males of the same country?

Standard Deviation

Standard deviation seems to be the most commonly used indicator of performance inequality in standardized test scores. Ranking countries by it instead of by entropy changes the order drastically:

plot of chunk unnamed-chunk-3
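One possible reason the two rankings can differ is that standard deviation measures spread in absolute points, while a generalized entropy index divides every score by the country mean and so measures relative spread. The sketch below assumes the \(\alpha=1\) case is the Theil index and uses made-up numbers. It builds two samples with exactly the same standard deviation, and the one with the lower mean gets the larger index. This scale dependence might also partly explain why higher-entropy countries tend to have lower averages, but that is a conjecture, not something the plots establish.

```python
import math
import statistics

def theil(scores):
    """Theil index, i.e. the generalized entropy index with alpha = 1."""
    mu = statistics.fmean(scores)
    return statistics.fmean((x / mu) * math.log(x / mu) for x in scores)

# Hypothetical samples: `low` is `high` shifted down by 100 points,
# so both have exactly the same standard deviation.
high = [450, 500, 550]
low = [x - 100 for x in high]

print(statistics.pstdev(high), statistics.pstdev(low))  # identical spreads
print(theil(high), theil(low))  # the lower-mean sample has the larger index
```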

Distributions of scores (for selected countries)

First, a violin plot of distributions of scores in math (differentiating by sex and ordered by entropy):

plot of chunk unnamed-chunk-5

Empirical cumulative distribution function of math scores for each country:

plot of chunk unnamed-chunk-6
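For reference, the empirical cumulative distribution function of a sample is \(\hat F(x)=\frac{1}{n}\#\{i : x_i\le x\}\), the share of scores at or below \(x\). A minimal Python sketch (the helper name is hypothetical):

```python
import bisect

def ecdf(scores):
    """Return the empirical CDF of `scores` as a function."""
    s = sorted(scores)
    n = len(s)

    def F(x):
        # Count observations <= x by binary search on the sorted sample.
        return bisect.bisect_right(s, x) / n

    return F

F = ecdf([420, 480, 480, 560])
print(F(400), F(480), F(500), F(560))  # 0.0 0.75 0.75 1.0
```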

And a kernel density estimate (also for math):

plot of chunk unnamed-chunk-7
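A kernel density estimate places a smooth bump on every observation and averages them: \(\hat f(x)=\frac{1}{nh}\sum_i K\left(\frac{x-x_i}{h}\right)\). The sketch below uses a Gaussian kernel \(K\). The default bandwidth \(h=1.06\,\hat\sigma\,n^{-1/5}\) is a common normal-reference rule of thumb. It is an assumption here, not necessarily the bandwidth used for the plot above.

```python
import math
import statistics

def gaussian_kde(scores, bandwidth=None):
    """Return a Gaussian kernel density estimate of `scores` as a function."""
    n = len(scores)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed; other bandwidth rules exist).
        bandwidth = 1.06 * statistics.stdev(scores) * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def f(x):
        # Sum of Gaussian bumps centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in scores)

    return f

f = gaussian_kde([400, 450, 500, 550, 600])  # hypothetical scores
# Riemann sum over a wide grid with step 1: the estimate should integrate to about 1.
print(sum(f(x) for x in range(0, 1001)))
```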

The violin plots show a substantial difference between OECD countries and Latin American countries. For instance, almost all OECD countries have their median at approximately 500 points, while the Colombian students above that score form just a tail. In fact, the median math score of OECD countries corresponds to the 90.74th percentile of Colombia's distribution; that is, 50% of OECD students scored better than at least 90.74% of Colombian students (!).
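The 90.74 figure is the Colombian empirical CDF evaluated at the OECD median. Here is a sketch of that computation on small made-up samples, not the PISA data. Pooling all OECD students into one sample is an assumption, and the helper name is hypothetical.

```python
import bisect
import statistics

def share_at_or_below(threshold, scores):
    """Fraction of `scores` that are <= threshold (the ECDF at `threshold`)."""
    return bisect.bisect_right(sorted(scores), threshold) / len(scores)

# Made-up samples for illustration only, NOT PISA results.
oecd = [430, 470, 495, 500, 505, 530, 570]
colombia = [310, 340, 360, 380, 390, 400, 420, 440, 460, 510]

oecd_median = statistics.median(oecd)
print(oecd_median, share_at_or_below(oecd_median, colombia))  # 500 0.9
```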

Quantiles

In his post, Charpentier compares France’s score distribution with those of other countries by plotting the difference of the quantiles at each level. Here I do the same for Colombia against 17 other countries, using the math scores. Just as an illustration, I also include Colombia versus Colombia (red is male and blue is female).

plot of chunk unnamed-chunk-8
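Each curve is \(\Delta(p)=Q_{A}(p)-Q_{B}(p)\) over a grid of levels \(p\), where \(Q\) is a country's quantile function. The sketch below assumes linear-interpolation quantiles (the rule behind R's default `quantile` type 7 and NumPy's default). The order of the subtraction, the names and the sample scores are hypothetical.

```python
import math

def quantile(sorted_scores, p):
    """p-quantile of an already sorted sample, by linear interpolation."""
    h = (len(sorted_scores) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (h - lo) * (sorted_scores[hi] - sorted_scores[lo])

def quantile_differences(scores_a, scores_b, levels):
    """Q_a(p) - Q_b(p) at each probability level p."""
    a, b = sorted(scores_a), sorted(scores_b)
    return [quantile(a, p) - quantile(b, p) for p in levels]

levels = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
colombia = [350, 380, 400, 420, 450]  # hypothetical scores
other = [420, 460, 500, 530, 580]  # hypothetical scores
print(quantile_differences(other, colombia, levels))
```

A positive value at level \(p\) means the first country's \(p\)-quantile is higher; comparing a sample with itself gives zero at every level.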

Let’s do the same with Singapore:

plot of chunk unnamed-chunk-9

And with the U.S.:

plot of chunk unnamed-chunk-10

Quantile averages

Charpentier suggests another way of comparing these distributions: instead of calculating the quantile of the scores for each level, we could calculate the average of the scores above each quantile and, given two countries, take the difference of these values at each level.
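A minimal sketch of that comparison follows. It assumes that "above each quantile" means at or above the linear-interpolation \(p\)-quantile; the names and sample scores are hypothetical. At \(p=0\) the tail average is the ordinary mean, and at \(p=1\) it is the maximum score.

```python
import math

def quantile(sorted_scores, p):
    """p-quantile of an already sorted sample, by linear interpolation."""
    h = (len(sorted_scores) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (h - lo) * (sorted_scores[hi] - sorted_scores[lo])

def upper_tail_mean(scores, p):
    """Average of the scores at or above the p-quantile."""
    s = sorted(scores)
    q = min(quantile(s, p), s[-1])  # guard against rounding past the maximum
    tail = [x for x in s if x >= q]
    return sum(tail) / len(tail)

def tail_mean_differences(scores_a, scores_b, levels):
    """Difference of upper-tail averages between two samples at each level."""
    return [upper_tail_mean(scores_a, p) - upper_tail_mean(scores_b, p) for p in levels]

colombia = [350, 380, 400, 420, 450]  # hypothetical scores
other = [420, 460, 500, 530, 580]  # hypothetical scores
print(tail_mean_differences(other, colombia, [0.0, 0.5, 0.9]))
```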

Once again, let’s compare Colombia with a bunch of countries:

plot of chunk unnamed-chunk-11

And with the U.S.:

plot of chunk unnamed-chunk-12

Further reading

Serious approaches to this problem: