This document was inspired by this post by Arthur Charpentier.
The repository with the code for creating this post is here. Just some plots so far. We may try something more elaborate or formal (e.g., explicit comparisons of Colombia with other countries) later on.
The data file is available here.
One possible measure of inequality is entropy, the classical notion Shannon developed for information theory. This is a generalized version; here I use the one with \(\alpha=1\).
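To make the measure concrete, here is a minimal Python sketch. It assumes that "generalized version" refers to the generalized entropy index used in inequality studies, whose \(\alpha=1\) case is the Theil T index; if the post used a different generalization, the formula would differ. The function name and the sample scores are made up for illustration and are not taken from the post's code or data.

```python
import math

def generalized_entropy(scores, alpha=1.0):
    """Generalized entropy index GE(alpha) of strictly positive scores.

    Assumption: this is the inequality-index form, where alpha=1 gives
    the Theil T index and alpha=0 gives the mean log deviation.
    """
    n = len(scores)
    mean = sum(scores) / n
    ratios = [s / mean for s in scores]  # each score relative to the mean
    if alpha == 1:
        return sum(r * math.log(r) for r in ratios) / n
    if alpha == 0:
        return -sum(math.log(r) for r in ratios) / n
    return sum(r ** alpha - 1 for r in ratios) / (n * alpha * (alpha - 1))

# Made-up scores, for illustration only (not PISA data).
equal_scores = [500, 500, 500, 500]
spread_scores = [300, 450, 550, 700]
theil_equal = generalized_entropy(equal_scores)    # no inequality
theil_spread = generalized_entropy(spread_scores)  # some inequality
```

Under this definition the index is zero when every student has the same score and grows as scores spread out relative to their mean.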
In the plot, country labels are blue for OECD countries and red for non-OECD Latin American countries.
Although entropy is not necessarily a measure of variance, collections of test scores with higher entropy tend to have a lower average than those with lower entropy:
By the way, why is the entropy of female scores (almost) consistently lower than that of males of the same country?
Standard deviation seems to be the most commonly used indicator of performance inequality in standardized test scores. The ranking changes drastically:
First, a violin plot of distributions of scores in math (differentiating by sex and ordered by entropy):
Empirical cumulative distribution function for each country in math:
And a kernel density estimate (also for math):
The violin plots show a substantial difference between OECD countries and Latin American countries. For instance, almost all OECD countries have a median of around 500 points, while only the upper tail of Colombian students scores above that. In fact, the median math score of OECD countries corresponds to the 90.74th percentile of Colombia; that is, 50% of OECD students scored better than at least 90.74% of Colombian students (!).
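A comparison of this kind, locating one group's median inside another group's distribution, can be sketched with an empirical CDF. The snippet below is only an illustration in Python: the function names and the scores are made up, it is not the code behind the 90.74 figure, and it ignores the survey weights that a real PISA analysis would probably need.

```python
from bisect import bisect_right
from statistics import median

def ecdf_at(sample, value):
    """Share of observations in `sample` that are <= `value`."""
    ordered = sorted(sample)
    return bisect_right(ordered, value) / len(ordered)

def share_at_or_below_reference_median(target, reference):
    """Where the median of `reference` falls in the distribution of `target`."""
    return ecdf_at(target, median(reference))

# Made-up scores, for illustration only (not PISA data).
reference_scores = [420, 470, 500, 530, 580]
target_scores = [300, 340, 380, 410, 450, 480, 490, 495, 520, 560]
share = share_at_or_below_reference_median(target_scores, reference_scores)
```

With these made-up numbers, `share` is the fraction of the target group scoring at or below the reference group's median.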
In his post, Charpentier compares France’s score distribution with other countries by plotting the difference of the quantiles at each level. Here I do the same for Colombia against 17 other countries, using the math scores. As an illustration, I also include Colombia versus Colombia (red is male and blue is female).
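The quantile-difference curve can be sketched as follows. This is a Python illustration under stated assumptions, not the code used for the plots: the function name, the choice of 99 percentile levels, the "inclusive" interpolation method, and the scores are all hypothetical.

```python
from statistics import quantiles

def quantile_difference(scores_a, scores_b, n_levels=100):
    """Return q_a(p) - q_b(p) for p = 1/n_levels, ..., (n_levels - 1)/n_levels."""
    q_a = quantiles(scores_a, n=n_levels, method="inclusive")
    q_b = quantiles(scores_b, n=n_levels, method="inclusive")
    return [a - b for a, b in zip(q_a, q_b)]

# Made-up scores, for illustration only (not PISA data).
country_a = [350, 380, 410, 440, 470, 500]
country_b = [400, 430, 460, 490, 520, 550]  # country_a shifted up by 50
curve = quantile_difference(country_a, country_b)
```

A curve that stays below zero at every level means the first group scores lower than the second all along the distribution, not just on average.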
Let’s do the same with Singapore:
And with the U.S.:
Charpentier suggests another way of comparing these distributions: instead of calculating the quantile of the scores for each level, we could calculate the average of the scores above each quantile and, given two countries, take the difference of these values at each level.
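As a rough Python sketch of this second comparison: for each level, average the scores in the top part of the sorted sample, then subtract the two countries' values. The function names, the grid of levels, the way the cut-off index is rounded, and the scores are assumptions made for illustration; they are not taken from Charpentier's post or from this post's code.

```python
from statistics import mean

def mean_above_quantile(scores, level):
    """Average of the scores at or above the `level` quantile (0 <= level < 1)."""
    ordered = sorted(scores)
    # Index of the first observation kept; always keep at least one.
    start = min(int(level * len(ordered)), len(ordered) - 1)
    return mean(ordered[start:])

def upper_mean_difference(scores_a, scores_b, n_levels=100):
    """Difference of the upper-tail means of two samples at each level."""
    levels = [i / n_levels for i in range(n_levels)]
    return [
        mean_above_quantile(scores_a, p) - mean_above_quantile(scores_b, p)
        for p in levels
    ]

# Made-up scores, for illustration only (not PISA data).
country_a = [350, 380, 410, 440, 470, 500]
country_b = [400, 430, 460, 490, 520, 550]
tail_curve = upper_mean_difference(country_a, country_b)
```

At level 0 this reduces to the difference of the overall means, and as the level approaches 1 it compares only the very best students of each country.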
Once again, let’s compare Colombia with a bunch of countries:
And with the U.S.:
Serious approaches to this problem: