The Maximum Diversity Sampler

Welcome to the Maximum Diversity Sampler

What is the Maximum Diversity Sampler?

The Maximum Diversity Sampler maximizes diversity in a sample of populations. This online tool generates maximum diversity samples from linguistic and social databases, following the sampling procedure described in Stoll et al. 2024.

How to use

To generate a maximum diversity sample, first select a database, optionally restricting it to certain domains and/or features. The Maximum Diversity Sampler works on pairwise distances between languages/populations across features. It therefore presupposes that the relevant features do not contain so many NAs (empty cells) that the distances stop being comparable. To solve this problem, users first need to “densify” the selected database. Densification reduces NAs by striking a balance between the number of datapoints, the density of coding (absence of NAs), and taxonomic diversity (Graff et al. 2024). This procedure can take several minutes. When it is complete, the densified tables are displayed with summary statistics and are available for download. Users are advised to try different densification weights and inspect the resulting tables. Each table remains available until the window is closed.
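The sketch below shows why NAs undermine pairwise distances and what "reducing NAs" can mean in the simplest case. It is a toy illustration only and is not the algorithm of the densify package. The names `pairwise_distance`, `coverage` and `prune_sparsest`, the mismatch-proportion distance, and the `min_coverage` threshold are all hypothetical. Unlike densify, the pruning step ignores taxonomic diversity entirely.

```python
from typing import Optional

# Hypothetical toy table: language/population -> feature values; None = NA.
Row = list[Optional[int]]


def pairwise_distance(a: Row, b: Row) -> tuple[float, int]:
    """Proportion of mismatching values over the features coded in BOTH rows.

    Also returns how many features the comparison rests on: the more NAs,
    the fewer shared features, and the less comparable distances become.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("nan"), 0
    mismatches = sum(1 for x, y in shared if x != y)
    return mismatches / len(shared), len(shared)


def coverage(rows: dict[str, Row], cols: list[int]) -> float:
    """Fraction of non-NA cells in the sub-table rows x cols."""
    cells = [row[c] for row in rows.values() for c in cols]
    return sum(v is not None for v in cells) / len(cells) if cells else 1.0


def prune_sparsest(table: dict[str, Row], min_coverage: float) -> dict[str, Row]:
    """Hypothetical naive densification: while overall coverage is below
    `min_coverage`, drop the single row or column with the highest NA share
    (rows are dropped on ties). Taxonomic diversity is NOT considered."""
    if not table:
        return {}
    rows = dict(table)
    cols = list(range(len(next(iter(table.values())))))
    while rows and cols and coverage(rows, cols) < min_coverage:
        row_na = {k: sum(r[c] is None for c in cols) / len(cols) for k, r in rows.items()}
        col_na = {c: sum(r[c] is None for r in rows.values()) / len(rows) for c in cols}
        worst_row = max(row_na, key=row_na.get)
        worst_col = max(col_na, key=col_na.get)
        if row_na[worst_row] >= col_na[worst_col]:
            del rows[worst_row]
        else:
            cols.remove(worst_col)
    return {k: [r[c] for c in cols] for k, r in rows.items()}


toy = {
    "A": [1, 0, 1, None],
    "B": [1, 1, None, None],
    "C": [0, 0, 1, 1],
    "D": [None, None, None, 1],
}
print(pairwise_distance(toy["A"], toy["B"]))  # (0.5, 2)
print(prune_sparsest(toy, 0.9))  # {'A': [1, 0, 1], 'C': [0, 0, 1]}
```

In the toy table, the distance between A and B rests on only two features, while A and C share three. The pruned table is fully coded but has lost half its rows. This is exactly the trade-off between datapoints and coding density that the real densification procedure has to weigh, together with taxonomic diversity.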

The second step is the actual sampling. First pick one of the densified tables and specify how many samples you want. Optionally, you can seed the procedure with an existing sample and ask for maximally diverse additions to it. The sampling procedure is described in Stoll et al. 2024.
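To make the idea of "maximally diverse additions" to a seed sample concrete, here is a generic greedy max-min (farthest-point) heuristic. It is a common way to pick mutually distant items. It is not necessarily the procedure of Stoll et al. 2024, which should be consulted for the actual method. The function name `max_min_sample`, its parameters, the rule of counting seed items towards `n`, and the toy 1-D distances are all hypothetical.

```python
from typing import Callable, Sequence


def max_min_sample(
    items: Sequence[str],
    dist: Callable[[str, str], float],
    n: int,
    seed: Sequence[str] = (),
) -> list[str]:
    """Hypothetical greedy max-min selection of up to n items in total.

    Starts from `seed` (an existing sample) if given, otherwise from the first
    item (an arbitrary choice). It then repeatedly adds the candidate whose
    distance to its nearest already-selected item is largest.
    """
    selected = list(seed)
    if not selected and items and n > 0:
        selected.append(items[0])
    candidates = [x for x in items if x not in selected]
    while len(selected) < n and candidates:
        best = max(candidates, key=lambda c: min(dist(c, s) for s in selected))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy 1-D "populations"; distance is the absolute difference of positions.
pos = {"a": 0, "b": 1, "c": 2, "d": 9, "e": 10}


def dist(x: str, y: str) -> float:
    return abs(pos[x] - pos[y])


print(max_min_sample(list(pos), dist, 3))              # ['a', 'e', 'c']
print(max_min_sample(list(pos), dist, 3, seed=["d"]))  # ['d', 'a', 'c']
```

In practice, `dist` would be a distance computed across features of a densified table rather than a 1-D position. Greedy max-min is cheap and easy to seed, but it only approximates the most diverse subset.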

Databases

Linguistic databases

  • The GBI Database is sourced from Grambank, which allows direct browsing.

  • The TLI Database is sourced from WALS, PHOIBLE, AUTOTYP, and Lexibank, browsable here.

Both GBI and TLI are carefully curated to remove logical and strong statistical dependencies between features (Graff et al. 2025).

Human Ecology Grid Data

The Human Ecology Grid Data are available on GitHub.

How to cite

Cite the densifier:

Graff et al., (2025). densify: An R package to reduce empty cells in data frames of typological linguistic data. Journal of Open Source Software, 9(101), 7024, https://doi.org/10.21105/joss.07024

License

…

Densification weights