Modules

Modules are the scripts, codebooks, and infrastructure needed to harmonize a group of related survey items. The core of each module is a set of scripts that format the related survey items, with one script for every survey source. Each script includes a section for every wave, round, or year within its source. Each section creates two identification variables, one identifying respondents and one identifying the survey round. These identifiers make it possible to match individual respondents across all sources and to accurately merge data from multiple modules. The current modules are listed below. Titles under each module indicate separate series within the module. We also maintain a blank template module so researchers can easily add new variables.

Age

How do modules work?

Variables are formatted within the round or wave sections of each source script. Variables are renamed for consistency, valid values are re-ordered but not transformed (so all data is conserved), and missing or invalid values are re-coded to match a common framework shared by all modules. After the formatting process, all rounds and sources are merged together for each module.
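The formatting steps described above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual scripts: the function `format_round`, the missing-value codes, the source name `srcA`, and the item names (`q12`, `v30`, `trust_gov_4pt`) are all hypothetical.

```python
# Hypothetical shared codes for missing or invalid answers (assumed, not from the project).
MISSING_CODES = {"dont_know": -1, "refused": -2, "invalid": -9}


def format_round(rows, source, round_name, id_field, item_field, new_name,
                 value_order, missing_map):
    """Format one round of one source into a common layout."""
    # Re-ordering must be one-to-one so that no valid answer is collapsed or lost.
    assert len(set(value_order.values())) == len(value_order)
    round_id = f"{source}_{round_name}"
    formatted = []
    for row in rows:
        raw = row[item_field]
        if raw in value_order:
            value = value_order[raw]                 # valid value, re-ordered
        elif raw in missing_map:
            value = MISSING_CODES[missing_map[raw]]  # source-specific missing code
        else:
            value = MISSING_CODES["invalid"]         # anything else is invalid
        formatted.append({
            "round_id": round_id,                              # identifies the survey round
            "respondent_id": f"{round_id}_{row[id_field]}",    # identifies the respondent
            new_name: value,                                   # renamed, formatted item
        })
    return formatted


# Two made-up rounds of a made-up source with differently named items.
wave1 = [{"resp": 101, "q12": 1}, {"resp": 102, "q12": 4}, {"resp": 103, "q12": 9}]
wave2 = [{"id": 7, "v30": 2}, {"id": 8, "v30": 8}]

reverse_scale = {1: 4, 2: 3, 3: 2, 4: 1}  # flip the scale so higher means more trust

# After formatting, rounds are appended into one dataset for the module.
module_data = (
    format_round(wave1, "srcA", "w1", "resp", "q12", "trust_gov_4pt",
                 reverse_scale, {9: "dont_know"})
    + format_round(wave2, "srcA", "w2", "id", "v30", "trust_gov_4pt",
                   reverse_scale, {8: "refused", 9: "dont_know"})
)
```

Because the value mapping is one-to-one, the original answers can always be recovered from the formatted variable, which is the sense in which all data is conserved.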

Each module includes a labeling script and a harmonization script. The labeling script applies variable and value labels to every formatted variable within the module. This happens after data from all sources is merged, which avoids repeating the labeling process within each source. Labeling at this point is more efficient because the same formatted variable often appears in several sources where the question and answers are the same. The harmonization script creates common variables that combine data from many sources. Usually this is achieved by reducing the diversity of answers to a least common denominator shared by all or most variables. However, since we maintain the original values of all formatted variables, it remains easy to create additional harmonized variables.
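As a rough illustration of labeling and harmonization, the Python sketch below collapses a four-point and a two-point item into one binary variable. The variable names, labels, negative missing codes, and the cut point used to collapse the scale are assumptions made for this example, not the project's actual definitions.

```python
# Hypothetical labels, applied once after all sources are merged.
VARIABLE_LABELS = {
    "trust_gov_4pt": "Trust in government (4-point scale)",
    "trust_gov_2pt": "Trust in government (yes/no)",
    "trust_gov_h": "Trust in government (harmonized, binary)",
}
VALUE_LABELS = {
    "trust_gov_4pt": {1: "None", 2: "Not much", 3: "Some", 4: "A lot"},
    "trust_gov_2pt": {0: "No", 1: "Yes"},
    "trust_gov_h": {0: "No", 1: "Yes"},
}


def harmonize(record):
    """Add a binary harmonized variable while keeping the formatted originals."""
    out = dict(record)  # copy, so the formatted values are never overwritten
    v4 = record.get("trust_gov_4pt")
    v2 = record.get("trust_gov_2pt")
    if v4 is not None:
        # Negative values are assumed to be shared missing codes and pass through.
        out["trust_gov_h"] = v4 if v4 < 0 else int(v4 >= 3)
    elif v2 is not None:
        out["trust_gov_h"] = v2
    else:
        out["trust_gov_h"] = None
    return out


def value_label(record, variable):
    """Look up the label for one value; anything without a label is shown as 'Missing'."""
    return VALUE_LABELS[variable].get(record.get(variable), "Missing")


# Made-up merged data: different sources carry different formatted variables.
merged = [
    {"respondent_id": "srcA_w1_101", "trust_gov_4pt": 4},
    {"respondent_id": "srcB_r2_7", "trust_gov_2pt": 0},
    {"respondent_id": "srcA_w1_103", "trust_gov_4pt": -1},
]
harmonized = [harmonize(r) for r in merged]
labels = [value_label(r, "trust_gov_h") for r in harmonized]
```

Since `harmonize` only adds a variable, a researcher who needs a different cut point can write another function against the same formatted variables.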

Finally, each module includes a codebook. We generally maintain one master codebook that includes all variables from all modules. However, when a module is under active development, we parse out the relevant section to create a codebook for just that module. When development work is complete, we recombine the module codebook into the master codebook. This only happens after quality control checks on all scripts to fix errors and ensure the data matches the module codebook.
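One way to picture the check that the data matches the module codebook is the short Python sketch below. It assumes a codebook can be represented as a mapping from variable names to their allowed values; the function name `check_against_codebook` and the example entries are hypothetical.

```python
def check_against_codebook(records, codebook,
                           id_vars=("respondent_id", "round_id")):
    """Return a list of (row index, variable, problem) for every mismatch."""
    problems = []
    for i, record in enumerate(records):
        for var, value in record.items():
            if var in id_vars:
                continue  # identification variables are not listed value by value
            if var not in codebook:
                problems.append((i, var, "variable not in codebook"))
            elif value not in codebook[var]:
                problems.append((i, var, f"value {value!r} not in codebook"))
    return problems


# Made-up module codebook: allowed values include the assumed missing codes.
codebook = {"trust_gov_4pt": {1, 2, 3, 4, -1, -2, -9}}

data = [
    {"respondent_id": "srcA_w1_101", "round_id": "srcA_w1", "trust_gov_4pt": 4},
    {"respondent_id": "srcA_w1_102", "round_id": "srcA_w1", "trust_gov_4pt": 5},
    {"respondent_id": "srcA_w1_103", "round_id": "srcA_w1", "trust_gov": 2},
]
problems = check_against_codebook(data, codebook)
```

An empty result would suggest the module is ready to be recombined into the master codebook; any entries point to a script or codebook section that still needs fixing.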

Why use modules?

Modules enable multiple individuals or teams to develop different sets of variables simultaneously. This facilitates rapid development and is fundamental to the collaborative approach adopted by HUMAN Surveys. People can share their work with others while also benefiting from the work done by others. The most common approach taken by researchers is to first update or create the module containing their dependent variable. Researchers can then focus on updating or adding the independent variables they need, only in the sources where their dependent variable is available. The scripting framework expands as more people contribute to the modules, and nobody needs to repeat the work done by others.

Modules also provide a more manageable framework than having all variables formatted within the same scripts. This was how things started, but the scripts became long and cumbersome, and it quickly became necessary to separate out different sets of variables. The common tasks of loading, identifying, saving, appending, and merging datasets were also parsed out to a common set of backend system scripts. These are not modified by people developing modules. The modular approach, embedded within a common system, reduces the risk of human error because damage can only affect one module and the common tasks are not edited.

In addition to all the modules below, we maintain a blank template module that enables researchers to easily add new variables to the framework.