CODING VARIABLES
Published in 2005. In the
Handbook of Social Measurement, ed. Kimberly Kempf-Leonard.
Academic Press.
Lee Epstein
Andrew D. Martin
Introduction
Social scientists engaged in empirical research—that is,
research seeking to make claims or inferences based on
observations of the real world—undertake an enormous range
of activities. Some investigators collect information from
primary sources; others rely primarily on secondary archival
data. Many do little more than categorize the information they
collect; but many more deploy complex
technologies to analyze their data.
Seen in this way, it might appear that, beyond following some
basic rules of inference and guidelines for the conduct of their
research, scholars producing empirical work have little in
common. Their data come from a multitude
of sources; their tools for making use of the data are equally
varied. But there exists at least one task in empirical
scholarship that is universal, that virtually all scholars and
their students perform every time they undertake a new project:
coding variables, or the process of translating properties or
attributes of the world (i.e., variables) into a form that
researchers can systematically analyze after they have chosen the
appropriate measures to tap the underlying variable of interest.
Regardless of whether the data are qualitative or quantitative,
regardless of the form the analyses take, virtually all
researchers seeking to make claims or inferences based on
observations of the real world engage in the process of coding
data. That is, after measurement has taken place, they (1)
develop a precise schema to account for the values that each
variable of interest can take and then (2) methodically
and physically assign each unit under study a value for every
given variable.
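The two steps just described can be illustrated with a small Python sketch. The variable (a respondent's party identification), its three categories, the missing-data code of 99, and the `Variable` class are all hypothetical choices made for illustration, not a scheme the authors propose; the point is only that step (1) fixes every admissible value in advance and step (2) applies that schema mechanically to each unit.

```python
from dataclasses import dataclass


# Step 1: a precise schema (one codebook entry) for a hypothetical variable.
# Each anticipated raw attribute maps to exactly one numeric value, and a
# reserved missing-data code covers anything the scheme does not anticipate.
@dataclass(frozen=True)
class Variable:
    name: str
    values: dict[str, int]  # raw attribute -> numeric code
    missing: int = 99       # hypothetical missing-data code

    def code(self, raw: str) -> int:
        # Normalize the raw observation before looking it up in the schema.
        return self.values.get(raw.strip().lower(), self.missing)


party_id = Variable(
    name="party_id",
    values={"democrat": 1, "independent": 2, "republican": 3},
)

# Step 2: methodically assign each unit under study a value for the variable.
respondents = {"r1": "Democrat", "r2": " republican", "r3": "Green", "r4": "Independent"}
coded = {unit: party_id.code(raw) for unit, raw in respondents.items()}
print(coded)  # {'r1': 1, 'r2': 3, 'r3': 99, 'r4': 2}
```

Because respondent `r3` gives an answer the schema did not anticipate, it receives the missing-data code rather than an ad hoc value, which flags the case for review if the scheme is later refined.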
And yet, despite the universality of the task (not to mention
the fundamental role it plays in research), it typically receives
only the briefest mention in most volumes on designing research
or analyzing data. Why this is the case is a question on which we
can only speculate, but an obvious response centers on the
seemingly idiosyncratic nature of the undertaking. For some
projects, researchers may be best off coding inductively, that
is, collecting their data, drawing a representative sample,
examining the data in the sample, and then developing their
coding scheme; for others, investigators proceed in a deductive
manner, that is, they develop their schemes
first and then collect/code their data; and for still a third
set, a combination of inductive and deductive coding may be most
appropriate. (Some writers associate inductive coding with
research that primarily relies on qualitative [nonnumerical]
data/research and deductive coding with quantitative [numerical]
research. Given the [typically] dynamic nature of the processes
of collecting data and coding, however, these associations do not
always or perhaps even usually hold. Indeed, it is probably the
case that most researchers, regardless of whether their data are
qualitative or quantitative, invoke some combination of deductive
and inductive coding.) The relative ease (or difficulty) of the
coding task also can vary, depending
on the types of data with which the researcher is working, the
level of detail for which the coding scheme calls, and the amount
of pretesting the analyst has conducted, to name just
three.
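The inductive route described above (collect the data, draw a sample, examine it, then develop the scheme) might be sketched as follows. This is a minimal illustration under stated assumptions: `induce_scheme` is a hypothetical helper, a simple random sample stands in for whatever sampling design would make the sample representative in a real project, and ordering the codes by frequency is an arbitrary convention.

```python
import random
from collections import Counter


def induce_scheme(raw_data, sample_size, seed=0):
    """Derive a coding scheme from a random sample of raw observations.

    Hypothetical helper: returns a dict mapping each category observed in
    the sample to an integer code (1 = most frequent).
    """
    # Draw a simple random sample (a stand-in for a representative one).
    rng = random.Random(seed)
    sample = rng.sample(raw_data, sample_size)

    # Examine the sample by tallying the normalized raw attributes.
    counts = Counter(item.strip().lower() for item in sample)

    # Develop the scheme: the most frequent category gets code 1;
    # ties are broken alphabetically so the result is reproducible.
    ordered = sorted(counts, key=lambda category: (-counts[category], category))
    return {category: code for code, category in enumerate(ordered, start=1)}


# Hypothetical raw data: free-text descriptions of case outcomes.
responses = ["Affirm", "reverse", "affirm", "Remand",
             "affirm", "Reverse", "dismiss", "affirm"]
scheme = induce_scheme(responses, sample_size=5)
print(scheme)
```

Categories that happen not to appear in the sample receive no code at all, which is one practical reason a scheme begun inductively is often extended deductively before the full data set is coded.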
Nonetheless, we believe it is possible to develop some
generalizations about the process of coding variables, as well as
guidelines for so doing. This much we attempt to accomplish here.
Our discussion is divided into two sections, corresponding to the
two key phases of the coding process: (1) developing a precise
schema to account for the values of the variables and (2)
methodically assigning each unit under study a value for every
given variable. Readers should be aware, however, that although
we made as much use as we could of existing literatures,
discussions of coding variables are sufficiently few and far
between (and where they do exist, rather scanty) that many of the
generalizations we make and the guidelines
we offer come largely from our own experience. Accordingly, sins
of commission and omission probably loom large in our discussion
(with the latter particularly likely in light of space
limitations).