Ecological Fallacy

Feb 9, 2021 | Thesis Development

A fallacy is an error of reasoning, usually based on erroneous assumptions. Researchers are very familiar with all the ways they can get it wrong, with the fallacies to which they are susceptible.

The ecological fallacy is the mistake of attributing the characteristics of a population to an individual. Statistical inference aims to generalize from a population sample to the entire population. The objective of statistics is to generalize from the particular to the whole and not from the whole to the particular.

As such, statistics cannot offer a solution to the ecological fallacy. The best way to obtain data on individuals or on subpopulations within a larger population is to make sure that the unit of analysis is the individual or subpopulation and not the larger population. As Richards and colleagues point out, this problem can be avoided by designing survey instruments that capture individual characteristics and attitudes. Only from individual-level data can the researcher trace individual and subpopulation characteristics when necessary.

Ecological and Exceptional Fallacy

The ecological fallacy occurs when conclusions are drawn about individuals based solely on the analysis of group data. For example, suppose you measure the math scores of a particular class and find that they have the highest average score in the district.

Later (probably at the mall) you meet one of the kids in that class and think, “She must be a math genius.” Aha! Fallacy! Just because she comes from the class with the highest average does not mean that she automatically has high grades in mathematics. She could be the lowest-scoring math student in a class that is otherwise made up of math geniuses.
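The class example can be made concrete with a few lines of Python. This is only an illustrative sketch: the two classes and every score in them are invented, and the names class_scores, best_class and lowest_in_best are hypothetical.

```python
from statistics import mean

# Invented math scores (0-100) for two classes in the same district.
class_scores = {
    "class_a": [98, 97, 96, 95, 41],
    "class_b": [80, 79, 78, 77, 76],
}

# Group-level summary: the average score of each class.
averages = {name: mean(scores) for name, scores in class_scores.items()}
best_class = max(averages, key=averages.get)

# Individual-level check: the weakest student in the best class.
lowest_in_best = min(class_scores[best_class])
all_scores = [s for scores in class_scores.values() for s in scores]

print("class with the highest average:", best_class)
print("lowest score in that class:", lowest_in_best)
print("is that the lowest score in the district?", lowest_in_best == min(all_scores))
```

In this made-up data the class with the highest average also contains the lowest-scoring student in the district, so the group average says little about any one member.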

The exception fallacy is something like the reverse of the ecological fallacy. It occurs when a conclusion about a group is reached on the basis of exceptional cases. It’s the kind of fallacious reasoning that’s at the basis of a lot of sexism and racism. The stereotype is that of the guy who sees a woman make a driving mistake and concludes that “women are lousy drivers.” Wrong! Fallacy! (Boyer, Francois, Doutre et al., 2006)

Importance of both fallacies

Both fallacies point to some of the pitfalls that exist in both research and everyday reasoning. They also show how important it is that we do research. We need to determine empirically how individuals act (and not just rely on group averages). Similarly, we have to analyze whether there are correlations between certain behaviors and certain groups. One can consider the whole controversy surrounding the book The Bell Curve as an attempt to examine whether the supposed relationship between race and IQ is real or a fallacy.

Background to the term Ecological Fallacy

In 1950, Robinson coined the term ecological fallacy to refer to the error of interpreting environmental variations as variations between individuals. One tactic to solve Robinson’s ecological fallacy is to produce surveys in which the questions clearly indicate whether these are personal opinions of the subject or general assessments of an environment.

A comparable example where the individual is the unit of analysis is asking respondents to agree or disagree with the comment: “Sometimes I’m not prepared when I come to class.” The ecological question provides a generalized assessment of the environment without addressing the source of disorganization. In the ecological example, it is not clear to which unit of analysis the subjects respond: the environment, the teacher, the other students, themselves or all of them.

Findings by Richards and colleagues

Richards and his colleagues compared the use of individualistic and ecological units to analyze the classroom environment. They used the Classroom Environment Scales developed by Moos and Trickett, which consist of true-false questions about the classroom environment. Richards et al. (1991), as cited in Henderson, Caplan and Daniel (2004), noted that the questions were “modeled and resemble the type of questions used in objective personality tests” (p. 425).

Consequently, measures of dispersion (such as standard deviation) were much higher among individuals within environments than between environments, and measures of reliability (alpha) were also higher between environments than within them. Richards et al. also suggested that assessments of environmental measures were mediated by personality differences between individuals and that this confounded outcomes within any setting. Therefore, survey questions should be designed to distinguish and obtain assessments of the environment rather than serving as “covert measures of individual differences.”

Use of the terms Ecology and Environment

Richards and his colleagues use the terms ecology and environment interchangeably. However, it must be remembered that, strictly speaking, the unit of analysis is not the environment but the group that inhabits it. The classroom itself doesn’t fill out a questionnaire; the students do. The study by Richards et al. is important because it unequivocally confirms that, by themselves and without theoretical justification, individuals are invalid and unreliable units of analysis for measuring the characteristics of the environment.

It should be noted that, when talking about “environment”, Richards et al. refer to the small-scale groups that inhabit the environment, and therefore the environment is a unit at the group level. If the objective of the study is to understand the characteristics and dynamics of environments (in this case, classrooms), then the right sample for the study is the environments and not the individuals, and the researcher’s goal is to examine the variation between environments and not between individuals.

Gary King’s Findings

In 1997, Gary King proposed a statistical solution to the problem of ecological inference. Leo Goodman had previously proposed an ecological regression model to estimate individual differences from census data. King added to Goodman’s model the use of random coefficients to further minimize aggregation bias. Their solution has been partially successful in finding estimators of subpopulations within a larger population. However, although statistical sampling is a powerful tool, statistics is not good at making low-level inferences, that is, reducing the whole to its components, a kind of inverse statistic (Williams, 1994).

Examples of Ecological Fallacy

Statistically, a correlation tends to be greater when an association is evaluated at the group level than when it is evaluated at the individual level. However, details about individuals can be lost in aggregated data sets. There are several examples of ecological fallacy.
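The difference between group-level and individual-level correlations can be illustrated with a small simulation. The sketch below is not drawn from any study: the ten groups, the noise level and the helper function pearson are all assumptions chosen only to make the effect visible. Both variables share a group-level shift, while the individual-level noise in each is independent.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

groups = []
for g in range(10):
    # Invented setup: x and y both carry the group-level shift g,
    # but their individual-level noise is drawn independently.
    xs = [g + rng.gauss(0, 5) for _ in range(200)]
    ys = [g + rng.gauss(0, 5) for _ in range(200)]
    groups.append((xs, ys))

# Individual level: pool every person from every group.
all_x = [x for xs, _ in groups for x in xs]
all_y = [y for _, ys in groups for y in ys]
r_individual = pearson(all_x, all_y)

# Group level: correlate the ten pairs of group means.
r_group = pearson([mean(xs) for xs, _ in groups],
                  [mean(ys) for _, ys in groups])

print(f"individual-level r = {r_individual:.2f}")
print(f"group-level r      = {r_group:.2f}")
```

Averaging within groups cancels most of the individual noise, which is why the correlation of the group means comes out much stronger than the correlation among individuals in this setup.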

Example 1

In the first example, the researchers want to study the relationship between nativity (represented by the percentage of the population that was born abroad) and literacy (represented by the percentage of the population that can read and write). The correlation is calculated across the populations of several U.S. states. In such research, state-level correlations might be misleading if foreign-born individuals tend to live in states where natives are more literate.

Example 2

In another example, in a study designed to examine relationships between diet, lifestyle, heart disease and stroke, researchers found that mean blood pressures at entry and stroke death rates were inversely correlated across certain cohorts (study groups) of men aged 45 to 59 years with a follow-up of 25 years.

The finding was contrary to expectations. Subsequent analyses conducted at the individual level showed that the association between blood pressure and stroke mortality was strongly positive in most study groups. The explanation for this paradox is that, within each cohort, individuals who had suffered a stroke and died from it tended to have high blood pressure. However, when the individual values of each cohort were averaged and used to calculate the correlation, cohorts with higher mean blood pressures turned out to have lower mortality rates. This is simply because of the heterogeneity of correlations between cohorts.
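The sign reversal described in this example can be reproduced with toy numbers. The sketch below does not use data from the study: the four cohorts, their mean blood pressures, their baseline risk values and the helper pearson are all invented for illustration. Within every cohort, higher blood pressure goes with higher risk, yet cohorts with higher mean pressure are assigned a lower baseline risk.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Invented cohorts: (mean blood pressure, baseline stroke risk).
cohorts = [(130, 0.20), (135, 0.16), (140, 0.12), (145, 0.08)]
offsets = [-10, -5, 0, 5, 10]  # individual deviations from the cohort mean

data = []
for mean_bp, base_risk in cohorts:
    bp = [mean_bp + d for d in offsets]
    # Within a cohort, risk rises with individual blood pressure.
    risk = [base_risk + 0.002 * d for d in offsets]
    data.append((bp, risk))

# Individual-level association, computed separately inside each cohort.
within = [pearson(bp, risk) for bp, risk in data]

# Ecological association, computed on the cohort averages only.
between = pearson([mean(bp) for bp, _ in data],
                  [mean(risk) for _, risk in data])

print("within-cohort correlations:", [round(r, 2) for r in within])
print("between-cohort correlation:", round(between, 2))
```

Each within-cohort correlation is positive while the correlation of the cohort averages is negative, which is the pattern the paragraph above describes.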

Example 3

In a third example, the researchers found that breast cancer death rates were significantly higher in countries where fat consumption was high than in countries where fat consumption was low. This is an association in aggregated data in which the unit of observation is the country. Therefore, even in countries with more dietary fat and higher rates of breast cancer, women who consume fatty foods are not necessarily more likely to develop breast cancer. It cannot be established from these data that the women who developed breast cancer had a high fat intake.

To determine whether the ecological hypotheses generated by group-level analyses are true for individuals, it is necessary to collect data at the individual level. For causal inference, individual data are necessary to account for population heterogeneity and confounding bias.

Example 4

In a study in which local and individual hospitalization rates were derived from community-level estimates of various socioeconomic status indicators (SES), Hofer noted that SES community profiles may not be representative of community individuals who actually go to the hospital. For example, it is known that the proportion of elderly people who have health coverage is much higher than that of young adults, and that some of these elderly patients will use the hospital many times.

To obtain accurate estimates of the subpopulations that use and do not use the hospital, it is necessary to obtain data on samples of individuals, not social aggregates. The best way to estimate differences between subpopulations or individuals from aggregates is to ensure that the individual characteristics to be analyzed are representative of the aggregate, or to use complete analytical models that target only the SES data relevant to a target population. In their study on hospitalization rates, Billings et al. considered it necessary to include interactions between age and income when assessing SES variables in small area studies.

Study Analysis

Although ecological units (groups) comprise individuals, their characteristics are not equivalent to those of individuals in the group; therefore, a different theory must be applied to studies that use collectivities as units of analysis than to studies that use the individual as a unit of analysis.

When collectivities are the units of analysis, the appropriate research object should be the general characteristics and emergent properties of populations. Group-level characteristics can be very different from those of individual group members. Ethnographic and psychological studies often commit the opposite of the ecological fallacy: the fallacy of transferring individual characteristics to a group. This problem was dubbed the “fallacy of individual differences” by Richards in 1990.

Solutions to the problem of ecological inference?

Robinson’s criticism caused a great commotion in the social science community and certainly led some researchers to avoid aggregated data, but it also spawned a literature on “solutions” to the problem of ecological inference. Goodman approached the problem in 1953 and 1959 in terms of dichotomous variables. He pointed out that the dependent variable at the aggregate level is a proportion, which must be the weighted sum of the unobserved proportions of the two groups formed by the independent variable. This is nothing more than an accounting identity. In the case of voting, we observe the overall proportion that votes for a certain party in each neighborhood, and we want to make inferences about how individuals vote based on their racial group. The weighted average of the votes of the two groups must add up to the total proportion observed in each neighborhood:

$$T_i = (1 - P_i) W_i + P_i B_i$$

where $T_i$ is the observed proportion voting for the party, $P_i$ is the proportion of the neighborhood that is black, and $W_i$ and $B_i$ are the unobserved voting rates of the white and black subpopulations, respectively.

Algebraic manipulation gives rise to an equation that can be estimated from the aggregated data:

$$T_i = W_i + (B_i - W_i) P_i = \alpha + \beta P_i + e_i$$

Equation Analysis

In the regression of the observed vote proportion on the proportion black, with intercept α and slope β, the constant term α estimates the average proportion of the party vote in the white population, and α + β yields the estimate of the black proportion. The disturbance term is introduced because α and β are fixed, while in reality $W_i$ and $B_i$ vary from neighborhood to neighborhood. The validity of this approach depends on the “assumption of constancy.” In other words, as discussed by Goodman in 1953 and 1959 and by Freedman in 2001, the voting proportions of each group must not depend on the ethnic composition of the neighborhood (Henderson, Caplan and Daniel, 2004).
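A minimal sketch of this kind of ecological regression follows, assuming ordinary least squares with a single predictor. The function name goodman_regression and the neighborhood data are hypothetical. The data are built so that the constancy assumption holds exactly (a white rate of 0.3 and a black rate of 0.8 in every neighborhood), which is the only reason the group rates are recovered.

```python
def goodman_regression(p, t):
    """Least-squares fit of t = alpha + beta * p.

    p: proportion of each neighborhood that is black
    t: observed proportion voting for the party in each neighborhood
    Returns (alpha, beta): intercept and slope.
    """
    n = len(p)
    p_bar = sum(p) / n
    t_bar = sum(t) / n
    sxx = sum((x - p_bar) ** 2 for x in p)
    sxy = sum((x - p_bar) * (y - t_bar) for x, y in zip(p, t))
    beta = sxy / sxx
    alpha = t_bar - beta * p_bar
    return alpha, beta


# Invented neighborhoods in which the group rates are constant by construction.
true_white_rate, true_black_rate = 0.3, 0.8
p = [0.10, 0.25, 0.40, 0.60, 0.85]
t = [(1 - x) * true_white_rate + x * true_black_rate for x in p]

alpha, beta = goodman_regression(p, t)
print(f"estimated white rate (alpha)        = {alpha:.3f}")
print(f"estimated black rate (alpha + beta) = {alpha + beta:.3f}")
```

If the group rates varied with neighborhood composition, the same fit would still return an intercept and a slope, but they would no longer correspond to the true group rates. That is the practical meaning of the constancy assumption.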

A second basic approach, described in a 1953 study by Duncan and Davis, is based on setting limits for the minimum and maximum possible for each cell of a cross-tabulation in each of the aggregate units. By adding these extremes over the data set, it is possible to determine with 100% confidence the minimum and maximum limits of the correlation that could be obtained in the data at the individual level.
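The bounds for a single aggregate unit follow directly from the accounting identity t = p·b + (1 − p)·w, with both unobserved rates b and w restricted to the interval from 0 to 1. The sketch below covers only these per-unit cell bounds, not their aggregation over a data set or the resulting limits on the correlation. The function name duncan_davis_bounds and the example units are hypothetical.

```python
def duncan_davis_bounds(t, p):
    """Return ((b_min, b_max), (w_min, w_max)) for one aggregate unit.

    t: observed overall proportion (e.g. share voting for a party)
    p: proportion of the unit that belongs to group B
    Derived from t = p*b + (1 - p)*w with b and w both in [0, 1].
    """
    if not (0 <= t <= 1 and 0 <= p <= 1):
        raise ValueError("t and p must be proportions in [0, 1]")
    if p > 0:
        b_bounds = (max(0.0, (t - (1 - p)) / p), min(1.0, t / p))
    else:
        b_bounds = (0.0, 1.0)  # no group-B members: nothing is learned about b
    if p < 1:
        w_bounds = (max(0.0, (t - p) / (1 - p)), min(1.0, t / (1 - p)))
    else:
        w_bounds = (0.0, 1.0)  # no group-W members: nothing is learned about w
    return b_bounds, w_bounds


# Invented units given as (t, p) pairs.
units = [(0.60, 0.80), (0.30, 0.10), (0.50, 0.50)]
for t, p in units:
    (b_lo, b_hi), (w_lo, w_hi) = duncan_davis_bounds(t, p)
    print(f"t={t:.2f}, p={p:.2f}: "
          f"b in [{b_lo:.2f}, {b_hi:.2f}], w in [{w_lo:.2f}, {w_hi:.2f}]")
```

The bounds are tight only for the group that dominates a unit. In the first invented unit, where 80% of residents belong to group B, the rate for B is confined to a fairly narrow interval while the rate for W is left completely undetermined.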

King Solution

In 1997, King proposed a “solution” to the problem of ecological inference (EI), the EI technique. It was also developed in the context of dichotomous dependent variables. The technique combines the boundary method with Goodman’s regression technique. It also estimates the system using maximum likelihood and numerical simulation, assuming a bivariate normal distribution for the parameters. However, critics have pointed to a number of flaws in King’s technique, the review of which is beyond the scope of this essay. Anselin in 2000, Anselin and Cho in 2002, Freedman in 1998 and McCue in 2001 made important findings (Henderson, Caplan and Daniel, 2004).

King dismisses the argument that ecological inference is primarily a matter of model specification. In doing so, he reveals the most serious problem with his proposed methodology. He argues that the concept of a “correctly specified” individual-level equation is not useful in this context, since individual data contain the answer to ecological inference problems with certainty. That is, with individual data, we would not need to specify any equations; we would simply build the cross-tabulation and read off the answer. Having additional variables would provide no further help if individual-level data were available (p. 49, as cited in Henderson, Caplan and Daniel, 2004).

King’s Narrow Approach

In other words, the narrow focus of King’s technique is to reconstruct a description of the individual data, not to evaluate a causal model. This is a fitting goal in King’s motivating example, determining racial voting patterns for the purpose of redistricting litigation.

But in virtually every other application in the social sciences, our interest is in a causal model that cannot be reduced to a contingency table. Even in the analysis of votes, there are interesting substantive questions about whether racial identity affects voting once other factors, such as income and occupation, are taken into account. In addition, King acknowledges that his method will be less effective when the dependent variable is continuous, because no information is obtained from the bounds. These are quite important limitations.

The Debate on the Statistical Foundations

The debate on the statistical underpinnings and empirical performance of the EI method is likely to continue for some time, even as the technique is being widely adopted in political science. However, the most important issue with King’s approach is that it was developed and justified for a very narrow range of problems. These problems are not fully representative of the range of issues and types of data historically associated with the ecological fallacy and the problem of ecological inference.

Bibliographic References

Henderson A, Caplan G, Daniel A. Patient satisfaction: the Australian patient perspective. Aust Health Rev. 2004;27:73-83.

Boyer L, Francois P, Doutre E, et al. Perception and use of the results of patient satisfaction surveys by care providers in a French teaching hospital. Int J Qual Health Care. 2006;18:359-64.

Williams B. Patient satisfaction: a valid concept. Soc Sci Med. 1994;38:509-16.
