By comparing predictions with real data, we can determine to what extent a model explains the data and therefore the phenomenon. This can lead us to reject some models entirely, to improve (and then reevaluate) others, and perhaps finally to declare one the “best” model (so far). Models are built using accepted theoretical principles, prior knowledge and expert judgment.
What is Deduction?
Deduction can be defined as the process of drawing conclusions based on evidence and reasoning. It lies at the heart of the scientific method, as it encompasses the principles and methods by which we use data to learn about observable phenomena. This is always done through models. Much of science proceeds through modelling: we build a model of some phenomenon and use it to predict the data we expect to observe under certain conditions.
What is Inference?
According to Popper (2005), inference is the process by which models are confronted with data. It usually involves specifying the model mathematically and using the principles of probability to quantify the quality of the match.
Example of Application of Deduction and Inference
As an example, let’s consider, somewhat anachronistically, models of the motion of the planets in the solar system as seen from the Earth’s surface. Two prominent alternatives are:
(1) The geocentric model, in which the Earth is considered fixed and the Sun, Moon and planets orbit around it
(2) The heliocentric model, in which the Sun is considered fixed and the Earth and the planets orbit around it (the Moon continues to orbit the Earth).
Ignoring the specific details of the various flavors of these general models (Aristotelian, Ptolemaic, Copernican, Keplerian, etc.), there is actually a form of equivalence between the two models, because one can be transformed into the other by a non-inertial coordinate transformation. But when each is defined in its natural coordinates, the two models look very different.
How to: Inference
Inference can normally be divided into two parts: model tuning and model comparison. In one version of the geocentric model, planets move in regular circular orbits with the Sun in the center. Several parameters describe the motion of each planet (radius, period, inclination, phase). Model tuning is the process by which the values of these parameters are determined from a set of observational data.
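As a toy illustration of model tuning (not any historical method), the sketch below fits the radius and period of a hypothetical circular-orbit model to simulated noisy position data by minimizing the sum of squared residuals over a parameter grid; all numbers are illustrative:

```python
import numpy as np

# Hypothetical example: tune the radius r and period T of a circular-orbit
# model x(t) = r * cos(2*pi*t/T) against noisy position measurements.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
x_obs = 2.5 * np.cos(2 * np.pi * t / 4.0) + rng.normal(0.0, 0.1, t.size)

def sum_sq_resid(r, T):
    """Sum of squared residuals between model and data."""
    return np.sum((x_obs - r * np.cos(2 * np.pi * t / T)) ** 2)

# Brute-force grid search over the two parameters.
radii = np.linspace(1.0, 4.0, 61)
periods = np.linspace(3.0, 5.0, 81)
scores = np.array([[sum_sq_resid(r, T) for T in periods] for r in radii])
i, j = np.unravel_index(scores.argmin(), scores.shape)
print(f"best radius ~ {radii[i]:.2f}, best period ~ {periods[j]:.2f}")
```

In practice one would use a proper optimizer rather than a grid, but the principle is the same: the parameter values are those that make the model’s predictions agree best with the data.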
As all data contain some degree of noise, this implies uncertainty, which is best quantified using probability. That is, the uncertainty in the data translates into uncertainty in the parameters of the model. The most general approach to inference is to determine the probability density function (PDF) P(θ | D, M), where D denotes the data and θ the parameters of the model M. This is typically a multidimensional PDF that cannot be evaluated analytically.
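For a one-parameter toy problem, P(θ | D, M) can be evaluated directly on a grid. The sketch below assumes Gaussian data with known scatter and a flat prior; all numbers are illustrative:

```python
import numpy as np

# Minimal sketch: the model M says the data are Gaussian with unknown mean
# theta and known sigma = 1, and we place a flat prior on theta.
rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=20)

theta = np.linspace(0.0, 6.0, 601)          # grid over the parameter
log_like = np.array([-0.5 * np.sum((data - th) ** 2) for th in theta])
post = np.exp(log_like - log_like.max())    # flat prior: posterior ∝ likelihood
dtheta = theta[1] - theta[0]
post /= post.sum() * dtheta                 # normalize so it integrates to 1

mean = np.sum(theta * post) * dtheta        # posterior mean
print(f"posterior mean ~ {mean:.2f}")
```

Grid evaluation works here because there is a single parameter; with many parameters the grid becomes intractable, which is why numerical methods such as MCMC are needed.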
The Monte Carlo Method
According to Medawar (1996), the inference arsenal includes numerous numerical tools for evaluating this PDF, with Markov Chain Monte Carlo (MCMC) methods among the most popular. However, MCMC can take a long time, so approximate techniques are sometimes employed to determine summary information (such as the mean and covariance). Depending on the quantity and quality of the data, as well as the suitability of the model, P(θ | D, M) may be more or less sharply peaked around a narrow range of parameter values. A sharp peak indicates a well-determined, low-uncertainty solution for that model.
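A minimal sketch of the idea behind MCMC, using the simple Metropolis algorithm with a random-walk proposal (a toy setup, not a production sampler):

```python
import numpy as np

# Toy Metropolis sampler drawing from the posterior of a Gaussian mean
# theta given data D, assuming known sigma = 1 and a flat prior.
rng = np.random.default_rng(2)
data = rng.normal(5.0, 1.0, size=30)

def log_post(theta):
    return -0.5 * np.sum((data - theta) ** 2)   # up to an additive constant

theta = 0.0
samples = []
for _ in range(20_000):
    prop = theta + rng.normal(0.0, 0.5)          # random-walk proposal
    # Accept with probability min(1, P(prop)/P(theta)).
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

samples = np.array(samples[2_000:])              # discard burn-in
print(f"posterior mean ~ {samples.mean():.2f}, std ~ {samples.std():.2f}")
```

The chain’s samples approximate draws from P(θ | D, M), so summary quantities such as the mean and standard deviation can be read off directly from them.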
Usually, we will want to know how good a model is, either in a general sense or at specific fitted values of its parameters. Strictly speaking, this is an ill-posed question if we consider only one model, because then we have no alternative that could explain the data better, so all the data must be attributed to our only model. In practice, therefore, we always compare models and try to identify the “best” one (according to some criteria).
At the very least, we consider a “background” model as an implicit alternative. Take the example of detecting an emission line in a spectrum, where we have only one model for the location and shape of the line: an implicit alternative model is the absence of any line, for example a purely constant spectrum. But often we will have other alternative models, for example with multiple lines, or lines with different shapes.
Quantifying the quality of both models
We rarely believe that our models are perfect, so it does not really make sense to ask whether a particular model is the “right” one. This is reinforced by the fact that the data are noisy (they have a random component that no model can predict), so we expect some deviation between the data and the predictions. We can therefore only quantify the relative quality of models (including a possible background model); we cannot establish the absolute quality of a model. Incidentally, model comparison is often done poorly in the literature, partly through the use of background models so oversimplified that their weakness artificially promotes almost any alternative.
The Second Pillar of Inference
This brings us to the second pillar of inference, model comparison, which aims to identify which of the models considered best explains the data. Returning to the example of planetary orbits, consider that we have a set of observations of planetary positions (two-dimensional celestial coordinates) on known dates.
A good geocentric model can predict most possible observations very well. Better, in fact, than a heliocentric model with circular orbits. More generally, since we can geometrically transform the predictions of a heliocentric model into those of a geocentric model, both models could be equally good at predicting this data.
By making a geocentric model more complex (adding more epicycles), we can make it fit the data better and better. Think of fitting a curve to ten points in a two-dimensional space: unless the points are collinear, a cubic curve can always fit better, in terms of the sum of squared residuals, than a straight line. If we have additional reasons for preferring a geocentric model (such as the lack of observed stellar parallax, Aristotelian physics, or biblical interpretation), then the geocentric model seems to be favored.
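This claim is easy to verify numerically. In the sketch below (illustrative data), the straight line is a special case of the cubic, so the cubic’s sum of squared residuals can never be larger:

```python
import numpy as np

# Ten non-collinear points: a cubic always matches them at least as well
# as a straight line in terms of the sum of squared residuals.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 9.0, 10)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, 10)   # noisy, not exactly collinear

def ssr(degree):
    """Sum of squared residuals of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return np.sum((y - np.polyval(coeffs, x)) ** 2)

print(f"line SSR  = {ssr(1):.3f}")
print(f"cubic SSR = {ssr(3):.3f}")             # never larger than the line's
```

This is exactly the epicycle phenomenon: the more flexible model always fits at least as well, which is why goodness of fit alone cannot settle a model comparison.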
Consideration of the Plausibility of models
However, something important is missing from this chain of reasoning. We know that increasingly complex models can be made to fit any dataset, but we consider such models increasingly contrived. Therefore, in addition to predictive power, consideration of the plausibility of models must be a fundamental part of inference. Plausibility is often (though not always) equated with parsimony, in which case we adopt what is commonly called Occam’s razor: we should prefer a simple solution when a more complicated one is not necessary.
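One crude but common way to operationalize this trade-off between fit and complexity is an information criterion such as the BIC, which penalizes each extra parameter (a fully Bayesian treatment would compare marginal likelihoods instead). A sketch with illustrative data:

```python
import numpy as np

# Bayesian Information Criterion for Gaussian residuals:
#   BIC = n*ln(SSR/n) + k*ln(n), with k fitted parameters. Lower is better.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 9.0, 60)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0.0, 0.5, x.size)  # truly quadratic

def bic(degree):
    n, k = x.size, degree + 1
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    return n * np.log(np.sum(resid**2) / n) + k * np.log(n)

for d in (1, 2, 3):
    print(f"degree {d}: BIC = {bic(d):.1f}")
```

Here the quadratic is strongly preferred over the line because its better fit far outweighs its penalty, while the cubic gains little fit for its extra parameter.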
Therefore, we should apply some kind of “complexity control” to our models. In the historical development of theories of planetary motion, the change of preference from a geocentric model to a heliocentric one was not due solely to improvements in the data.
It was also due to a greater willingness to question the assumption that the Earth could not move, and to the choice to give more weight to the fact that, in the geocentric model, the motion of the Sun is suspiciously synchronized with the motion of the planets. The first was the consequence of an intellectual revolution that went beyond astronomy; the second was essentially a plausibility argument. Both demonstrate the inescapable importance to inference of prior information, the information that goes beyond the data we explicitly use in modelling.
The above description of inference is Bayesian. It is the only logical and self-consistent approach to probability-based inference. A probabilistic approach is essential because dealing with observational data means dealing with uncertainty: the data are noisy (we cannot measure the positions of the planets exactly), and our samples are incomplete (we cannot measure every point of an orbit).
Probability is undoubtedly the most potent means of dealing with uncertainty. Some scientists object to prior probabilities, but the objection does not deny their existence. It only highlights the practical difficulty of encapsulating prior information in terms of probabilities, which is a scientific problem to be addressed, not shunned.
Bayesian inference has experienced a major renaissance in astronomy over the past twenty years. This is due, in part, to the increase in the available computing power, as high-dimensional numerical integration often has to be performed. Once the methods for doing this became manageable, astronomers realized the need for logical, self-consistent data analysis, as opposed to quick and simple, but often erroneous, statistical tests or recipes.
Importance of making the inference correctly
Making inference correctly is an area of vital importance in all of science, but particularly in astronomy, where, not being able to conduct experiments or obtain data in situ, we are limited to remote observations. Enormous amounts of money, time and effort are invested in building powerful instruments. Thus, a proportionate effort should be made to ensure that sensible things are done with the data. Unfortunately, this is not always the case, and many publications in the literature draw incorrect conclusions due to faulty inference.
This is not always due to ignorance or even lack of effort. The principles of inference may be simple, but the practice is much more complex: Which models should I consider? How do I configure and parameterize these models? Which data do I take into account? What is the right noise model? How do I define my priors and check my sensitivity to them? What about the complexity or plausibility of the model? How do I explore a high-dimensional posterior PDF on an acceptable timescale? What new data should I acquire to help distinguish between the models I have tested? How do I use the results of the analysis to improve the models or propose new ones? These are questions that can only be answered in the context of specific problems, and they are at the heart of applied research on inference.
Inference: Populations and samples
The basis of hypothesis testing with statistical analysis is inference. Simply put, inference—and inferential statistics by extension—means gaining insights into a population from a sample of that population. Since in most contexts it is not possible to have all the data of an entire population of interest, we have to take a sample of that population. However, in order to rely on inference, the sample must cover theoretically relevant variables, variable ranges, and contexts.
Populations and samples
By performing a statistical analysis, we differentiate between populations and samples. The population is the total set of elements that interest us. The sample is a subset of those elements that we study to understand the population. Although we are interested in the population, we often have to resort to the study of a sample due to time, financial or logistical limitations that could make the study of the entire population unfeasible. Instead, we use inferential statistics to make inferences about the population from a sample.
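A minimal sketch of this sample-to-population step, using a simulated population (all numbers illustrative): the sample mean estimates the population mean, and the standard error quantifies the uncertainty of that estimate:

```python
import random
import statistics

# Simulated population of 100,000 measurements (e.g., heights in cm),
# from which we draw a random sample of 400.
random.seed(0)
population = [random.gauss(170.0, 10.0) for _ in range(100_000)]

sample = random.sample(population, 400)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / (len(sample) ** 0.5)   # standard error

low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"sample mean = {mean:.1f}, approx. 95% CI = ({low:.1f}, {high:.1f})")
```

The point of the confidence interval is exactly the inferential step described above: it expresses what the 400 sampled elements allow us to say about all 100,000.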
Sampling and knowledge
Let’s take a relatively common, but perhaps less examined, claim about what we “know” about the world around us. We often say we know people, and some of us know them better than others. What does it mean to know someone? In part, it must mean that we can anticipate how that person would behave in a wide range of situations. If we know that person from experience, then we must have observed their behavior in a sufficient variety of past situations to be able to infer how they would behave in future situations.
In other words, we have “sampled” their behavior across a relevant range of situations and contexts to trust that we can anticipate their behavior in the future. Similar sampling considerations apply to “knowing” a place, a group or an institution. Equally important are samples of observations across different combinations of variables, which are necessary to identify the relationships (or functions) between those variables. In short, samples, whether deliberate and systematic or of another kind, are part of what we think we know about the world around us.
Given the importance of sampling, it should come as no surprise that there are numerous strategies designed to support useful inference about populations. For example, how can we judge whether the temperature of a soup is right before serving it? We could stir the pot, to ensure temperature uniformity across possible spoon-sized samples, and then taste a single spoonful.
An especially thorny problem in sampling is the practice of courtship, in which participants can try to do their best to make a good impression. In other words, participants often try to skew the sample of relational experiences to look better than average.
What does sampling involve?
Sampling in this context usually involves:
(a) Obtaining opinions from others, thereby (if only indirectly) extending the sample size
(b) Observing the courtship partner in a wide range of circumstances in which the intended bias may be difficult to maintain.
Put formally, we can attempt to stratify the sample by taking observations in appropriate “cells” that correspond to different potential influences on behavior. For example, high-stress environments that involve preparing for final exams or meeting with parents.
At best, however, we try to eliminate the effect of different influences on our samples by randomizing. Following the courtship example, behavioral observations could be taken across a series of pairs and randomly assigned situations. But of course, by then all bets are off as far as making the relationship work is concerned.
When performing inferential statistics to infer about the characteristics of a population from a sample, it is essential to be clear about how the sample has been extracted. Sampling can be a very complex practice with multiple stages for obtaining the final sample. It is desirable that the sample be some form of probabilistic sample, that is, a sample in which each member of the population has a known probability of being sampled.
The most direct form of a suitable probabilistic sample is a random sample in which everyone has the same probability of being included in the sample. A random sample has the advantages of simplicity (in theory) and ease of inference, since no adjustments need to be made to the data. However, the reality of conducting a random sample can make the process quite difficult.
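The defining property of a simple random sample, that every member of the population has the same probability of being included, can be checked empirically on a toy population:

```python
import random
from collections import Counter

# Repeatedly draw samples of 5 from a population of 20 and count how often
# each member is included. Each member's inclusion rate should be ~ 5/20 = 25%.
random.seed(1)
population = list(range(20))
counts = Counter()
trials = 20_000
for _ in range(trials):
    counts.update(random.sample(population, 5))   # draw without replacement

rates = [counts[m] / trials for m in population]
print(f"inclusion rates span {min(rates):.3f} to {max(rates):.3f}")
```

Because inclusion probabilities are equal, estimates from such a sample need no reweighting, which is the “ease of inference” advantage mentioned above.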
Sample Example 1
Before we can draw subjects at random, we need a list of all members of the population. For many populations (e.g., U.S. adults) that list is impossible to obtain. Not so long ago, however, a list of telephone numbers was a fair approximation to such a list for American households.
During the era when landlines were ubiquitous, pollsters could dial random numbers (and perhaps ask for the adult in the household with the most recent birthday) to obtain a good approximation of a national random sample. It was also a time before caller ID and personalized ringtones, so calls were routinely answered, which reduced (but did not eliminate) concern about response bias. Phone habits have since changed, and pollsters find it increasingly difficult to argue that random dialing of landlines yields a representative sample of adult Americans.
Sample Example 2
Other forms of probabilistic sampling are often used to overcome some of the difficulties presented by pure random sampling. Suppose our analysis requires us to make comparisons based on race. Only 12.6% of Americans are African American.
Suppose we also want to take religious preference into account. Only 5% of African Americans are Catholic, which means that only about 0.6% of the population is both African American and Catholic. If our sample size is 500, we could end up with only about three African-American Catholics. A stratified random sample can solve that problem. A stratified random sample is similar to a simple random sample, but is drawn from the different subpopulations, or strata, at different rates. The total sample must therefore be weighted to be representative of the entire population.
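A sketch of the stratified idea with made-up numbers: oversample the small stratum so it is well represented, then reweight each stratum’s estimate by its true population share:

```python
import random

# Toy population with a large stratum A (94%) and a small stratum B (6%)
# that differ in the quantity being measured. All numbers are illustrative.
random.seed(2)
strata = {
    "A": [random.gauss(50, 5) for _ in range(9_400)],
    "B": [random.gauss(70, 5) for _ in range(600)],
}
shares = {"A": 0.94, "B": 0.06}

# Draw 250 from EACH stratum, regardless of its population share...
samples = {name: random.sample(members, 250) for name, members in strata.items()}

# ...then reweight each stratum's mean by its true population share.
estimate = sum(shares[name] * (sum(s) / len(s)) for name, s in samples.items())
print(f"weighted population-mean estimate ~ {estimate:.1f}")
```

Without the reweighting step, the oversampled small stratum would bias the estimate; the weights restore representativeness, exactly as described above.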
Another type of probabilistic sampling, common in face-to-face surveys, is cluster sampling. According to Dorland’s Illustrated Medical Dictionary (1994), cluster sampling first samples on the basis of clusters (usually geographic units, such as census tracts) and then samples participants within those units. In practice, this approach typically uses multi-stage sampling, in which the first stage may be a sample of congressional districts, then census tracts, and then households. The final sample must be weighted in a complex way to reflect the different probabilities that individuals will be included in the sample.
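A toy sketch of the two-stage logic (names and numbers are illustrative): first sample clusters, then sample individuals within the selected clusters:

```python
import random

# 50 hypothetical census tracts of 500 people each, with tract-level
# variation in the measured quantity. All numbers are illustrative.
random.seed(3)
tracts = {t: [random.gauss(40 + t % 5, 8) for _ in range(500)] for t in range(50)}

chosen_tracts = random.sample(list(tracts), 10)       # stage 1: pick 10 tracts
respondents = []
for t in chosen_tracts:
    respondents += random.sample(tracts[t], 25)       # stage 2: 25 people per tract

print(f"{len(respondents)} respondents drawn from {len(chosen_tracts)} tracts")
```

With equal-sized clusters and equal per-cluster samples, inclusion probabilities happen to be uniform; with unequal cluster sizes or sampling rates, the complex weighting described above becomes necessary.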
Non-probabilistic samples, or those for which the probability of inclusion of a member of the population in the sample is unknown, can pose difficult problems for statistical inference. However, under some conditions, they can be considered representative and used for inferential statistics.
Convenience samples (e.g., undergraduate students in the Department of Psychology’s subject pool) are accessible and relatively inexpensive, but may differ in important respects from the larger population to which you want to infer. Necessity may push a researcher to use a convenience sample, but any inference from it should be approached with caution. A convenience sample based on “I asked people leaving the bank” could yield very different results from one based on “I asked people leaving a payday loan establishment.”
Some non-probabilistic samples are used because the researcher does not want to make inferences to a larger population. An intentional or judgmental sample is based on the discretion of the researcher as to who can provide useful information on the subject. If we want to know why a law was enacted, it makes sense to take a sample from the author and co-authors of the bill, committee members, leaders, etc., rather than a random sample of legislative body members.
Snowball sampling is similar to intentional sampling in that we look for people with certain characteristics, but we rely on subjects to refer us to others who meet the criteria we have established. We may want to study struggling young artists, for example. They can be difficult to find, since their works do not hang in galleries, so we might start with one or a few whom we can locate and then ask them who else we should interview.
Sampling techniques can be relatively straightforward, but as one moves away from simple random sampling, the sampling process becomes more complex or limits our ability to make inferences about a population. Researchers use all of these techniques to good purposes and the best technique will depend on a number of factors, such as budget, experience, the need for accuracy, and the research question being addressed. However, in what remains of this text, when we talk about making inferences, the data will be based on a properly extracted probabilistic sample.
References
Medawar, P. (1996). Is the scientific paper a fraud? In: The Strange Case of the Spotted Mice and Other Essays. Oxford: Oxford University Press.
Popper, K. R. (2005). The Logic of Scientific Discovery. New York: Basic Books.
Dorland’s Illustrated Medical Dictionary (1994). 28th edition. Philadelphia: W.B. Saunders.