Inductive statistics (or inductive reasoning) is a branch of statistics concerned with taking samples from a larger population and using those data to draw conclusions, make decisions, and forecast or predict future behavior.
Inferential vs inductive statistics
Inductive statistics and inferential statistics are two terms that are often used interchangeably. For example: "Inferential statistics ... is also called inductive reasoning or inductive statistics" (Jeneralczuk, 2011), and "In inductive statistics the theory of probability is applied to make inferences about the process that generated the data" (Braune, n.d.). However, there is a subtle difference between the two terms.
Inductive statistics is the logical process of drawing general conclusions based on specific pieces of information; it is the underlying process behind the inferential statistics, as opposed to the data (statistics) produced. In other words, the branch of inferential statistics (which includes estimation and hypothesis testing) uses inductive reasoning (Steen, 2018).
Difference between descriptive and inductive statistics
Both descriptive and inductive statistics help make sense of row after row of data. Use descriptive statistics to summarize and graph data for a group of your choice. This process allows you to understand that specific set of observations. Descriptive statistics describe a sample. That is pretty straightforward. Just take a group that interests you, record data on group members, and then use summary statistics and graphs to present the group's properties.
With descriptive statistics, there is no sampling uncertainty because you are describing only the people or items that you actually measured. You are not trying to infer properties of a larger population. The process involves taking a potentially large number of data points and reducing them to a few meaningful summary values and graphs. This lets us learn more from the data, and visualize them more easily, than we could by scanning row after row of raw numbers.
Common descriptive statistics tools
Central tendency: Use the mean or median to locate the center of the data set. This measure tells you where most of the values fall.
Dispersion: How far from the center does the data extend? You can use range or standard deviation to measure dispersion. A low spread indicates that the values cluster more closely around the center. Greater spread means that the data points are farther from the center. We can also graph the frequency distribution.
Skewness: This measure tells you whether the distribution of values is symmetric or skewed.
Descriptive statistics example
Suppose we want to describe the test scores in a specific class of 30 students. We record all test scores, calculate summary statistics, and produce graphs. Taken together, this information gives us a pretty good picture of this specific class. There is no uncertainty around these statistics because we collected the scores of everyone in the class. However, we cannot take these results and extrapolate them to a larger population of students.
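The class example can be sketched in a few lines of Python using only the standard library. The 30 scores below are invented purely for illustration, and the function name describe is a hypothetical helper, not part of any library. Because the class is the entire group of interest, the sketch uses the population standard deviation (pstdev) rather than the sample version.

```python
import statistics

# Hypothetical test scores for a class of 30 students (illustrative data only).
scores = [
    55, 58, 61, 63, 64, 66, 67, 68, 70, 71,
    72, 72, 73, 74, 75, 75, 76, 77, 78, 79,
    80, 81, 82, 84, 85, 87, 88, 90, 93, 97,
]


def describe(values):
    """Return central tendency, dispersion, and skewness for a complete group."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # pstdev treats the data as the whole group, not a sample of something larger.
    std_dev = statistics.pstdev(values)
    value_range = max(values) - min(values)
    # Moment-based skewness: 0 for a symmetric distribution, positive for a
    # long right tail, negative for a long left tail.
    skewness = sum((x - mean) ** 3 for x in values) / (n * std_dev ** 3)
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "range": value_range,
        "std_dev": std_dev,
        "skewness": skewness,
    }


summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Every number printed here describes this class and nothing else; none of them comes with a margin of error.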
Inductive statistics and data
Inductive statistics takes data from a sample and makes inferences about the larger population from which the sample was drawn. Because the goal of inferential statistics is to draw conclusions from a sample and generalize them to a population, we must be confident that our sample accurately reflects the population. This requirement affects our process. At a broad level, we must do the following:
Define the population we are studying.
Take a representative sample from that population.
Use analyses that incorporate sampling error.
We cannot simply pick a convenient group. Instead, random sampling allows us to be confident that the sample represents the population. Random sampling produces statistics, such as the mean, that are not systematically too high or too low. Using a random sample, we can generalize from the sample to the broader population. Unfortunately, collecting a truly random sample can be a complicated process.
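As a rough illustration of why random sampling matters, the following Python sketch simulates a population and compares repeated simple random samples with a deliberately biased convenience sample. The population (10,000 simulated heights), the sample size of 50, and the helper name random_sample_mean are all assumptions made for this example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 heights in cm (simulated, not real data).
population = [rng.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)


def random_sample_mean(pop, size, rng):
    """Mean of a simple random sample drawn without replacement."""
    return statistics.fmean(rng.sample(pop, size))


# Repeat the sampling many times. Individual sample means vary, but they
# center on the population mean instead of sitting above or below it.
sample_means = [random_sample_mean(population, 50, rng) for _ in range(2_000)]
average_of_means = statistics.fmean(sample_means)

# A convenience sample for contrast: only the 50 tallest members.
convenience_mean = statistics.fmean(sorted(population)[-50:])

print(f"population mean:         {population_mean:.2f}")
print(f"average of random means: {average_of_means:.2f}")
print(f"convenience sample mean: {convenience_mean:.2f}")
```

The convenience sample here is an extreme case chosen to make the bias obvious. In practice the bias from a hand-picked group is usually subtler and harder to detect.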
Pros and cons of working with samples
You get huge benefits from working with a random sample drawn from a population. In most cases, it is simply impossible to measure the entire population to understand its properties. The alternative is to collect a random sample and then use inferential statistics methodologies to analyze the sample data.
Usually we learn about the population by drawing a relatively small sample from it, so we measure only a small fraction of the people or objects in that population. Consequently, when the properties of a population are estimated from a sample, it is unlikely that the sample statistics will exactly match the true population values.
For example, the sample mean is unlikely to be exactly the same as the population mean. The difference between the sample statistic and the population value is the sampling error. Inferential statistics incorporate estimates of this error into statistical results. In contrast, summary values in descriptive statistics are straightforward. The average score in a specific class is a known value because we measure all the individuals in that class. There is no uncertainty.
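The idea of sampling error can be made concrete with a small simulation. In the Python sketch below, the population of test scores is simulated, so its true mean is known and the sampling error can be computed directly; with real data only the estimated standard error would be available. The population parameters and sample size are assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of 5,000 test scores (simulated, not real data).
population = [rng.gauss(70, 12) for _ in range(5_000)]
population_mean = statistics.fmean(population)

# One random sample of 100 scores.
sample = rng.sample(population, 100)
sample_mean = statistics.fmean(sample)

# Sampling error: knowable here only because we simulated the whole population.
sampling_error = sample_mean - population_mean

# Standard error of the mean: an estimate of the typical size of that error,
# computed from the sample alone (sample standard deviation / sqrt(n)).
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:+.2f}")
print(f"standard error:  {standard_error:.2f}")
```

The standard error is the quantity that inferential procedures build on, since it can be estimated without knowing the population mean.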
Standard analysis tools in inductive statistics
The most common methodologies in inductive statistics are hypothesis testing, confidence intervals, and regression analysis. Interestingly, these inferential methods can produce summary values similar to descriptive statistics, such as mean and standard deviation. However, as we'll show you, we use them very differently when making inferences.
Hypothesis tests use sample data to answer questions such as the following:
Is the population mean greater or less than a particular value?
Are the means of two or more populations different from each other?
For example, if we study the effectiveness of a new drug by comparing the results in a treatment and control group, hypothesis testing can tell us whether the effect of the drug that we observe in the sample is likely to exist in the population. After all, we don't want to use the drug if it is effective only on our specific sample. Instead, we need evidence that it will be useful across the entire patient population. Hypothesis tests allow us to draw these kinds of conclusions about entire populations.
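One simple way to test whether two group means differ is a permutation test, sketched below in Python with only the standard library. This is one of several possible hypothesis tests (a t-test is a common alternative), chosen here because it needs no statistical tables. The outcome scores, the function name permutation_test, and its parameters are all hypothetical and made up for the example.

```python
import random
import statistics


def permutation_test(treatment, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are arbitrary,
        # so reshuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_treat]) - statistics.fmean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to numerator and denominator so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical outcome scores (higher is better); not real trial data.
treatment = [24, 27, 31, 29, 33, 30, 28, 35, 32, 26]
control = [22, 25, 21, 27, 24, 20, 26, 23, 25, 22]

observed_diff, p_value = permutation_test(treatment, control)
print(f"observed difference in means: {observed_diff:.2f}")
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value suggests that a difference this large would rarely arise from random group assignment alone, which is the kind of evidence needed before generalizing beyond the sample.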
Confidence intervals (CI)
In inductive statistics, a primary goal is to estimate population parameters. These parameters are the unknown values for the entire population, such as the population mean and standard deviation. These parameter values are not only unknown, they are almost always unknowable, because it is normally impossible to measure an entire population. The sampling error described earlier produces uncertainty, or a margin of error, around our estimates.
Suppose we define our population as all high school basketball players. We then draw a random sample from this population and calculate a mean height of 181 cm. This sample estimate of 181 cm is our best estimate of the mean height of the population. However, it is practically guaranteed that the estimate is not exactly correct. Confidence intervals incorporate the uncertainty and sampling error to create a range of values within which the true population value is likely to fall. For example, a confidence interval computed around the 181 cm estimate at a chosen confidence level, such as 95%, gives a range that we can be reasonably confident, though never certain, contains the true population mean.
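A confidence interval for a mean can be sketched as follows. This Python example uses a normal approximation, which is reasonable for a fairly large sample; for small samples a t-distribution would give a somewhat wider interval. The heights are simulated around 181 cm purely for illustration, and the function name mean_confidence_interval is a hypothetical helper.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value from the standard normal distribution
    # (about 1.96 for a 95% interval).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical sample of 100 player heights in cm (simulated, not real data).
heights = [rng.gauss(181, 8) for _ in range(100)]

low95, high95 = mean_confidence_interval(heights, confidence=0.95)
low99, high99 = mean_confidence_interval(heights, confidence=0.99)

print(f"sample mean: {statistics.fmean(heights):.1f} cm")
print(f"95% CI: [{low95:.1f}, {high95:.1f}]")
print(f"99% CI: [{low99:.1f}, {high99:.1f}]")
```

Asking for more confidence widens the interval, while a larger sample narrows it because the standard error shrinks with the square root of the sample size.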
Braune, C. (n.d.). Inductive Stats. Retrieved February 22, 2019 from: http://fuzzy.cs.ovgu.de/studium/ida/txt/ida_inductive.pdf
Jeneralczuk, J. (2011). The Three Main Aspects of Statistics. Article posted on website University of Massachusetts—Amherst. Retrieved February 22, 2019 from: http://people.math.umass.edu/~jeneral/stat240/handout1.pdf
Steen, K. Probability and Statistics, Chapter 2. Montefiore Institute. Retrieved February 27, 2018 from: http://www.montefiore.ulg.ac.be/~kvansteen/MATH0008-2/ac20112012/Class4/Chapter4_ac1112_v5a2.pdf