Item Response Theory

Sep 6, 2021 | Methodology, Quantitative Research

Item response theory (IRT), also known as latent trait theory, refers to a family of mathematical models that attempt to explain the relationship between latent traits (unobservable characteristics or attributes) and their manifestations (i.e., observed outcomes, responses, or performance). These models establish a link between the properties of an instrument’s items, the individuals who respond to those items, and the underlying trait being measured. IRT assumes that the latent construct (e.g., stress, knowledge, or attitudes) and the items of a measure are organized along an unobservable continuum. Therefore, its main objective is to establish the individual’s position on that continuum.

Classical test theory

Classical test theory (CTT) pursues the same goal and predates IRT. It was used (and continues to be used) to predict an individual’s latent trait from the total score observed on an instrument. In CTT, the true score predicts the level of the latent variable and the observed score. The error is assumed to be normally distributed with a mean of 0 and a standard deviation of 1.

Item response theory versus classical test theory

Assumptions of IRT

1) Monotonicity – The assumption indicates that as the level of the trait increases, the probability of a correct answer also increases.

2) Unidimensionality – The model assumes that a single dominant latent trait is being measured and that this trait drives the observed responses to each item of the measure.

3) Local independence – Responses to the different items of a test are mutually independent, conditional on a given level of ability.

4) Invariance – Item parameters can be estimated from any portion of the item response curve. Consequently, the parameters of an item can be estimated from any group of subjects who have responded to it.

If these assumptions hold, differences in correct answers among respondents will be due to variation in their latent trait.

Item Response Function and Item Characteristic Curve (ICC)

IRT models predict respondents’ answers to the items of an instrument from their position on the latent trait continuum and from the items’ characteristics, also known as parameters. The item response function characterizes this association. The underlying assumption is that each response to an item on an instrument provides some information about the individual’s level of the latent trait or ability.

In simple terms, a person’s ability (θ) determines the probability of giving the correct answer to an item: the greater the individual’s ability, the greater the probability of responding correctly. This relationship can be represented graphically and is known as the item characteristic curve (ICC). The probability of a correct answer increases monotonically as the respondent’s ability increases. Keep in mind that, theoretically, ability (θ) ranges from -∞ to +∞; in practice, however, it usually ranges from -3 to +3.
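A minimal Python sketch of an item characteristic curve, assuming the logistic form used by the models described in this article (written without the optional scaling constant D ≈ 1.7 that some texts include). The function name `icc` and the item values a = 1.2 and b = 0.5 are hypothetical; the sketch evaluates the curve over the practical range θ = -3 to +3 and checks that it rises monotonically.

```python
import math

def icc(theta, a=1.0, b=0.0):
    """Probability of a correct answer under a logistic ICC.

    theta: respondent ability; a: discrimination; b: difficulty (hypothetical defaults).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Evaluate the curve on the practical ability range -3..+3 in steps of 0.5.
thetas = [x / 2 for x in range(-6, 7)]
probs = [icc(t, a=1.2, b=0.5) for t in thetas]

# With positive discrimination, the curve should increase monotonically.
is_monotonic = all(p1 < p2 for p1, p2 in zip(probs, probs[1:]))

for t, p in zip(thetas, probs):
    print(f"theta={t:+.1f}  P(correct)={p:.3f}")
print("monotonic:", is_monotonic)
```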

Item parameters

As people’s abilities vary, their position on the latent construct continuum changes; this position is determined by the sample of respondents and the item parameters. An item must be sensitive enough to differentiate respondents along the hypothesized unobservable continuum.

Item difficulty (bi)

This parameter determines where the item is located along the ability scale. It is defined at the point of median probability, that is, the ability level at which the probability of a correct answer is 50%. On an item characteristic curve, difficult items shift to the right of the scale, indicating that greater ability is needed to answer them correctly, while easier items shift to the left.
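To make the 50% definition concrete, here is a sketch that numerically locates the ability at which a logistic ICC crosses 0.5, assuming the logistic form without a scaling constant. The bisection helper `fifty_percent_point` and the two items are hypothetical illustrations; for a logistic curve the crossing point coincides with b.

```python
import math

def icc(theta, a, b):
    """Logistic ICC (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fifty_percent_point(a, b, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection for the ability where P(correct) = 0.5 (assumes a > 0)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if icc(mid, a, b) < 0.5:
            lo = mid  # still below 50%: the crossing lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical items: same discrimination, different difficulty.
easy = {"a": 1.0, "b": -1.0}
hard = {"a": 1.0, "b": 1.5}

print(round(fifty_percent_point(**easy), 6))  # matches b = -1.0
print(round(fifty_percent_point(**hard), 6))  # matches b = 1.5
# At an average ability (theta = 0) the easy item is more likely to be answered correctly.
print(icc(0.0, **easy) > icc(0.0, **hard))
```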

Item discrimination (ai)

This parameter determines the rate at which the probability of a correct answer changes across ability levels. It is essential for differentiating between individuals who possess similar levels of the latent construct of interest. The ultimate goal in designing an accurate measure is to include items with high discrimination, so that individuals can be mapped precisely along the latent trait continuum.

Conversely, researchers should be cautious if an item shows negative discrimination, because the probability of a correct answer should not decrease as the respondent’s ability increases; such items should be reviewed. Theoretically, item discrimination ranges from -∞ to +∞, but it usually does not exceed 2; realistically, therefore, it ranges from 0 to 2.
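A sketch, assuming a logistic ICC without a scaling constant, of two practical checks suggested by this paragraph: the slope of the curve at θ = b equals a/4, so a larger a means a steeper curve, and items with a ≤ 0 (or above the usual ceiling of 2) can be flagged for review. The item bank and the helper names `slope_at_difficulty` and `review_items` are hypothetical.

```python
import math

def icc(theta, a, b):
    """Logistic ICC: probability of a correct answer at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def slope_at_difficulty(a):
    """Slope of the ICC at theta = b, where P = 0.5: a * P * (1 - P) = a / 4."""
    return a * 0.25

def review_items(items):
    """Return (items with a <= 0, items with a above the usual ceiling of 2)."""
    negative = [name for name, (a, _) in items.items() if a <= 0]
    unusually_high = [name for name, (a, _) in items.items() if a > 2]
    return negative, unusually_high

# Hypothetical item bank: name -> (a, b).
items = {"q1": (1.4, 0.0), "q2": (0.3, -0.5), "q3": (-0.6, 0.2), "q4": (2.8, 1.0)}

for name, (a, b) in items.items():
    print(name, "slope at b:", round(slope_at_difficulty(a), 3))

# For q3 the probability of a correct answer falls as ability rises.
a3, b3 = items["q3"]
print("q3 decreasing:", icc(2.0, a3, b3) < icc(-2.0, a3, b3))

print(review_items(items))  # (['q3'], ['q4'])
```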

Guessing (ci)

The guessing parameter is the third item parameter; it accounts for the possibility of answering an item correctly by guessing. It sets a lower bound on the probability of a correct answer as ability approaches -∞.
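A minimal sketch of how the guessing parameter acts as a lower asymptote, assuming the three-parameter logistic form P(θ) = c + (1 - c) / (1 + e^(-a(θ - b))) without a scaling constant. The item values are hypothetical; c = 0.25 mimics a four-option multiple-choice item.

```python
import math

def icc_3pl(theta, a, b, c):
    """3-PL curve: c is the lower asymptote (guessing), so P never falls below c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical four-option multiple-choice item: guessing near 1/4.
a, b, c = 1.2, 0.0, 0.25

for theta in (-10, -6, -3, 0, 3):
    # As theta decreases, P approaches c instead of 0.
    print(theta, round(icc_3pl(theta, a, b, c), 4))
```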

Population invariance

In simple terms, item parameters behave similarly across different populations. This is not the case when measurement follows CTT. Because the item is the unit of analysis in IRT, the item’s location (difficulty) can be standardized (that is, undergo a linear transformation) across populations, so items can be easily compared. Note that, even after the linear transformation, the parameter estimates derived from two samples will not be identical; invariance, as the name suggests, refers to the population and therefore applies only to the population item parameters.
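The following sketch illustrates why a linear transformation of the scale does not change an item’s behavior, assuming a logistic ICC: if θ and b are both transformed as Aθ + B and a is divided by A, the predicted probability stays the same. The linking constants A and B and the item values are hypothetical; estimating them from real data is a separate linking step not shown here.

```python
import math

def icc(theta, a, b):
    """Logistic ICC (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rescale(theta, a, b, A, B):
    """Linear transformation of the scale: theta* = A*theta + B, b* = A*b + B, a* = a / A."""
    return A * theta + B, a / A, A * b + B

theta, a, b = 0.8, 1.3, 0.2   # hypothetical ability and item parameters
A, B = 1.5, -0.4              # hypothetical linking constants (A > 0)

theta_s, a_s, b_s = rescale(theta, a, b, A, B)

# a*(theta* - b*) = (a / A) * A * (theta - b) = a * (theta - b), so P is unchanged.
print(icc(theta, a, b), icc(theta_s, a_s, b_s))
```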

Types of IRT models

Unidimensional models

Unidimensional models estimate ability from items that measure a single dominant latent trait.

Dichotomous IRT models

Dichotomous IRT models are used when responses to the items of a measure are dichotomous (i.e., scored 0 or 1).

The 1-parameter logistic model

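This model is the simplest of the IRT models. It includes one parameter describing the latent trait (ability, θ) of the person responding to the items and one parameter for each item (difficulty). Its mathematical form is shown below, where θj is the ability of person j, bi is the difficulty of item i, and a is a discrimination value shared by all items:

$$P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{\exp[a(\theta_j - b_i)]}{1 + \exp[a(\theta_j - b_i)]}$$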

The model’s item response function predicts the probability of a correct response given the respondent’s ability and the item’s difficulty. In the 1-PL model, the discrimination parameter is fixed across all items; consequently, the item characteristic curves of the different items in the measure are parallel along the ability scale.
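A sketch of the 1-PL item response function, assuming a discrimination value a shared by all items (the value 1.1 and both difficulties are hypothetical). It checks the parallel-curves property: the harder item’s curve is the easier item’s curve shifted to the right by the difference in difficulty, so the curves never cross.

```python
import math

def icc_1pl(theta, b, a=1.0):
    """1-PL item response function: all items share discrimination a; only b varies."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

a = 1.1                      # hypothetical common discrimination
b_easy, b_hard = -0.5, 1.0   # hypothetical item difficulties
shift = b_hard - b_easy

for theta in (-2.0, 0.0, 2.0):
    p_easy = icc_1pl(theta, b_easy, a)
    p_hard_shifted = icc_1pl(theta + shift, b_hard, a)
    # Same value: the hard item's curve is the easy one moved right by `shift`.
    print(theta, round(p_easy, 6), round(p_hard_shifted, 6))

# With equal slopes the easier item is always more likely to be answered correctly.
grid = [x / 10 for x in range(-30, 31)]
never_cross = all(icc_1pl(t, b_easy, a) > icc_1pl(t, b_hard, a) for t in grid)
print("never cross:", never_cross)
```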

The Rasch model versus the 1-parameter logistic model

The models are mathematically the same; however, the Rasch model constrains item discrimination (ai) to 1, while the 1-parameter logistic model strives to fit the data as closely as possible and does not constrain discrimination to 1. In the Rasch approach the model takes precedence over the data, because the focus is on developing the variable used to measure the dimension of interest. Therefore, when constructing an instrument, the Rasch model would be the best choice, improving the precision of the items.
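A sketch of the sense in which the two models are mathematically the same, assuming logistic forms without a scaling constant: a 1-PL curve with common discrimination a equals a Rasch curve (a fixed at 1) once ability and difficulty are both rescaled by a. The numbers are hypothetical.

```python
import math

def icc_1pl(theta, b, a):
    """1-PL: common discrimination a, estimated from the data."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def icc_rasch(theta, b):
    """Rasch model: discrimination fixed at 1."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

a = 1.7               # hypothetical common discrimination from a 1-PL fit
theta, b = 0.4, -0.3  # hypothetical ability and difficulty

# Rescaling theta and b by a turns the 1-PL curve into a Rasch curve.
print(icc_1pl(theta, b, a), icc_rasch(a * theta, a * b))
```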

The two-parameter logistic model

The two-parameter logistic model predicts the probability of a correct answer using two item parameters: difficulty (bi) and discrimination (ai).

The discrimination parameter is allowed to vary across items. Therefore, the ICCs of different items can intersect and have different slopes. The steeper the slope, the greater the item’s discrimination, since it can detect subtle differences in respondents’ ability.
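A sketch, assuming the 2-PL curve P(θ) = 1 / (1 + e^(-a(θ - b))) without a scaling constant, that finds where two items’ ICCs intersect by solving a1(θ - b1) = a2(θ - b2). The items are hypothetical; when the slopes are equal, as in the 1-PL model, there is no crossing point.

```python
import math

def icc_2pl(theta, a, b):
    """2-PL curve: both a and b vary by item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def crossing_point(a1, b1, a2, b2):
    """Ability where two 2-PL curves meet: solve a1*(theta - b1) = a2*(theta - b2)."""
    if a1 == a2:
        return None  # equal slopes (as in the 1-PL): the curves never cross
    return (a1 * b1 - a2 * b2) / (a1 - a2)

steep = (2.0, 0.5)  # hypothetical highly discriminating item (a, b)
flat = (0.6, 0.0)   # hypothetical weakly discriminating item (a, b)

theta_x = crossing_point(*steep, *flat)
print(round(theta_x, 4), icc_2pl(theta_x, *steep), icc_2pl(theta_x, *flat))

# Below the crossing the flat item is easier to pass; above it, the steep item is.
print(icc_2pl(theta_x - 1, *flat) > icc_2pl(theta_x - 1, *steep))
print(icc_2pl(theta_x + 1, *steep) > icc_2pl(theta_x + 1, *flat))
```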

The item information function

As with the 1-PL model, the information is calculated as the product of the probabilities of a correct and an incorrect answer. However, in the 2-PL model this product is multiplied by the square of the discrimination parameter. The implication is that the higher the discrimination parameter, the more information the item provides. Because discrimination is allowed to vary across items, the item information functions of different items may also look different.
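A sketch of the 2-PL item information function I(θ) = a² P(θ)[1 - P(θ)], assuming a logistic ICC without a scaling constant; with a = 1 it reduces to the product P(1 - P) described for the 1-PL case. The items are hypothetical. Each function peaks at θ = b with height a²/4, so more discriminating items give taller, narrower information curves.

```python
import math

def icc_2pl(theta, a, b):
    """2-PL curve (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2-PL item information: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Hypothetical items as (a, b) pairs.
items = [(0.8, -1.0), (1.5, 0.0), (2.2, 1.0)]

for a, b in items:
    peak = item_information(b, a, b)
    one_unit_away = item_information(b + 1.0, a, b)
    print(f"a={a}, b={b}: peak {peak:.4f} (a^2/4 = {a * a / 4:.4f}), at b+1: {one_unit_away:.4f}")
```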

Ability estimation

With the 2-PL model, the assumption of local independence still holds, and ability is estimated by maximum likelihood. The item responses are still added up, but each response is now weighted by the item’s discrimination parameter. Therefore, response patterns with the same number of correct answers can have likelihood functions that differ from each other and peak at different levels of θ.
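A sketch of maximum likelihood ability estimation under the 2-PL model, assuming the item parameters are already known. The article does not say which optimizer is used; Newton-Raphson on the log-likelihood is one common choice and is an assumption here, as are the item values and the function names. Local independence lets the log-likelihood be written as a sum over items, and the estimate satisfies Σ ai(xi - Pi(θ)) = 0, which is where the discrimination weights enter. All-correct or all-incorrect patterns have no finite maximum, so the sketch uses mixed patterns.

```python
import math

def icc_2pl(theta, a, b):
    """2-PL curve (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Local independence: the pattern's log-likelihood is a sum over items."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = icc_2pl(theta, a, b)
        total += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return total

def mle_theta(responses, items, theta=0.0, max_iter=100, tol=1e-10):
    """Newton-Raphson for the 2-PL ability MLE (pattern must mix 0s and 1s)."""
    for _ in range(max_iter):
        ps = [icc_2pl(theta, a, b) for a, b in items]
        # Score: each residual x - P is weighted by the item's discrimination.
        score = sum(a * (x - p) for x, (a, _), p in zip(responses, items, ps))
        info = sum(a * a * p * (1.0 - p) for (a, _), p in zip(items, ps))
        step = max(-1.0, min(1.0, score / info))  # cap steps for stability
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical items (a, b): one flat, one moderate, one highly discriminating.
items = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]

# Both patterns have one correct answer, but the weighted scores differ.
for pattern in ([1, 0, 0], [0, 0, 1]):
    est = mle_theta(pattern, items)
    print(pattern, round(est, 3), round(log_likelihood(est, pattern, items), 3))
```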

The 3-parameter logistic model

This model predicts the probability of a correct answer in the same way as the 1-PL and 2-PL models, but adds a third parameter called the guessing parameter (also known as the pseudo-chance parameter), which sets a lower bound on the probability of a correct answer as the respondent’s ability approaches -∞. When respondents answer an item by guessing, the amount of information the item provides decreases, and the item information function peaks at a lower level than it would without guessing. In addition, the difficulty no longer corresponds to the point where the probability of a correct answer is 50%; at θ = bi, that probability is (1 + ci)/2. Items answered correctly by guessing indicate that the respondent’s ability is lower than the item’s difficulty.
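A sketch of the 3-PL curve and its information function, assuming the commonly cited form I(θ) = a² (Q/P) [(P - c)/(1 - c)]² with Q = 1 - P and no scaling constant. The item values are hypothetical. It shows that at θ = b the probability is (1 + c)/2 rather than 0.5, and that guessing lowers the item’s information at b.

```python
import math

def icc_3pl(theta, a, b, c):
    """3-PL: lower asymptote c accounts for guessing."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3-PL item information; reduces to a^2 * P * (1 - P) when c = 0."""
    p = icc_3pl(theta, a, b, c)
    q = 1.0 - p
    return a * a * (q / p) * ((p - c) / (1.0 - c)) ** 2

a, b = 1.5, 0.0  # hypothetical item

for c in (0.0, 0.2):
    print(
        f"c={c}: P(b)={icc_3pl(b, a, b, c):.3f}, "
        f"(1+c)/2={(1 + c) / 2:.3f}, info at b={info_3pl(b, a, b, c):.4f}"
    )
```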

Model fit

One way to choose which model to fit is to evaluate the models’ relative fit through information criteria: the AIC values are compared and the model with the lowest AIC is chosen. Alternatively, for nested models, we can use the likelihood-ratio (deviance) chi-square test, which measures the change in deviance between the two models. Because this change follows a chi-square distribution, we can test whether the two models differ significantly from each other.
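A sketch of both comparisons using only the Python standard library, assuming the usual definitions AIC = 2k - 2 ln L and LR = -2(ln L_restricted - ln L_full) with degrees of freedom equal to the difference in parameter counts. The chi-square tail probability is computed from the incomplete gamma series, a substitute for a statistics library. The log-likelihood values are hypothetical placeholders, not results from any real data.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_lik

def chi2_sf(x, df, tol=1e-12, max_iter=10_000):
    """Upper-tail chi-square probability via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    for n in range(1, max_iter):
        term *= z / (s + n)
        total += term
        if term < tol * total:
            break
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return min(1.0, max(0.0, 1.0 - lower))

def likelihood_ratio_test(ll_restricted, k_restricted, ll_full, k_full):
    """Nested models: -2 * (difference in log-likelihood) is compared to a chi-square."""
    stat = -2.0 * (ll_restricted - ll_full)
    df = k_full - k_restricted
    return stat, df, chi2_sf(stat, df)

# Hypothetical log-likelihoods for a 10-item test (illustrative placeholders only):
# 1-PL: 10 difficulties + 1 common discrimination; 2-PL: 10 difficulties + 10 discriminations.
ll_1pl, k_1pl = -5210.4, 11
ll_2pl, k_2pl = -5195.2, 20

print("AIC 1-PL:", aic(ll_1pl, k_1pl), "AIC 2-PL:", aic(ll_2pl, k_2pl))
print("LR test (stat, df, p):", likelihood_ratio_test(ll_1pl, k_1pl, ll_2pl, k_2pl))
```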

Other IRT models

These include models that handle polytomous data, such as the graded response model and the partial credit model, which predict the probability of responding in each ordered response category. Other IRT models, such as the nominal response model, handle items with unordered response categories (e.g., Yes, No, Maybe). In this brief summary, we have focused on unidimensional IRT models, which measure a single latent trait; these models are not appropriate for measuring more than one construct or latent trait. In that case, multidimensional IRT models are advised.
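A sketch of the graded response model for one ordered polytomous item, assuming Samejima’s formulation in which each category boundary follows a 2-PL curve, P(X ≥ k) = 1 / (1 + e^(-a(θ - bk))), and each category probability is the difference between adjacent boundaries. The thresholds, the discrimination value, and the function names are hypothetical; the expected item score is the probability-weighted sum of the category values.

```python
import math

def boundary(theta, a, b_k):
    """P(X >= k): logistic curve for category boundary k."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b_k)))

def grm_category_probs(theta, a, thresholds):
    """Probabilities of ordered categories 0..m; thresholds must be increasing."""
    cum = [1.0] + [boundary(theta, a, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def expected_score(theta, a, thresholds):
    """Expected item score: sum of category value times its probability."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical 4-category item (0 = never ... 3 = always).
a, thresholds = 1.3, [-1.0, 0.2, 1.4]

for theta in (-2.0, 0.0, 2.0):
    probs = grm_category_probs(theta, a, thresholds)
    print(theta, [round(p, 3) for p in probs], "expected:", round(expected_score(theta, a, thresholds), 3))
```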

Item Response Theory. Photo: Unsplash. Credits: Lilartsy