Reliability and validity are concepts used to assess the quality of research. They indicate how well a method, technique, or test measures something. Reliability refers to the consistency of a measure, and validity to the accuracy of a measure.
It is important to consider reliability and validity when creating your research design, planning your methods, and writing up your results, especially in quantitative research.
Reliability and validity are closely related, but they mean different things. A measurement can be reliable without being valid. However, if a measurement is valid, it is usually also reliable.
What is reliability?
According to Russ-Eft (1980), reliability refers to the consistency with which a method measures something. If the same result can be obtained consistently using the same methods under the same circumstances, the measurement is considered reliable.
The temperature of a liquid sample is measured several times under identical conditions. The thermometer always shows the same temperature, so the results are reliable.
A doctor uses a symptom questionnaire to diagnose a patient with a long-term illness. Several different doctors use the same questionnaire with the same patient but give different diagnoses. This indicates that the questionnaire is unreliable as a measure of disease.
What is validity?
Validity refers to the accuracy with which a method measures what it is intended to measure. If research has high validity, it produces results that correspond to real properties, characteristics, and variations in the physical or social world.
Relationship between Reliability and Validity
High reliability is an indicator that a measurement is valid. If a method is unreliable, it is probably invalid.
For example, if the thermometer shows a different temperature each time, even though conditions have been carefully controlled to keep the sample at the same temperature, the thermometer is probably malfunctioning, and therefore its measurements are not valid.
If a symptom questionnaire leads to the same diagnosis when it is answered at different times and assessed by different doctors, this suggests that it has high validity as a measure of the medical condition.
However, reliability alone is not enough to guarantee validity. Even if a test is reliable, it may not accurately reflect the actual situation.
A group of participants takes a test designed to measure working memory. The results are reliable, but the participants’ scores are strongly correlated with their level of reading comprehension. This indicates that the method could have low validity: the test may be measuring participants’ reading comprehension rather than their working memory.
Measuring Reliability and Validity
Validity is harder to assess than reliability, but it’s even more important. To get useful results, the methods you use to collect the data must be valid: the research must measure what it claims to measure. This ensures that the discussion of the data and the conclusions drawn are also valid.
When conducting quantitative research, the reliability and validity of research methods and measuring instruments must be taken into account.
Types of Reliability
Reliability indicates the consistency with which a method measures something. When the same method is applied to the same sample under the same conditions, the same results should be obtained. If not, the measurement method may be unreliable.
According to Yang (1995), there are four main types of reliability. Each of them can be estimated by comparing different sets of results produced by the same method.
Test-retest reliability
Test-retest reliability measures the consistency of results when the same test is repeated on the same sample at a different time. It is used when measuring something that is expected to remain constant in the sample.
A color blindness test for aspiring trainee pilots should have high test-retest reliability, because color blindness is a trait that does not change over time.
Why it’s important
There are many factors that can influence results at different times: for example, respondents may experience different moods or external conditions may affect their ability to respond accurately.
Test-retest reliability can be used to assess how well a method resists these factors over time. The smaller the difference between the two sets of results, the higher the test-retest reliability.
How it is measured
To measure test-retest reliability, the same test is performed on the same group of people at two different times. The correlation between the two sets of results is then calculated.
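The correlation step can be sketched in Python. This is a minimal illustration that assumes Pearson's correlation coefficient as the statistic (the article does not name a specific one) and uses made-up scores; `pearson_r` is our own helper, not a library function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five participants, tested at two different times
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

r = pearson_r(time_1, time_2)
print(f"test-retest r = {r:.2f}")
```

With these invented numbers the participants keep almost the same rank order, so r comes out at about 0.97. Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson coefficient.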
Example of test-retest reliability
A questionnaire is designed to measure the IQ of a group of participants (a property that is unlikely to change significantly over time). The test is administered twice to the same group of people, two months apart, but the results differ considerably, so the test-retest reliability of the IQ questionnaire is low.
How to improve test-retest reliability
When designing tests or quizzes, try to formulate questions, statements, and tasks in a way that is not influenced by participants’ mood or concentration.
When planning data collection methods, try to minimize the influence of external factors and ensure that all samples are subjected to the same conditions.
Remember that participants can be expected to change over time, and take this into account.
Inter-rater reliability
Inter-rater reliability (also called interobserver reliability) measures the degree of agreement between different people observing or evaluating the same thing. It is used when data is collected by researchers who assign ratings, scores, or categories to one or more variables.
In an observational study in which a team of researchers collects data on classroom behavior, interobserver reliability is important: all researchers must agree on how to categorize or rate different types of behavior.
Why it’s important
People are subjective, so the perceptions of different observers about situations and phenomena differ naturally. Reliable research aims to minimize subjectivity as much as possible, so that another researcher can replicate the same results.
When designing the scale and criteria for data collection, it is important to ensure that different people will rate the same variable consistently with minimal bias. This is especially important when there are several researchers involved in data collection or analysis.
How to measure it
To measure inter-rater reliability, different researchers perform the same measurement or observation on the same sample. The correlation between their sets of results is then calculated. If all researchers give similar ratings, the test has high inter-rater reliability.
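A minimal sketch of this procedure, assuming each rater scores the same subjects in the same order and using Pearson's correlation between every pair of raters. The rater names and ratings are hypothetical, and `pairwise_rater_correlations` is an illustrative helper rather than a standard function.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / denom


def pairwise_rater_correlations(ratings):
    """ratings maps rater name -> scores for the same subjects, in the same order."""
    return {
        (a, b): pearson_r(ratings[a], ratings[b])
        for a, b in combinations(sorted(ratings), 2)
    }


# Hypothetical 1-5 ratings from three raters for the same six subjects
ratings = {
    "rater_A": [1, 2, 4, 3, 5, 2],
    "rater_B": [1, 3, 4, 3, 5, 2],
    "rater_C": [2, 2, 4, 3, 4, 1],
}

pairs = pairwise_rater_correlations(ratings)
for (a, b), r in pairs.items():
    print(f"{a} vs {b}: r = {r:.2f}")
print(f"mean r = {sum(pairs.values()) / len(pairs):.2f}")
```

Correlation only shows whether raters rank subjects in a similar order: a rater who is consistently one point more lenient would still correlate perfectly with the others. When absolute agreement matters, statistics such as Cohen's kappa or the intraclass correlation coefficient are often used instead.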
Example of inter-rater reliability
A team of researchers observes the progress of wound healing in patients. To record the stages of healing, they use rating scales with a set of criteria for evaluating various aspects of the wounds. The results of different researchers evaluating the same set of patients are compared, and there is a strong correlation between all sets of results, so the test has high inter-rater reliability.
How to improve inter-rater reliability
Clearly define your variables and the methods that will be used to measure them.
Develop detailed and objective criteria for how variables will be graded, counted, or categorized.
If there are multiple researchers involved, make sure they all have exactly the same information and training.
Parallel forms reliability
Parallel forms reliability measures the correlation between two equivalent versions of a test. It is used when there are two different assessment tools or sets of questions designed to measure the same thing.
Why it’s important
If you want to use several different versions of a test (for example, to prevent respondents from repeating the same answers from memory), you first need to make sure that all sets of questions or measurements give reliable results.
In educational assessment, it is often necessary to create different versions of a test so that students do not have access to the questions beforehand. Parallel forms reliability means that if the same students take two different versions of a reading comprehension test, they should get similar results on both.
How it is measured
The most common way to measure parallel forms reliability is to write a large set of questions that evaluate the same thing and then randomly divide them into two sets.
The same group of respondents answers both sets, and the correlation between the results is calculated. A high correlation between the two indicates high parallel forms reliability.
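The random split and the correlation between the two forms can be sketched as follows. This assumes each respondent's score on a form is the sum of their item scores and uses Pearson's correlation; the item IDs (`q1` to `q8`), the responses, and the helper names are hypothetical. The sketch does not model the order in which respondents take the two forms.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / denom


def split_item_pool(item_ids, seed=None):
    """Randomly divide a pool of items into two forms; with an odd count, form B gets the extra item."""
    items = list(item_ids)
    random.Random(seed).shuffle(items)
    half = len(items) // 2
    return items[:half], items[half:]


def form_totals(responses, form):
    """Each respondent's total score on the items that make up one form."""
    return [sum(person[item] for item in form) for person in responses]


# Hypothetical data: four respondents answering eight items (q1 to q8) on a 1-5 scale
raw = [
    [4, 5, 4, 4, 5, 4, 5, 4],
    [2, 1, 2, 2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3, 3, 4, 3],
    [5, 4, 5, 5, 4, 5, 5, 5],
]
responses = [{f"q{i}": s for i, s in enumerate(row, start=1)} for row in raw]

form_a, form_b = split_item_pool([f"q{i}" for i in range(1, 9)], seed=42)
r = pearson_r(form_totals(responses, form_a), form_totals(responses, form_b))
print(f"form A: {sorted(form_a)}")
print(f"form B: {sorted(form_b)}")
print(f"parallel forms r = {r:.2f}")
```

Passing a fixed `seed` makes the split reproducible, which is convenient if the same two forms must be reused later.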
Example of parallel forms reliability
A set of questions is designed to measure financial risk aversion in a group of respondents. The questions are randomly divided into two sets, and the respondents are randomly divided into two groups. Both groups take both tests: group A takes test A first, and group B takes test B first. The results of the two tests are compared and are almost identical, indicating high parallel forms reliability.
How to improve the reliability of parallel forms
Make sure that all test questions or items are based on the same theory and are formulated to measure the same thing.
Internal consistency
Internal consistency evaluates the correlation between several items of a test that aim to measure the same construct.
It can be calculated without the need to repeat the test or involve other researchers, so it’s a good way to assess reliability when only one dataset is available.
Why it’s important
When designing a set of questions or ratings that will be combined into an overall score, you have to make sure that all items actually reflect the same thing. If the answers to the different items contradict each other, the test may not be reliable.
To measure customer satisfaction with an online store, you could create a questionnaire with a set of statements that respondents agree or disagree with. Internal consistency tells you whether all the statements are reliable indicators of customer satisfaction.
How to measure it
Two common methods are used to measure internal consistency.
Average inter-item correlation: For a set of measures designed to evaluate the same construct, the correlation between the results of every possible pair of items is calculated, and then the mean of these correlations is taken.
Split-half reliability: A set of measures is randomly divided into two halves. After the whole set has been administered to the respondents, the correlation between the two halves of the responses is calculated.
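Both methods can be sketched in a few lines of Python, assuming a respondents-by-items matrix of numeric ratings and Pearson's correlation throughout. The ratings are invented for illustration, and the function names are our own rather than part of any library.

```python
import random
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / denom


def average_inter_item_correlation(matrix):
    """Mean Pearson r over all pairs of items (columns) in a respondents-by-items matrix."""
    items = [list(column) for column in zip(*matrix)]
    rs = [pearson_r(a, b) for a, b in combinations(items, 2)]
    return sum(rs) / len(rs)


def split_half_correlation(matrix, seed=None):
    """Randomly split the items into two halves and correlate respondents' half totals."""
    indices = list(range(len(matrix[0])))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    first, second = indices[:half], indices[half:]
    totals_1 = [sum(row[i] for i in first) for row in matrix]
    totals_2 = [sum(row[i] for i in second) for row in matrix]
    return pearson_r(totals_1, totals_2)


# Hypothetical 1-5 ratings: five respondents (rows) x four "optimistic" statements (columns)
scores = [
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 4, 3, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]

avg_r = average_inter_item_correlation(scores)
half_r = split_half_correlation(scores, seed=1)
print(f"average inter-item r = {avg_r:.2f}")
print(f"split-half r = {half_r:.2f}")
```

The split-half value describes how consistent two half-length tests are; in practice it is often adjusted with the Spearman-Brown formula, which this sketch does not apply.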
Example of internal consistency
A group of respondents is presented with a set of statements designed to measure optimistic and pessimistic mindsets. They rate their agreement with each statement on a scale of 1 to 5. If the test is internally consistent, an optimistic respondent should generally give high ratings to the optimism indicators and low ratings to the pessimism indicators. The correlation between all responses to the “optimistic” statements is calculated, but it turns out to be very weak. This suggests that the test has low internal consistency.
How to improve internal consistency
Care must be taken when devising questions or measures: those that aim to reflect the same concept must be based on the same theory and carefully formulated.
What kind of reliability applies to my research?
According to Fink (1995), it is important to consider reliability when planning your research design, collecting and analyzing data, and writing up your research. The type of reliability you should calculate depends on the type of research and your methodology.
How to ensure the validity and reliability of your research
The reliability and validity of your results depend on a solid research design, an appropriate choice of methods and samples, and research conducted carefully and consistently.
If scores or rankings are used to measure variations in something (such as psychological traits, ability levels, or physical properties), it is important that the results reflect the actual variations as accurately as possible. Validity should be taken into account in the early stages of the investigation, when deciding how the data is to be collected.
Choose appropriate measurement methods
Make sure the measurement method and technique are of high quality and geared toward measuring exactly what you want to know. They should be thoroughly researched and based on existing knowledge.
For example, to collect data on a personality trait, you can use a standardized questionnaire that is considered reliable and valid. If you develop your own questionnaire, it should be based on established theory or the results of previous studies, and the questions should be written with care and precision.
Use appropriate sampling methods to select subjects
For valid and generalizable results, clearly define the population you are researching (e.g., people of a specific age range, geographic location, or profession). Make sure that you have a sufficient number of participants and that they are representative of the population.
Reliability must be taken into account throughout the data collection process. When using a tool or technique to collect data, it is important that the results are accurate, stable and reproducible.
Apply your methods consistently
Plan your method carefully to make sure you perform the same steps in the same way on each measurement. This is especially important if several researchers are involved.
For example, if you are conducting interviews or observations, clearly define how specific behaviors or responses will be counted, and make sure the questions are phrased the same way each time.
Standardize the conditions of your research
When collecting data, keep circumstances as consistent as possible to reduce the influence of external factors that may create variations in results.
For example, in an experimental setup, make sure that all participants receive the same information and are tested under the same conditions.
Where to write about reliability and validity in a thesis
It is useful to discuss reliability and validity in several sections of a thesis or dissertation. Demonstrating that you have taken them into account when planning your research and interpreting the results makes your work more credible and trustworthy.
References
Fink, A. (Ed.). (1995). The survey handbook (Vol. 1). Thousand Oaks, CA: Sage.
Russ-Eft, D. F. (1980). Validity and reliability in survey research. American Institutes for Research in the Behavioral Sciences, August, 227-151.
Yang, G. H., et al. (1995). Experimental and quasi-experimental educational research (Doctoral dissertation). Colorado State University.