Validity, Reliability, Verification, Authority, and Trustworthiness

Rigor in the advancement of behavioral/social inquiry is a process/aspiration that requires continuous refinement. For researchers, as for all humans, there is always the possibility that the conceptual categories, subcategories, or measurements of a behavioral/social inquiry don’t represent reality very well or that the process of applying the categories or measuring instruments to the empirical world is to some degree faulty.

Hence, in the social/behavioral scientific process, the examination of these issues is concerned with the degree of clarity, focus, integrity, rigor, utility, verifiability, and thus validity, reliability, and precision of qualitative and quantitative investigation.

Clarity, focus, integrity, rigor, utility, verifiability, validity, reliability, and precision are all central issues in all measurement. All of these concern how concrete measures are connected to constructs.

Clarity, focus, integrity, rigor, utility, verifiability, validity, reliability, and precision are salient because constructs in social theory are often ambiguous, diffuse, and not directly observable. Perfect clarity, focus, integrity, rigor, utility, verifiability, validity, reliability, and precision are virtually impossible to achieve.

Rather, they are ideals researchers strive for. All behavioral/social researchers want their measures to have rigor. This is important in establishing the clarity, focus, integrity, utility, verifiability, and thus validity, reliability, and precision of findings.

Rigor may also have multiple meanings. Here, reliability and validity refer to related, desirable aspects of measurement. Reliability means integrity or consistency. It suggests that the same thing is repeated or recurs under identical or very similar conditions. The opposite of reliability is a measurement process that yields erratic, unstable, or inconsistent results.

Validity suggests clarity and focus and refers to the match between a construct, or the way a researcher expresses the idea in a conceptual definition, and a measure. It refers to how well an idea about reality “fits” with actual reality.

Validity is absent if there is a poor fit between the constructs a researcher uses to describe, theorize, or analyze the social world and what actually occurs in the social world. In simple terms, validity addresses the question of how well the social reality being measured through research matches the constructs researchers use to understand it. Qualitative and quantitative researchers want reliable and valid measurement, but beyond an agreement on the basic ideas at a general level, each style sees the specifics of reliability and validity in the research process differently.

Reliability and Validity in Quantitative Research

Reliability. As just stated, reliability means integrity or consistency. It means that the numerical results produced by an indicator do not vary because of characteristics of the measurement process or measurement instrument itself. For example, I get on my bathroom scale and read my weight. I get off and get on again and again. I have a reliable scale if it gives me the same weight each time, assuming, of course, that I am not eating, drinking, changing clothing, and so forth.

An unreliable scale will register different weights each time, even though my “true” weight does not change. Another example is my car speedometer. If I am driving at a constant slow speed on a level surface, but the speedometer needle jumps from one end to the other, my speedometer is not a reliable indicator of how fast I am traveling. There are three types of reliability (Carmines & Zeller, 1999).

Three Types of Reliability

Stability Reliability. Stability reliability is reliability across time. It addresses the question: Does the measure deliver the same answer when applied in different time periods?

The weight-scale example just given is of this type of reliability. You can examine an indicator’s degree of stability reliability by using the test-retest method, with which you retest or readminister the indicator to the same group of people. If what you are measuring is stable and the indicator has stability reliability, then you will get the same results each time. This is otherwise known as verifiability. A variation of the test-retest method is to give an alternative form of the test, but the alternative form has to be very similar. For example, I have a hypothesis about gender and seating patterns in a college cafeteria. I measure my dependent variable (seating patterns) by observing and recording the number of male and female students at tables, and noting who sits down first, second, third, and so on for a three-hour period. If, as I am observing, I get tired or distracted, or I forget to record and miss more people toward the end of the three hours, then my indicator does not have a high degree of stability reliability.

Representative Reliability. Representative reliability is reliability across subpopulations or groups of people.

It addresses the question: Does the indicator deliver the same answer when applied to different groups? An indicator has high representative reliability if it yields the same result for a construct when applied to different subpopulations (e.g., different classes, races, sexes, or age groups). For example, I ask a question about a person’s age. If people in their twenties answered my question by overstating their true age, whereas people in their fifties understated their true age, then the indicator has a low degree of representative reliability.
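The age example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are hypothetical, the function name `mean_reporting_error` is invented here, and the sketch assumes the true ages are known from some independent record, which is often not the case in real research.

```python
from statistics import mean

def mean_reporting_error(records):
    """Return the average (reported age - true age) for each group.

    records: iterable of (group, true_age, reported_age) tuples.
    A positive mean indicates overstating; a negative mean, understating.
    """
    errors = {}
    for group, true_age, reported_age in records:
        errors.setdefault(group, []).append(reported_age - true_age)
    return {group: mean(values) for group, values in errors.items()}

# Hypothetical data, invented for illustration only.
records = [
    ("twenties", 22, 24), ("twenties", 25, 27), ("twenties", 28, 29),
    ("fifties", 52, 50), ("fifties", 55, 52), ("fifties", 58, 57),
]

bias = mean_reporting_error(records)
for group, error in bias.items():
    print(f"{group}: mean reporting error = {error:+.2f} years")
```

If the mean errors differ in sign or size across groups, as they do in this invented data, the question behaves differently for different subpopulations, which is the situation the text describes as low representative reliability.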

To have representative reliability, the measure needs to give accurate information for every age group. Hence, this is accuracy (Bohrnstedt, 1992a).

Equivalence Reliability. Equivalence reliability applies when researchers use multiple indicators, that is, when multiple specific measures are used in the operationalization of a construct (e.g., several items in a questionnaire all measure the same construct). It addresses the question: Does the measure yield consistent results across different indicators? If several different indicators measure the same construct, then a reliable measure gives the same result with all indicators. Researchers examine equivalence reliability on examinations and long questionnaires with the split-half method. This involves dividing the indicators of the same construct into two groups, usually by a random process, and determining whether both halves give the same results. For example, I have 14 items on a questionnaire. All measure political conservatism among college students. If my indicators (i.e., questionnaire items) have equivalence reliability, then I can randomly divide them into two groups of 7 and get the same results.
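The split-half procedure just described can be sketched as follows. This is a minimal illustration, not a definitive implementation: the response matrix is hypothetical (6 respondents answering 14 items on a 1 to 5 scale), the function names are invented here, and the Spearman-Brown correction is a commonly used adjustment that the text itself does not mention.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half(responses, seed=0):
    """Randomly split the items into two halves and correlate the half scores.

    responses: one row per respondent, one column per questionnaire item.
    Returns (half_a, half_b, r, corrected), where corrected applies the
    Spearman-Brown formula 2r / (1 + r) (an assumption added here).
    """
    n_items = len(responses[0])
    items = list(range(n_items))
    random.Random(seed).shuffle(items)  # random assignment of items to halves
    half_a, half_b = items[: n_items // 2], items[n_items // 2:]
    score_a = [sum(row[i] for i in half_a) for row in responses]
    score_b = [sum(row[i] for i in half_b) for row in responses]
    r = pearson(score_a, score_b)
    return half_a, half_b, r, 2 * r / (1 + r)

# Hypothetical conservatism scores, invented for illustration only.
responses = [
    [5, 4, 5, 4, 5, 5, 4, 5, 4, 5, 4, 5, 5, 4],
    [2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2],
    [3, 3, 4, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 3],
    [4, 4, 3, 4, 5, 4, 4, 3, 4, 4, 5, 4, 3, 4],
    [1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1],
    [3, 2, 3, 3, 2, 3, 3, 2, 3, 3, 2, 3, 3, 2],
]

half_a, half_b, r, corrected = split_half(responses)
print(f"half-score correlation: {r:.3f}, Spearman-Brown corrected: {corrected:.3f}")
```

A correlation near 1 between the two half scores suggests that the two sets of 7 items rank respondents in the same way. Because a single random split can be lucky or unlucky, repeating the split with several seeds gives a fuller picture.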

For example, I use the first 7 questions and find that a class of 50 business majors is twice as conservative as a class of 50 education majors. I get the same results using the second 7 questions. There are also special statistical measures (e.g., Cronbach’s alpha) to determine this type of reliability (Carmines & Zeller, 1999). A special type of equivalence reliability, intercoder reliability, arises when there are several observers, raters, or coders of information. In a sense, each person who is observing is an indicator. A measure is reliable if the observers, raters, or coders agree with each other.

It is a common type of reliability reported in content analysis studies, but it can be used whenever multiple raters or coders are involved. For example, I hire six students to observe student seating patterns in a cafeteria. If all six are equally skilled at observing and recording, I can combine the information from all six into a single reliable measure. But if one or two students are lazy, inattentive, or sloppy, then my measure will have lower reliability. Intercoder reliability is tested by having several coders measure the exact same thing, then comparing the measures.
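One way to compare the measures of two coders is sketched below. It is an illustration under assumptions, not the method of any cited author: the codes are hypothetical (whether a man, "M", or a woman, "F", sat down first at each of ten tables), the function names are invented here, and Cohen's kappa is one commonly used agreement statistic chosen for this example; the text does not name a specific technique.

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Share of observations on which the two coders recorded the same code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two coders, corrected for chance.

    Undefined (division by zero) if chance agreement is exactly 1,
    for example when both coders always use one and the same category.
    """
    n = len(codes_a)
    p_observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    categories = set(codes_a) | set(codes_b)
    # Agreement expected if each coder assigned codes independently
    # at his or her own observed rates.
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codes for the same ten tables, invented for illustration only.
coder_a = ["M", "F", "F", "M", "M", "F", "M", "F", "M", "M"]
coder_b = ["M", "F", "M", "M", "M", "F", "M", "F", "F", "M"]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"percent agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

Here the coders agree on 8 of 10 tables, but kappa is lower than the raw agreement because some matches would be expected by chance alone. With more than two coders, a statistic designed for multiple raters would be needed instead.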

For instance, I have three coders independently code the seating patterns during the same hour on three different days. I compare the recorded observations. If they agree, I can be confident of my measure’s intercoder reliability. Special statistical techniques measure the degree of intercoder reliability (Bohrnstedt, 1992a).

Validity. Validity is an overused term. Sometimes, it is used to mean “clear” or “focused.” There are several general types of validity. Here, we are concerned with measurement validity. There are also several types of measurement validity.

When a researcher says that an indicator is valid, it is valid for a particular purpose and definition. The same indicator can be valid for one purpose (i.e., a research question with units of analysis and universe) but less valid or invalid for others. For example, the measure of morale discussed here (e.g., questions about feelings toward school) might be valid for measuring morale among teachers but invalid for measuring the morale of police officers (Costner, 1985). At its core, measurement validity refers to how well the conceptual and operational definitions mesh with each other.

The better the fit, the greater the measurement validity. Validity is more difficult to achieve than reliability. We cannot have absolute confidence about validity, but some measures are more valid than others. The reason we can never achieve absolute validity is that constructs are abstract ideas, whereas indicators refer to concrete observation. This is the gap between our mental pictures about the world and the specific things we do at particular times and places. Bohrnstedt (1992b:2217) argued that validity cannot be determined directly.

Validity is part of a dynamic process that grows by accumulating evidence over time, and without it, all measurement becomes meaningless. Some researchers use rules of correspondence to reduce the gap between abstract ideas and specific indicators. They are logical statements about the fit between indicators and definitions. For example, a rule of correspondence is: If a teacher agrees with statements that “things have gotten worse at this school in the past five years” and that “there is little hope for improvement,” this indicates low morale on the part of the teacher.

Another way of talking about measurement validity is the epistemic correlation. This refers to a hypothetical correlation between a specific indicator and the essence of the construct that the indicator measures. We cannot measure such correlations directly because a correlation between a measure and an abstraction cannot be observed, but they can be estimated with advanced statistical techniques (Zeller & Carmines, 1980).