Construct validity is best defined as the extent to which a test measures the theoretical construct it is intended to measure.

In discussing language test validity, I would be remiss not to at least mention Messick's (1988, 1989) thinking about validity. Messick presented a unified and expanded theory of validity, which includes the evidential and consequential bases of test interpretation and use. Table 1 shows how this theory works. Notice that the evidential basis for validity includes both test score interpretation and test score use. The evidential basis for interpreting tests involves the empirical study of construct validity, which Messick defines in terms of the theoretical context of implied relationships to other constructs. The evidential basis for using tests involves the empirical investigation of both construct validity and relevance/utility, defined as the theoretical contexts of implied applicability and usefulness.

                      Test Interpretation     Test Use
Evidential Basis      Construct Validity      Construct Validity + Relevance and Utility
Consequential Basis   Value Implications      Social Consequences
Table 1. Facets of test validity according to Messick


The consequential basis of validity also involves both test score interpretation and test score use. The consequential basis for interpreting tests requires making judgments of value implications, defined as the contexts of implied good/bad or desirable/undesirable score interpretations. The consequential basis for using tests involves making judgments of social consequences, defined as the value contexts of the implied consequences of test use and the tangible effects of actually applying the test. Value implications and social consequences have special importance in Japan, where the values underlying tests like the university entrance exams and the social consequences of their use are so omnipresent in educators' minds. (For more information on this model of validity, see Messick, 1988, 1989; for some interesting discussions of the consequential aspects of validity, see Green, 1998; Linn, 1998; Lane, Parke, & Stone, 1998; Moss, 1998; Reckase, 1998; Taleporos, 1998; and Yen, 1998.)

Clearly, then, while construct validity remains an important concept, our responsibilities as language testers appear to have expanded considerably with Messick's call for test developers to pay attention to the evidential and consequential bases for the interpretation and use of test scores.

References

Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall Regents.

Green, D. R. (1998). Consequential aspects of the validity of achievement tests: A publisher's point of view. Educational Measurement: Issues and Practice, 17(2), 16-19.

Linn, R. L. (1998). Partitioning responsibility for the evaluation of the consequences of assessment programs. Educational Measurement: Issues and Practice, 17(2), 28-30.

Lane, S., Parke, C. S., & Stone, C. A. (1998). A framework for evaluating the consequences of assessment programs. Educational Measurement: Issues and Practice, 17(2), 24-28.

Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of measurement. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 33-45). Hillsdale, NJ: Lawrence Erlbaum Associates.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

Moss, P. A. (1998). The role of consequences in validity theory. Educational Measurement: Issues and Practice, 17(2), 6-12.

Reckase, M. D. (1998). Consequential validity from the test developer's perspective. Educational Measurement: Issues and Practice, 17(2), 13-16.

Taleporos, E. (1998). Consequential validity: A practitioner's perspective. Educational Measurement: Issues and Practice, 17(2), 20-23.

Yen, W. M. (1998). Investigating the consequential aspects of validity: Who is responsible and what should they do? Educational Measurement: Issues and Practice, 17(2), 5-6.

1. The survey researcher hypothesizes that the new measure correlates with one or more measures of a similar characteristic (convergent validity) and does not correlate with measures of dissimilar characteristics (discriminant validity). For example, a survey researcher validating a new quality-of-life survey might posit that it is highly correlated with another quality-of-life instrument, a measure of functioning, and a measure of health status. At the same time, the researcher would hypothesize that the new measure does not correlate with selected measures of social desirability (the tendency to answer questions so as to present yourself in a more positive light) or of hostility.

2. The survey researcher hypothesizes that the measure can distinguish one group from another on some important variable. For example, a measure of compassion should be able to demonstrate that high scorers are compassionate while low scorers are unfeeling. This requires translating a theory of compassionate behavior into measurable terms, identifying people who are compassionate and people who are unfeeling (according to the theory), and demonstrating that the measure consistently and correctly distinguishes between the two groups.
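The two hypothesis-testing strategies above can be sketched with simulated scores. Everything below is hypothetical, chosen only to illustrate the logic: scipy's `pearsonr` and `ttest_ind` stand in for whatever analyses a real validation study would use, and the variable names (`new_measure`, `similar_measure`, `unrelated_measure`) are invented.

```python
# A minimal sketch of convergent, discriminant, and known-groups
# validity checks on simulated data (all parameters hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200

# Simulate a latent trait and three observed measures.
trait = rng.normal(size=n)
new_measure = trait + rng.normal(scale=0.5, size=n)      # instrument under validation
similar_measure = trait + rng.normal(scale=0.5, size=n)  # established measure of the same trait
unrelated_measure = rng.normal(size=n)                   # e.g., a social desirability scale

# Convergent validity: high correlation with the similar measure.
r_conv, _ = stats.pearsonr(new_measure, similar_measure)
# Discriminant validity: near-zero correlation with the dissimilar measure.
r_disc, _ = stats.pearsonr(new_measure, unrelated_measure)

# Known-groups validity: people high vs. low on the trait should
# score differently on the new measure.
high = new_measure[trait > 0.5]
low = new_measure[trait < -0.5]
t, p = stats.ttest_ind(high, low)

print(f"convergent r = {r_conv:.2f}, discriminant r = {r_disc:.2f}")
print(f"known-groups t = {t:.1f}, p = {p:.3g}")
```

In a real study, the simulated scores would be replaced by responses from actual instruments, and the hypothesized correlation pattern would be stated before data collection.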

URL: https://www.sciencedirect.com/science/article/pii/B9780080448947002967

Measures of Coping for Psychological Well-Being

Katharine H. Greenaway, ... Deborah J. Terry, in Measures of Personality and Social Psychological Constructs, 2015

Construct/Factor Analytic

Construct validity of the CISS was established in Endler and Parker’s (1994) original assessment. In this study, 832 college students and 483 adult community members completed the CISS, and a principal components analysis was conducted to determine the CISS structure. In both samples, there was support for a three-component structure. The task-oriented items loaded uniquely on one component, the emotion-focused items uniquely on another component, and with the exception of two items in the adult community sample, the avoidance items loaded uniquely on a third component. This structure has since been replicated in a range of other populations, including health professionals (Cosway et al., 2007) and patients with major depressive disorder (McWilliams, Cox, & Enns, 2003).
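The factor-analytic logic described above can be illustrated on simulated data: generate item responses with a known three-component structure and check that a principal components analysis recovers it. The item counts, loadings, and sample size below are hypothetical (not the actual CISS parameters), and an unrotated PCA from scikit-learn stands in for the published analysis.

```python
# Sketch: recover a simulated three-component structure with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_respondents, items_per_scale = 500, 4

# Three independent latent coping styles with different strengths
# (unequal loadings keep the sample components well separated).
latent = rng.normal(size=(n_respondents, 3))
strengths = np.array([1.0, 0.8, 0.6])

# Each block of items loads on exactly one latent style, plus noise.
loadings = np.zeros((3, 3 * items_per_scale))
for k in range(3):
    loadings[k, k * items_per_scale:(k + 1) * items_per_scale] = strengths[k]
responses = latent @ loadings + rng.normal(scale=0.6,
                                           size=(n_respondents, 3 * items_per_scale))

pca = PCA(n_components=3)
pca.fit(responses)

# For a clean structure, each item's largest-magnitude loading should
# fall on the component associated with its own block of items.
dominant = np.abs(pca.components_).argmax(axis=0)   # one entry per item
print(dominant.reshape(3, items_per_scale))
```

When each row of the printed array is constant and the three rows differ, every item loads uniquely on its own component, which is the pattern Endler and Parker reported for the task, emotion, and avoidance items.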

URL: https://www.sciencedirect.com/science/article/pii/B9780123869159000127

Assessment with the Differential Ability Scales

BRUCE GORDON, COLIN D. ELLIOTT, in Handbook of Psychoeducational Assessment, 2001

Validity

Construct validity for the DAS is supported by confirmatory and exploratory factor analyses indicating a 1-factor model at the lower preschool level, a 2-factor (verbal/nonverbal) model at the upper preschool level, and a 3-factor (verbal/nonverbal/spatial) model for school-age children (Elliott, 1990c). Keith's (1990) independent hierarchical confirmatory factor analyses yielded consistent results, which Elliott (1997b) found to be essentially in agreement with the DAS analyses given in the test handbook (Elliott, 1990c). Elliott (1997b) also reports a joint factor analysis of the DAS and the WISC-R (Wechsler, 1974) that supports a verbal/nonverbal/spatial factor model for school-age children.

Elliott (1990c) also provides evidence supporting the convergent and discriminant validity of the DAS cluster scores. The Verbal Ability cluster score consistently correlates much higher with the verbal composite score than it does with the nonverbal composite score of other cognitive ability tests for children. Similarly, the Nonverbal Reasoning and Spatial Ability cluster scores correlate significantly lower with the verbal composite score of other tests than does the DAS Verbal Ability cluster score.

Evidence for the concurrent validity of the DAS is provided by studies (Wechsler, 1991; Elliott, 1990c) showing consistently high correlations between the GCA and the composite scores of other cognitive batteries such as the Wechsler Intelligence Scale for Children—Third Edition (WISC-III; Wechsler, 1991), the Wechsler Preschool and Primary Scale of Intelligence—Revised (WPPSI-R; Wechsler, 1989), and the Stanford-Binet Intelligence Scale-Fourth Edition (Thorndike et al., 1986). High correlations were also found between the DAS achievement tests and other group or individual achievement tests as well as with actual student grades (Elliott, 1990c).

URL: https://www.sciencedirect.com/science/article/pii/B9780120585700500057

Validity

Carina Coulacoglou, Donald H. Saklofske, in Psychometrics and Psychological Assessment, 2017

Construct validity and Messick’s unified approach

The construct validity of score interpretation underlies all score-based inferences. The essence of unified validity is that the appropriateness, meaningfulness, and usefulness of score-based inferences are integrated, and that their integration is the outcome of empirically grounded score interpretation. To refer to validity as a unified concept does not imply that validity cannot be usefully differentiated into distinct aspects, such as the social consequences of performance assessments or the role of score meaning. The purpose of these distinctions is to provide a means of addressing functional aspects of validity that helps to clarify some complexities in evaluating the appropriateness, meaningfulness, and usefulness of score inferences.

A key task for the content aspect of construct validity is to determine the boundaries and structure of the construct domain: the knowledge, skills, attitudes, motives, and other attributes to be revealed by the assessment tasks. These boundaries and structure can be examined through job analysis, task analysis, curriculum analysis, and especially domain theory. The goal of the test developer is to ensure that important aspects of the construct domain are covered, a procedure described as selecting items or tasks in terms of their functional importance. Both the content relevance and the representativeness of test items are commonly evaluated by expert professional judgment.

The substantive aspect of construct validity highlights the role of theories and process modeling in identifying the domain processes expressed through test items (Embretson, 1987; Messick, 1989). The issue of domain coverage refers not just to the content representativeness of the construct measure, but also to the representation of the construct's processes and the degree to which these processes are reflected in the measurement. Such evidence can be derived from various sources, such as "thinking aloud" protocols and eye-movement records collected during responding, or through computer modeling of task processes.

According to the structural aspect of construct validity, the theory underlying the construct domain should direct the development of construct-based scoring criteria, in addition to the selection or construction of appropriate assessment tasks. Thus the internal structure of the assessment (i.e., the interrelations among the scored aspects of task and task performance) should correspond with one’s knowledge about the internal structure of the construct domain (Messick, 1989).

Evidence of generalizability relies on the degree to which scores on the assessed tasks correlate with scores on other tasks representing the construct or aspects of the construct. The generalizability of score inferences across tasks and contexts thus bears directly on score meaning. The tension between depth and breadth of domain coverage often surfaces as a trade-off between validity and reliability (or generalizability). In addition to generalizability across tasks, the limits of score meaning are also set by the degree of generalizability across time or occasions and across observers or raters of the task performance.
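Generalizability across a set of tasks assumed to represent the same construct is often summarized with an internal-consistency coefficient such as Cronbach's alpha. The sketch below uses hypothetical simulated task scores; the sample size, number of tasks, and noise level are illustrative assumptions, not values from any study cited here.

```python
# Sketch: Cronbach's alpha over simulated parallel tasks.
import numpy as np

def cronbach_alpha(scores):
    """scores: (n_persons, n_tasks) array of task scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()      # sum of per-task variances
    total_var = scores.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
ability = rng.normal(size=(300, 1))                   # one latent construct
tasks = ability + rng.normal(scale=0.8, size=(300, 6))  # 6 tasks sampling it

print(f"alpha = {cronbach_alpha(tasks):.2f}")
```

Increasing the number of tasks (breadth of sampling) raises alpha for a fixed noise level, which is one face of the depth-versus-breadth trade-off noted above.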

The external aspect of validity refers to the extent to which the assessment scores' correlations with other measures and behavioral manifestations reflect the relations implied by the theory of the target construct. Thus the constructs represented in the assessment should account for the external pattern of correlations. Notably, among these external relationships are those between the assessment scores and criterion measures relevant to selection, placement, program evaluation, or other applied purposes. The consequential aspect of construct validity includes evidence for evaluating the intended and unintended consequences of score interpretation and use. Consequences can be associated with bias in scoring and interpretation or with unfairness in test use. The major concern regarding negative consequences is whether any negative impact on individuals or groups derives from sources of test invalidity, such as construct underrepresentation or construct-irrelevant variance (Messick, 1989).

A fundamental feature of construct validity is construct representation, in which one attempts to identify the mechanisms underlying task performance through cognitive-process analysis or through research on personality and motivation. According to Messick (1995), there are two major threats to construct validity: construct underrepresentation (CU) and construct-irrelevant variance (CIV). With construct underrepresentation, the assessment is too narrow and fails to include important dimensions or facets of the construct. With construct-irrelevant variance, in contrast, the assessment is too broad, containing excess reliable variance associated with other constructs or response biases that distorts interpretation of the target construct.

Messick (1995) posits that construct validity of score interpretations appears to underlie all score-based inferences—not only the ones related to interpretive meaningfulness but also the content and criterion-related inferences specific to decisions and actions based on test scores. Messick (1995) proposes the concept of unified validity that “integrates considerations of content, criteria and consequences into a construct framework for the empirical testing of rational hypothesis about score meaning and theoretically relevant relationships, including those of an applied and a scientific nature” (p. 741).

URL: https://www.sciencedirect.com/science/article/pii/B9780128022191000031

Organization of Social Validity Data

Stacy L. Carter, in The Social Validity Manual, 2010

Generalizability and the Boundaries of Score Meaning

The construct validity principle regarding generalizability and the boundaries of score meaning, described by Messick (1995), centers on the reliability of data and the ability to make inferences about the data across tasks, time, observers, and raters. For social validity data, reliability has been reported for several treatment acceptability instruments, but many other instruments do not address this issue. The boundaries within which social validity data can support inferences may depend upon how well the measurement addresses the specific area of concern. For example, measures reflecting a high degree of consumer-related social validity may not imply similarly high social validity among other members of society. At present, some inferences do appear warranted, such as that treatment procedures composed predominantly of reinforcement techniques are generally more acceptable than those composed of punishment techniques. While this finding may generalize, other types of generalization lack consistent evidence, and additional research is needed to clarify these inconsistencies.

URL: https://www.sciencedirect.com/science/article/pii/B978012374897300009X

Measures of Sensation Seeking

Marvin Zuckerman, Anton Aluja, in Measures of Personality and Social Psychological Constructs, 2015

Construct/Factor Analytic

The construct validity of the ZKA-PQ was analyzed by means of exploratory factor analysis based on the intercorrelations of the ZKA-PQ and TCI-R facets. A 5-factor solution was obtained, the second factor of which includes all four Sensation Seeking scales from the ZKA-PQ, two Aggression facet scales (AG1, physical aggression; AG2, verbal aggression), and two Novelty Seeking facets (NS2 and NS4) from the TCI-R. It also includes ZKA-PQ exhibitionism (EX3), negative loadings from TCI-R self-acceptance (SD4) and fear of uncertainty (HA2), and a positive loading from impulsiveness (NS2). Although this is primarily a sensation seeking factor, it is mixed with other kinds of facets (Aluja, Blanch, García, García, & Escorial, 2012).

The five factors of the original ZKPQ were well replicated, and the factor structure (principal axis plus varimax rotation) was shown to be highly congruent in the three samples despite cultural and age differences between the samples. Factor intercorrelations indicate a relative orthogonality among the five factors, with two exceptions. Specifically, significant correlations were found between Aggressiveness and Sensation Seeking, and between Neuroticism and Extraversion (Aluja et al., 2010).

URL: https://www.sciencedirect.com/science/article/pii/B9780123869159000139

Choice Models

Paul F.M. Krabbe, in The Measurement of Health and Health Status, 2017

Construct

Tests of construct validity seem almost impossible in the context of health measurements based on choice models. The reason is that the health objects presented to the respondents for judgment may consist of unspecified (holistic) features. For example, the respondents may be asked to compare different outcomes after plastic surgery or photos of skin affected by psoriasis. If the object they have to judge consists of health descriptions that are built on a small and fixed set of health attributes with various levels, an analysis to compare these attributes with attributes of scales from other instruments is also impossible. These problems are related to the fact that we are dealing here with formative measurement. Other types of construct validity, such as known-groups validity, may be feasible for the estimated values derived with choice models.

URL: https://www.sciencedirect.com/science/article/pii/B9780128015049000118

Preclinical Animal Studies

Marsida Kallupi, Roberto Ciccocioppo, in Biological Research on Addiction, 2013

Animal Models: Construct Validity

To have construct validity, an animal model of cocaine abuse should rely on neurochemical, neurobiological, and physiopathological mechanisms similar to those of the human condition and should be sensitive to the same events thought to be important in eliciting the human disorder. Years of clinical and experimental research have demonstrated that cocaine addiction is a multifactorial disorder in which genetic predisposition plays an important role. For example, twin studies have revealed that lifetime cocaine abuse and dependence are largely influenced by genetic risk factors. Consistent with the role of genetic mechanisms in cocaine addiction, several linkage analyses have shown a correlation between the propensity toward cocaine abuse and specific gene polymorphisms in various neurotransmitter systems. This view was confirmed in laboratory animals, in which deletion of a single gene was shown to increase or decrease cocaine-taking behavior and/or vulnerability to cocaine seeking. For example, knocking out genes that regulate receptor or neurotransmitter functions linked to dopamine (DA), glutamate, serotonin, and the endocannabinoid system has been shown to dramatically change sensitivity to cocaine and/or the motivation for its consumption. Clinical research has also demonstrated that genetic vulnerability traits may be common to several abused drugs. In line with this clinical observation, rats genetically selected for excessive alcohol consumption showed increased motivation to self-administer cocaine. In fact, at least two rat lines genetically selected for excessive alcohol drinking, the P and AA lines, have been shown to be more sensitive to the psychotropic effects of cocaine and to have an innately higher predisposition to its consumption.

One current laboratory paradigm with potential relevance to addiction-like behaviors uses vector-based delivery systems to modify gene expression in the brains of rodents in order to identify novel signaling cascades. Together with recently developed optical neuroengineering technologies, these approaches allow modulation of specific brain circuits, helping to unravel the neurotransmitter networks involved in drug addiction and to inform its treatment.

Additional evidence supporting the construct validity of animal models comes from studies of two inbred rat lines, the Lewis (LEW) and its histocompatible control, the Fischer 344 strain (F344). When trained on an extended schedule of cocaine self-administration, LEW rats tend to escalate drug intake, whereas F344 rats do not. Moreover, LEW rats, like human addicts, have a lower density of D2 receptors in striatal areas, a higher increase of DA in the nucleus accumbens (NAc) in response to drug challenge, and a higher tendency to suffer from a dysregulation of the HPA axis. On this basis, LEW rats may be considered an addiction-prone genotype and F344 rats an addiction-resistant one.

Another important aspect of addiction is individual vulnerability. It is well known that only a relatively small percentage of the humans exposed to cocaine eventually become addicted to or dependent on it. These individual differences in the likelihood of developing cocaine addiction may reflect the fact that cocaine dependence is a multifactorial disorder: genetic predisposition is an important determinant, but drug exposure and environmental factors may then play a critical role in shaping individual vulnerability to disease progression. In this respect, it is significant that individual differences in developing cocaine abuse and dependence have also been described in laboratory animals. For example, it has been demonstrated that heterogeneous rats selected for low and high impulsivity also differ in their vulnerability to developing cocaine abuse, with only the latter showing compulsive drug taking despite aversive consequences. Epidemiological studies have also revealed a clear association between the sensation/novelty-seeking trait and cocaine abuse. Paralleling the human condition, rats characterized by higher levels of locomotor activity and exploratory behavior in a novel environment (considered measures of sensation/novelty seeking) show increased cocaine self-administration and drug-related compulsive traits, respectively. Finally, it has been demonstrated that if rats are trained to self-administer cocaine for a very prolonged period of time, a relatively small portion will develop the typical behaviors associated with cocaine dependence, mimicking the major clinical symptoms of cocaine addiction reported in DSM-IV.

A complex relationship between cocaine use, HPA axis activation, and endocrine effects has also been documented. For instance, acute cocaine administration increases plasma levels of adrenocorticotropic hormone (ACTH) and glucocorticoids in humans. Cocaine administration in chronic cocaine users also stimulates the HPA axis response, but the effect is less pronounced than that observed in nonusers. This finding indicates that a history of chronic cocaine use leads to hypofunction of HPA activity and altered reactivity to stress. Consistent with these clinical findings, rodent and nonhuman primate studies have demonstrated that cocaine injection in drug-naive animals leads to pronounced HPA axis activation. A similar, though substantially less pronounced, effect is observed in rats that have escalated cocaine use following protracted intravenous cocaine self-administration training. These findings demonstrate that chronic exposure to cocaine leads to similar adaptive changes in the hormonal stress system in humans and laboratory animals. At present, it is not clear whether HPA axis hypoactivity plays a causal role in the progression of cocaine abuse or is merely a consequence of chronic drug exposure. Either way, the parallel hypofunction of the HPA axis after protracted cocaine use in humans and laboratory animals is striking evidence supporting the construct validity of animal models of cocaine addiction.

URL: https://www.sciencedirect.com/science/article/pii/B9780123983350000133

Assessment

Hedwig Teglasi, in Comprehensive Clinical Psychology, 1998

4.16.13.2 Construct Validation

Messick (1989) defines construct validity as “an integration of any evidence that bears on the interpretation or meaning of test scores” (p. 17). Because traditional indices of content or criterion validity contribute to the meaning of test scores, they too pertain to construct validity. Thus, construct validity subsumes all other forms of validity evidence.

This emphasis on construct validity as the overriding focus in test validation represents a shift from prediction to explanation as the fundamental focus of validation efforts. Construct validation emphasizes the development of models explaining processes underlying performance on various tests and their relationships to other phenomena. Accordingly, correlations between test scores and criterion measures contribute to the construct validity of both predictor and criterion. In Messick's words, “Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment” (p. 13).

What is meant by construct validity?

Construct validity is about how well a test measures the concept it was designed to evaluate. It's crucial to establishing the overall validity of a method.

How else is construct validity defined?

Construct validity is an assessment of how well you translated your ideas or theories into actual programs or measures.

Which of the following best describes construct validity?

It refers to the adequacy of the operational definition of variables.

How is construct validity determined?

Construct validity is usually verified by comparing the test to other tests that measure similar qualities to see how highly correlated the two measures are.