Validated Methodological Operationism: Improve Analytics by Validating Your Operations

Many measured processes can be improved by validating your process operations. This is true whether you are talking about business, experimental, or educational processes.

A New View Of Operationism

An interesting read on operationism by Uljana Feest (2005): Operationism in Psychology: What the Debate is About, What the Debate Should Be About [Journal of the History of the Behavioral Sciences, 41(2), 131-149].

The basic gist: psychologists’ historical use of operationalism was methodological rather than positivist (even though they may have referenced positivism for philosophical cover). So criticizing operationism with positivist arguments is somewhat misguided, but operations can be criticized through validation arguments.

What does Feest mean by a methodological reading of operationism?

. . . I mean that psychologists did not intend to say, generally, what constitutes the meaning of a scientific term.  . . . in offering operational definitions, scientists were partially and temporarily specifying their usage of certain concepts by saying what kind of empirical indicators they took to be indicative of the referents of the concepts (p. 133).

She concludes by saying:

. . . the debate should then be about what are adequate concepts and how (not whether) to operationalize them, and how (not whether) to validate them (p.146).

So any debate about operationism is really about constructs and their validation. Within this framework, I will list four specific types of operationism.

Positivist, Empiricist Operationism

This idea can be represented by Percy Bridgman’s original conception of operationism:

in general, we mean by a concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations (Bridgman, P. [1927]. The Logic of Modern Physics, Macmillan: NY, p. 5).

The biggest problem with this approach is that any set of operations can never be said to exhaust the entire meaning of a construct, a position also supported by cognitive psychology’s understanding of the cognitive processes involved in the meaning and use of concepts (Andersen, H., Barker, P. & Chen, X. [2006]. The Cognitive Structure of Scientific Revolutions, Cambridge University Press).

Methodological Operationism

The idea that operations are the empirical indicators of the construct (Feest).

Naive Pragmatic Operationism

However you conceive of a construct, in any measured process (experimental, business, or any other process controlled by measures), the measurement operations methodologically define that construct in the function of that process. If you put a measure in place without determining how and why you are using it, you are operating in the same fashion as an operationist in the positivist, empiricist mode, and you are subject to the same kinds of problems. Garbage in equals garbage out; this is the real potential problem with this approach. Many business processes fail to meet expectations, and those failures can be traced back to poor-quality measurements whose constructs were not appropriately operationalized.

Validated Methodological Operationism

This represents measured processes whose operations are clear and whose quality and validity have been adequately evaluated.


Feest references the gap between qualitative and quantitative research as being about operationism. I believe this is incorrect. Operationism is about construct validity (in the unified view of validity). Criticism of qualitative research is usually about research validity (a different kind of validity) and about the value of different research purposes.

Avoiding Naive Operationalism: More on Lee Cronbach and Improving Analytics


Consider again Cronbach and Meehl’s (1955) quote from my last post.
We do believe that it is imperative that psychologists make a place for (construct validity) in their methodological thinking, so that its rationale, its scientific legitimacy, and its dangers may become explicit and familiar. This would be preferable to the widespread current tendency to engage in what actually amounts to construct validation research and use of constructs in practical testing, while talking an “operational” methodology which, if adopted, would force research into a mold it does not fit.  (Emphasis added)
What was widespread in 1955 has not substantially changed today. Construct measures are routinely developed without regard to their construct or consequential validity, to the detriment of our practices. I will name this state naive operationalism: measuring constructs with what amounts to an operational methodology. I will also show why it is a problem.

Operational Methodology: Its Origins as a Philosophical Concept

What do Cronbach and Meehl mean by an operational methodology? Early in my psychological studies I heard intelligence defined as “that which is measured by an intelligence test”. It was an example of operationalism (or operationism). Originally conceived by the physicist Percy Bridgman, operationalism states that the meaning of a term is wholly defined by its method of measurement. It became popular as a way to replace metaphysical terms (e.g., desire or anger) with a radically empirical definition. It was briefly adopted by the logical positivist school of philosophy because of its similarity to the verification theory of meaning, and it remained popular for a longer period in psychology and the social sciences. Neither use stood up to scrutiny, as noted in Mark Bickhard’s paper.
Positivism failed, and it lies behind many of the reasons that operationalism is so pernicious: the radical empiricism of operationalism makes it difficult to understand how science does, in fact, involve theoretical and metaphysical assumptions, and must involve them, and thereby makes it difficult to think about and to critique those assumptions.
Not only does the creation of any measurement contain many underlying assumptions, the meaning of any measurement is also a by-product of the uses to which the measurement is put. The heart of validity theory in the work of Cronbach (and also of Samuel Messick) is in analyzing various measurement assumptions and measurement uses through the concepts of construct and consequential validity. Modern validity theory stands opposed to operationalism.

Operational Definition as a Pragmatic Psychometric Concept

Specifying an operational definition of a measure is operationalism backwards. Our measurements operationalize how we are defining a term, not in the abstract, but in actual practice. When we implement a measurement in practice, that measurement effectively becomes the construct definition in any process that involves that measure. If the process contains multiple measures, it is only a partial definition; if it is the sole measure, it also becomes the sole construct definition. Any measure serves as an operational definition of the measured construct in practice, but we do not believe (as in operationalism) that the measure subsumes the full meaning of the construct. Our operational definition is no more than a partial definition, and that is why consequential and construct validity are needed in our methodological thinking. Validity research tells us when our operational definitions are problematic and may indicate how to improve our measures. Validity research studies the difference between our operational definitions and the construct being measured.

Naive Operationalism

For most of us, operationalization outside the larger issue of a research question and conceptual framework is just not very interesting.
I could not disagree more! Not including validity in our methodological thinking means that our operationalized processes will result in what I call naive operationalism. If we devise and implement measures in practice without regard for their validity, we will fail to understand their underlying assumptions and will be unable to address any validity problems. In effect, it is just like philosophical operationalism and sets us up for the same problems. Let’s consider a concrete example to see how it can become a problem.

An Example of Naive Operationalism

Richard Nantel and Andy Porter both suggest that we do away with performance measurement, which they consider “a Complete Waste of Time”. These are the reasons given for scrapping performance measurement:
  1. Short-term or semiannual performance reviews prevent big-picture thinking, long-term risk taking, and innovation. We want employees to fail early and often.
  2. Performance systems encourage less frequent feedback and interfere with real-time learning.
  3. Compensation and reward systems are based on faulty incentive premises and undermine intrinsic motivation.
  4. There’s no evidence that performance rating systems improve performance.
Consider each reason in turn:
  1. This critique is advocating for a different set of constructs.  True, the constructs they imply may not be common to most performance measurement systems, but there is no reason to stay with standard constructs if they are not a good fit.
  2. There is no reason why formative assessments like action analytics and other more appropriate feedback structures could not be part of any performance improvement system.
  3. This is another instance where it appears that the wrong constructs, based on out of date motivational theories, are being measured.  They are the wrong constructs and therefore the wrong measures.
  4. The consequences of any measurement system are the most important question to ask. Anyone who doesn’t ask this question should not be managing measurement processes.


What is the bottom line? Nothing Richard or Andy point out makes the concept of performance measurement wrong. The measurement systems they describe are guilty of naive operationalism: they treat a specific measure of performance as the sole operational definition needed, and this is true even if they are unaware of what they are doing. No! We should assess the validity of any measurement system and adjust it according to an integrated view of validity within an appropriate theoretical and propositional network, as advocated by Cronbach and Meehl. Measurement systems of any kind should be based on construct and consequential validity, not on an operational methodology, whether philosophical or naive.

#LAK11 – Validity is the Only Guardian Angel of Measurement (Geekish)

David Jones has posted about the general lament over high-stakes testing and asks: “what’s the alternative?” You could rephrase this to ask: does measurement help us or hurt us? Not only has he piqued my interest to think more along these lines, but I think the question is also relevant to data analytics, LAK11, and any place where measurement is used. So. . . dive in I will!

David cites the association of the testing movement with globalization and managerialization, but I also believe that analytics, appropriately applied, can benefit education in pragmatic, everyday ways. He also quotes Goodhart’s Law, named for a British economist who spoke on the corruptibility of policy measurement. I prefer Donald Campbell’s similar law for this situation because Campbell was a psychometrician and speaks in testing language. He states:

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.

I believe that the problem being discussed, at its heart, is a validity problem.  High stakes testing is not appropriately bearing the high burden assigned to it.  It is not meeting the criteria of consequential validity; it does not produce better long-term educational outcomes.  Cronbach and Meehl (1955) explained why validity is important for situations like this one.

We do believe that it is imperative that psychologists make a place for (construct validity) in their methodological thinking, so that its rationale, its scientific legitimacy, and its dangers may become explicit and familiar. This would be preferable to the widespread current tendency to engage in what actually amounts to construct validation research and use of constructs in practical testing, while talking an “operational” methodology which, if adopted, would force research into a mold it does not fit.

This is what I believe they are saying (also taking into account Cronbach’s and Messick’s later developments of the concept of validity). We measure constructs, not things in themselves. A construct is defined by the network of associations or propositions within which it occurs (see Cronbach and Meehl, Recapitulation Section #1). Validity is the rational and empirical investigation of the association between our operational definitions of our constructs (that is, our tests) and our network of associations. Without this investigation, what we are measuring is operationally defined by our tests, but what that is remains undefined in any meaningful way. We can’t teach constructs or measure them unless we thoroughly understand them, at least at a theoretical level. Operationalism has been rejected, whether it was founded in positivist philosophy or in common-sense ignorance.

Most people I’ve heard advocating for standardized, standards-based high-stakes testing do so based on a principle of school accountability, not because such testing has been unequivocally demonstrated to improve schools. It’s a logical argument, but it seems to lack empirical support. Teaching to the test is regarded as inappropriate, but if the test is the sole standard of accountability, then it is an operational definition of what we are to be teaching. In that case, anything other than teaching to the test seems illogical.
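Campbell’s Law can be made concrete with a toy simulation (purely illustrative; the model, the numbers, and the `simulate` function are my own assumptions, not anything from Campbell). The idea: when effort is diverted from real learning into test-specific preparation, the indicator (the test score) rises even as the outcome it is meant to monitor falls.

```python
import random
import statistics

random.seed(0)

def simulate(teach_to_test: float, n: int = 1000):
    """Toy model: each student's test score mixes real learning with
    test-specific preparation. `teach_to_test` is the share of effort
    diverted from real learning into test prep."""
    effort = [random.uniform(0.0, 1.0) for _ in range(n)]
    learning = [(1 - teach_to_test) * e for e in effort]   # the real outcome
    prep = [teach_to_test * e for e in effort]             # test-only gains
    # Prep inflates scores more than it reflects learning:
    score = [l + 1.5 * p for l, p in zip(learning, prep)]
    return statistics.mean(learning), statistics.mean(score)

low_learn, low_score = simulate(0.0)   # no teaching to the test
hi_learn, hi_score = simulate(0.8)     # heavy teaching to the test
```

In this toy world, heavy teaching to the test raises the average score while lowering average learning: the indicator improves precisely because it has corrupted the process it was meant to monitor.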

So let’s dig deeper.  I think there are nested psychometric problems within this testing movement.  Campbell’s Law may overtake any attempt to use measurement in policy in a large sense, but I am going to start with how a measurement regime might be designed better.

1. What measurement problems exist with current tests?

Teaching to the test as it is commonly practiced is not good because it is doubtful that the tests are really measuring the right things. Many unintended things are being measured in these high-stakes standardized tests (technically referred to in validity theory as construct-irrelevant variance). In many ways, our measures are operationalized more on tradition and common sense than on empirically sound psychometrics. This is what Cronbach warned of: our tests don’t match the constructs. To improve tests we need to go beyond common sense and clarify the constructs we desire our students to exhibit. Why don’t we do this now? Most likely it is too difficult for policymakers to get their heads around, but there is a possible second reason: it would reduce the validity of tests as their validity is measured by positivist methodology. Validity is an overall judgement, but positivists don’t like fuzzy things like judgements. Tests may need to be reduced in validity in some areas in order to gain validity overall. Many people guiding testing procedures likely have a narrow view of validity, as opposed to the broader view espoused by Messick or Cronbach. This leads to other issues.
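Construct-irrelevant variance can be sketched with a small simulation (the weights and numbers here are invented for illustration). Suppose a “wordy” math test also taxes reading speed: its scores correlate less with true math ability than a cleaner test that minimizes the irrelevant component.

```python
import random

random.seed(1)

def pearson(xs, ys):
    """Pearson correlation, computed directly from the definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

n = 2000
math_ability = [random.gauss(0, 1) for _ in range(n)]    # the construct
reading_speed = [random.gauss(0, 1) for _ in range(n)]   # irrelevant variance
noise = [random.gauss(0, 0.5) for _ in range(n)]

# A wordy math test picks up reading speed as construct-irrelevant variance;
# a cleaner test reduces that component.
wordy = [m + 0.8 * r + e for m, r, e in zip(math_ability, reading_speed, noise)]
clean = [m + 0.2 * r + e for m, r, e in zip(math_ability, reading_speed, noise)]

r_wordy = pearson(wordy, math_ability)
r_clean = pearson(clean, math_ability)
```

The cleaner test tracks the construct more closely; the wordy test is partly measuring something else, which is exactly the validity problem described above.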

2. Standards do not Address Many Important Educational Outcomes.

The curriculum, as reflected in standards, is not always focused on the most important knowledge and skills. I think it reflects three things: a kitchen-sink approach (include the request of every constituency), a focus on standards that are easily measured by multiple-choice or similar types of questions, and expert opinion. The ability to argue points of view creatively, to write with persuasion and conviction, to read, interpret, discuss, and develop subtle points of meaning among peers, and to track the progression and maturation of these skills over time: these are important things that are not well measured by current high-stakes tests. A kitchen-sink approach does not allow teachers to focus on depth. Assessments like portfolios contain more information and a broader validity base, but are seen as less reliable (i.e., it’s possible to cheat or to include personal bias). Expert opinion is a type of content validity and is considered the weakest form of validity evidence. With the development of high-stakes testing, we are in a better position to measure the validity of curriculum standards and to adjust them accordingly, but I see no one doing this. Maybe there is some research on the ability of high school students to function as college freshmen, but this outcome is inconsequential in the long view of one’s life. Tests should be held accountable for consequential validity and should empirically show that they result in improved lives, not just in parroting facts or helping teachers of college freshmen. It is not just teachers who should be held accountable; it is also test and standards developers.

3. Post-positivist Psychometrics

To be sure, there are trade-offs in any form of measurement. Sometimes improving validity in one area weakens validity in other areas; validity never reaches 100% in any situation. However, because tests are mandated by law, I believe current validity questions favor views of what will be held valid in a court of law. Law tends to be conservative, and conservative psychometrics are based in philosophical positivism. I suspect that many people making policy decisions have a poor understanding of what I consider to be sound psychometrics: psychometrics consistent with post-positivist philosophy. Let me be clear: positivist psychometrics are not wrong, just incomplete and limited. This was the insight of Wittgenstein. Positivism looks at a small slice of life while ignoring the rest of the pie. Wittgenstein said that if we want to understand language, we should look at how people use language. Similarly, Samuel Messick said that if you want to understand a test, follow the outcomes: how are people using the test, and what are the results of what they are doing? This is the most important test of validity.

To sum up

There are many possible things that could be done in answer to David’s question. I have focused on how you might improve testing processes. Do not focus on tradition and traditional technique, but on standards and testing practices that create authentic value (what Umair Haque would call thick value) for students who will live out their lives in the 21st century, a century shaping up to be quite different from the last. Testing could be part of the equation, but let’s hold teachers and schools accountable for the value they create as measured in improved lives, not in some questionably valid test score.

#LAK11 – Utopian and Dystopian Visions of Analytics: It’s a Question of Validity

Catching up on the beginning of LAK11 which began last week.

George Siemens’ 1-16 post has initiated a discussion on critiques, much of which seems to focus on dystopian critique.

David Jones’ earlier critique is a good example.  His interesting critique is based on his fear of teleological implementation:

This remains my major reservation about all these types of innovations. In the end, they will be applied to institutional contexts through teleological processes. i.e. the change will be done to the institution and its members to achieve some set plan. Implementation will have little contextual sensitivity and thus will have limited quality adoption. . ..

This is what I consider a basic modernist approach with only quantitative teleology; that is, final causes are judged solely through numbers resulting from simple quantitative analyses.

I studied Samuel Messick for my dissertation, and my reading of him was that he was a psychometrician who took seriously the postmodern critique of the 20th-century philosophers of science. His response was that the question of validity could never be answered without both quantitative and qualitative analysis. Messick’s approach has always been seen negatively by those who need the teleological certainty of positivist, quantitative-only answers. This is exactly the simplistic way David fears analytics will be used, and his fear is valid. Not because these tools cannot achieve good things; they could improve our lives tremendously. But understanding in depth their use and the consequences of their use is a difficult undertaking, requiring quantitative and qualitative analysis in its own right. Many people will not be willing to put in that kind of effort. A utopian-leaning vision can only be achieved with hard work and much effort, but a dystopian vision can be achieved with minimal effort.

One Description of Science and the Basis for an Argumentative Approach to Validity Issues

I came across an interesting metaphor for science (and structural ways of understanding in general) in the Partially Examined Life podcast, Episode #8. Here is my take on the metaphor.

Imagine the world as a white canvas with black spots on it.  Over that, lay a mesh made of squares and describe what shows through the mesh.  We are describing the world, but as it shows through the mesh.  Change the mesh in size or in shape and we have a new description of the world.

Now, these descriptions are useful and allow us to do things, but they are not truth; they are descriptions. They may be highly accurate descriptions of an actual world, but they are still descriptions. This is how science functions, and how science progresses and changes. It is also why I advocate an argumentative approach to validity in the use of scientific structures like assessment or the use of evidence. Old forms of validity (dependent on criterion validity) and much of the current discussion of evidence-based approaches are about accuracy in certain forms of description. But we must also allow for discussions of the mesh (to return to the metaphor). As in construct validity, any discussion of how the world is must also include a discussion of how the mesh interacts with the world to create the description.

In addition to methods like randomized controlled trials (RCTs), there is also a need for research into how we understand and rethink the assumptions that sometimes go unexamined in research. RCTs are very good at helping us do things with very accurate descriptions (like describing linear causal processes). We also need research that uses other meshes, allowing us to understand in new ways and facilitating our ability to do new and different things; to make progress.
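The mesh metaphor can be sketched computationally (a toy illustration of my own, not from the podcast): the same “world” of black spots yields different descriptions depending on the mesh laid over it, and a coarse mesh can miss a small spot entirely.

```python
# The "world": a fixed pattern of two black spots on a unit canvas.
def world(x: float, y: float) -> bool:
    """True where the canvas is black: one large spot and one small one."""
    return ((x - 0.3) ** 2 + (y - 0.7) ** 2 < 0.04 or
            (x - 0.8) ** 2 + (y - 0.2) ** 2 < 0.01)

def describe(mesh_size: int):
    """Describe the world as seen through a square mesh:
    sample each cell at its center and count the black cells."""
    black = 0
    for i in range(mesh_size):
        for j in range(mesh_size):
            cx = (i + 0.5) / mesh_size
            cy = (j + 0.5) / mesh_size
            if world(cx, cy):
                black += 1
    return black, mesh_size * mesh_size

coarse_black, coarse_total = describe(4)    # a coarse mesh
fine_black, fine_total = describe(40)       # a finer mesh

coarse_fraction = coarse_black / coarse_total
fine_fraction = fine_black / fine_total
```

The world never changes, but the two meshes produce different descriptions of it; the coarse mesh misses the small spot altogether. Neither description is “the truth”; each is the world as seen through a particular mesh.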

Mathematics in The Real World: Are Your Uses of Numbers Valid?

It is my premise that most people do not really understand how to use mathematics strategically in a concrete world. They don’t think much about what the numbers mean, and meaning is everything if you want to know what the numbers are doing. At its heart, math is an abstraction: an idea not connected to real-world circumstances. (See Steven Strogatz’s NY Times article for a detailed look at math and its misuse in education pedagogy.)

The trick to understanding and using math in the real world can often be traced to how we devise the measurements that define the meaning of the numbers that are then treated mathematically. Let’s look at some problems relating to the use of numbers and how their meaning is misunderstood.

Problem #1 Educational Testing – Measurement should always be designed to serve a goal; goals should never be designed to fit a measurement protocol. This is why proficiency testing will never help education, and it is the core idea behind a recent New York Times editorial by Susan Engel. Current public school measures do not reflect the capabilities we need to develop in students. It’s not bad that people teach to the test; what’s bad is that the test itself is not worth teaching to.

Our current educational approach — and the testing that is driving it — is completely at odds with what scientists understand about how children develop . . . and has led to a curriculum that is strangling children and teachers alike.

(Curriculum should reflect) a basic precept of modern developmental science: developmental precursors don’t always resemble the skill to which they are leading. For example, saying the alphabet does not particularly help children learn to read. But having extended and complex conversations during toddlerhood does. (What is needed is) to develop ways of thinking and behaving that will lead to valuable knowledge and skills later on.

The problem we see in current testing regimes is that we’re choosing to test for things like alphabet recall for two reasons.

  1. We base measures on common-sense linear thinking, like the idea that you must recognize letters before recognizing words, before using words to build statements. But in fact (as Ms. Engel’s article points out), the psychological process of building complex conversations is the developmental need for students, and that is rather unrelated to how thought is considered in schools and how curriculum is developed. Developmental needs should be studied for scientific validity and not left to common sense.
  2. The current measurement protocols behind proficiency testing are not very good at measuring things like the ability to participate in complex conversations; such abilities simply don’t translate well to multiple-choice questions. We could develop rubrics to do that, but it would be hard to prove that the rubrics were being interpreted consistently. So instead we test abilities that fit the testing protocol, even if they are rather irrelevant (read: invalid) to the capabilities we really desire to foster.
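Rubric consistency is not beyond evidence, though: one standard check is an inter-rater agreement statistic such as Cohen’s kappa, sketched below (the rubric scores are invented for illustration).

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement between two raters,
    corrected for the agreement expected by chance alone."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical 1-4 rubric scores for ten essays from two raters:
rater_a = [3, 2, 4, 1, 3, 3, 2, 4, 1, 3]
rater_b = [3, 2, 3, 1, 3, 2, 2, 4, 1, 3]
kappa = cohens_kappa(rater_a, rater_b)
```

A kappa near 1 means the raters are interpreting the rubric the same way; a kappa near 0 means their agreement is no better than chance. Evidence like this is how a rubric-based assessment could earn its reliability claim rather than being dismissed out of hand.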

Problem #2 Business Analytics – Things like analytics and scientific evidence are used mostly in processes and activities that can be standardized: ways of doing things where there is clearly a best way that can be scientifically validated and repeated. The problem occurs when we try to achieve this level of certainty in everything, even where science has little to say about the matter. Math is not about certainty; it’s about numbers.

The problem, says (Roger) Martin, author of a new book, The Design of Business: Why Design Thinking is the Next Competitive Advantage, is that corporations have pushed analytical thinking so far that it’s unproductive. “No idea in the world has been proved in advance with inductive or deductive reasoning,” he says.

The answer? Bring in the folks whose job it is to imagine the future, and who are experts in intuitive thinking. That’s where design thinking comes in, he says.

The problem with things like Six Sigma and business analytics is that you need to understand what the method is doing mathematically, not just follow a process. If you’re just applying it without understanding what it’s doing, you’ll try to do things that make no sense. The problem is not usually with the mathematical procedures; it’s with what the numbers mean: how the numbers are derived and what is being done as a result of the calculations. There is nothing worse than following a procedure without understanding what that procedure is doing or accomplishing. Martin’s basic thought, that innovation and proof are incompatible, is false. The real problem is a lack of understanding of how mathematics and proof can be used in concrete situations.

Problem #3 Performance Management – Use of the bell curve in annual reviews.

A recent McKinsey article (Why You’re Doing Performance Reviews All Wrong, by Kirsten Korosec) generated many negative comments from people forced to make their reviews correspond to a bell curve. In statistics we know that a large enough random sample of many naturally varying quantities will often resemble a bell curve: large in the middle and tapering off at either end. But performance management is about fighting the bell curve; it’s about improving performance and moving the curve. If you have to fit your reviews to a bell curve, you’re making performance look random. That’s exactly what you do not want to do. Once again we see a management practice that uses mathematics without understanding what it is doing.
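The point can be shown with a small sketch (a toy model with invented numbers, not anything from the McKinsey article): if every employee’s underlying performance genuinely improves, a forced-curve rating system produces exactly the same ratings as before, erasing the improvement from view.

```python
import random
import statistics

random.seed(2)

def forced_curve(scores, quotas=(0.1, 0.2, 0.4, 0.2, 0.1)):
    """Force ratings 1-5 onto a fixed bell-shaped distribution by rank,
    regardless of what the actual scores are."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # worst to best
    ratings = [0] * n
    bounds, cum = [], 0.0
    for q in quotas:
        cum += q
        bounds.append(round(cum * n))
    rating, start = 1, 0
    for end in bounds:
        for idx in order[start:end]:
            ratings[idx] = rating
        rating += 1
        start = end
    return ratings

year1 = [random.gauss(60, 10) for _ in range(100)]
year2 = [s + 15 for s in year1]   # everyone genuinely improved

ratings1 = forced_curve(year1)
ratings2 = forced_curve(year2)
mean_gain = statistics.mean(year2) - statistics.mean(year1)
```

Even though every score rose by 15 points, the forced-curve ratings are identical in both years: the rating system is blind to the very improvement performance management exists to create.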

What’s needed? The valid use of mathematics, not the random use

The basic problem is that mathematics is abstract, but human activity is concrete. If we want to bridge these two worlds (and, as Strogatz explains, they really can seem like parallel universes), we must build a bridge of understanding called validity. Validity is really the scientific study of how the concrete is made abstract and how the abstract is made concrete. It’s an explicit theory of how a scope of activities can be represented by numbers, laid out so that it can be argued and understood. You can do amazing things with mathematics in the real world, but only if you understand what you are doing: if you understand how the abstract and the concrete are related, and how numbers can represent and relate to the world of human activity.

Making Inferences about the Use of Artifacts in Practices

This post is to think through what I brought up in my last post: applying the concept of validity to practice.

I remember hearing in school that validity asked the question: “does the test measure what it’s intended to measure?” The problem with this type of approach is that it leads you in circles, both practically and epistemologically. Messick changed this to a question that was, quite literally, more consequential: is there evidence that the use of the test brings about, or contributes to, the results you intended?

If you view a test or assessment as an artifact embedded in a practice, you can apply the same type of logic to any artifact that plays an active role in that practice. In artifact creation, as in Holmstrom’s article, the logic of validity could be applied as a guide. Although there would still be an artistic element, it would not be random or unsupported.

There is an evidentiary aspect to this way of considering artifacts. Validity is all about evidence: theoretical evidence, process evidence, empirical evidence, consequential evidence, generalizability evidence. In fact, validity theory can be a way of accounting for evidence from all types of methodology. Randomized controlled trials are still the best way of judging validity for certain types of research questions, but any type of method can contribute evidence.

Summary: Validity is a well-developed body of thought that can be applied to making inferential judgements about evidence supporting the use of assessments or other active artifacts in the context of a specific practice.

Problem-Based Artifact Creation Process

This post extends my thinking from the last post. The concept of validity is as applicable to practices as it is to tests. When I develop an assessment and protocol, I think about validity from the get-go, not just after the fact. Similarly, when designing a process or practice, it is good to add a validity perspective. That is what I’ve done with Holmstrom’s problem-based artifact creation process: adding where content, theory, and data should be considered. Here is a prototype process map:

Design Sci Map

Combining Evidence and Craft for Successful Practice: No False Dichotomies

Evidence-based practice (in all its various permutations) is a construct that needs to be carefully worked out. If evidence-based practice were self-evident, we would have achieved it through the success and extension of operationalism, but that wasn’t to be the case. Participating in a practice requires evidence, craft, and experience combined in a way that is fraught with complexity; but improving practices of all kinds depends on meeting this complexity.

I recently came across two ideas from Wampold, Goodheart & Levant (Am Psychol. 2007 Sep;62(6):616-8) that go a long way toward clarifying this evidence-based construct.  Their first clarification is a definition of evidence; the second counters the false dichotomy of evidence vs. experience.

Evidence and Inference

Evidence is not data, nor is it truth.  Evidence can be thought of as inferences that flow from data.  . . . Data becomes evidence when they are considered with regard to the phenomena being studied, the model used to generate the data, previous knowledge, theory, the methodologies employed, and the human actors.

This is not a simple positivist conception of evidence, but reflects a complex multimodal aggregation.  In addition, I would add that the primary concern of practitioners is really the validity of the practices they are conducting.  The validity of practice is supported by evidence, but it is dependent on the use of practice in context.  We do not validate practice descriptions or practice methodologies, but rather the use of practice in its local contexts, understood by reference to phenomena, models, knowledge, theories, et cetera.  I'll have to look back at validity theory to see if I can get a clearer description of this idea.

Evidence and Experience

A second insight expressed by Wampold, Goodheart & Levant is the integrative nature of evidence and experience as they relate to practice, where any opposition between experience and evidence is considered a false dichotomy.  The ability to use evidence is a component of practice expertise, including the ability to collect and draw inferences from local data through the lens of theory and empirical evidence, and the ability to adjust practices in response to new evidence.  It's experience and evidence, and evidence as a part of experience.

Evidence and Craft

I find it somewhat serendipitous that I have been drawn into conversations involving both design management and evidence-based management, because I believe that the success of each depends on the other.  The positivist agenda of running the world by science is not tenable.  The world is too complex, and there are too many relevant or even distant variables, for a positivist program to be sustainable.  Science cannot do it all, but neither can we be successful without science, evidence, and data.  We need a bit of craft and a bit of evidence to engage in practice.  That may often include craft in the way that evidence is used, and it may entail craft that is beyond evidence. It just should not draw false dichotomies between evidence and craft.

* The goal of operationalism "was to eliminate the subjective mentalistic concepts . . . and to replace them with a more operationally meaningful account of human behavior" (Green 2001, 49). "(T)he initial, quite radical operationalist ideas eventually came to serve as little more than a 'reassurance fetish' (Koch 1992, 275) for mainstream methodological practice." Wikipedia (Incomplete references noted in that article, but it seems trustworthy, as I'm familiar with Koch's work.)

What Can You Do with Validity?

A Follow-up on my last post.

Are your measures valid across a range of concerns?  Improving validity will lead to improved actions, better frameworks for acting, and ultimately improved performance. For example:

The turn of the century saw an increase in the expectations tied to measurement through such phenomena as "No Child Left Behind" and SAT test-prep classes.  This has begun to change as colleges put less emphasis on SAT scores, and I believe we'll soon see similar changes in high-stakes graduation tests.  Two observations:

  • While high expectations pose difficult challenges for assessment, most of the problems that resulted in less use of assessment lie in the expectations placed on specific tests, not in the capabilities of assessment in general.  It's a hermeneutic problem.  The meaning of test scores was much narrower than the expectations for assessment; a mismatch between the meaning that was required and the meaning that the tests could supply.  From a narrow psychometric perspective involving external validity, these tests were valid, but from other perspectives (structural or consequential validity – see the previous post) they are found wanting.
  • People will still act, those actions will still require assessment, and those assessments will still be made.  They will just be more casual, less observable, and even less valid than those made by high-stakes tests.

Most actionable situations require a range of assessments that are valid across a range of validity concepts.  Just because some assessments are less empirical or more qualitative does not mean they should be excluded from an appropriate mix.