What to Do about Testing: A Response to Audrey Watters

Audrey Watters posted about John Oliver’s takedown of standardized testing and Pearson Education.  She asks:

How do we seize the opportunity of all this media attention to the problems with standardized testing to do more than talk about testing?  . . . Can we articulate (a better alternative) now so that Pearson and other testing companies don’t replace the old model with simply a re-branded, repackaged one?

Samuel Messick was a Vice President and Distinguished Research Scientist at the Educational Testing Service (ETS).  His was an authoritative voice on test validity, advocating for restraint in the use of test scores, for better and more in-depth interpretations of those scores, for the collection of multiple sources of information when making important decisions, and for consideration of the consequences of test use.  I believe that much of his legacy has been ignored, co-opted, or argued away (even at ETS, I suspect).  I’ll speculate on what he would advocate:

  • using more than one or two sources of information when making complex, important decisions,
  • understanding the information in the context of a decision and considering the consequences of your testing practices, and
  • considering the validity of testing practices in terms of how they fit within an overall set of district practices (e.g., if a student fails, how do you respond?).  I also suspect I would have had to argue him toward this last point.

Technically, Pearson may not be at fault, for it is district use of tests that is most problematic, but Pearson is at least complicit in not providing better guidance and in not developing ways for districts to collect other sources of information.  For example, the value-added model of teacher assessment needs many more sources of information and in fact does not really provide an assessable model of pedagogy, only largely discredited positivist assertions. The first step is to expose those who advocate positivist models of empiricism that even analytic philosophers would no longer defend.

Finally, it is necessary to look at the overall model of education, which is still primarily built on a mechanistic metaphor with the student as a vessel to be filled.  The metaphor should instead be a biological organism adapting to an environment that is primarily social, networked, and interactive.  When Pearson speaks of their “potential game-changer: performance tasks”, they are talking in this direction, but they’re really co-opting performance tasks within the old metaphor.  They have a long way to go.  We should expunge the mechanistic metaphor from educational leadership and assessment models.

The bottom line for Pearson

You may not be technically wrong in your assessments, but when you’re the brunt of a comedic takedown, you should really look at the consequences of your products’ use and attempt to deal with them.

Understanding Paul Meehl’s Metascience: Pragmatism for Empirical Social Science

Some recent involvement in LinkedIn conversations has led me to delve more into Paul Meehl’s work in the philosophy of science, or what he referred to as scientific metatheory.  As the book A Paul Meehl Reader notes, Meehl’s essays were painstakingly written, and most readers do not so much read his work as mine it for insights over many years; so I suspect this will be a long-term project.

Here is the first nugget: progress in the soft sciences is difficult and painstaking, and much of the existing research may be flawed and found wanting. Here are some reasons:

  1. Theory testing often involves derived auxiliary theories which, if not highly supported themselves, add unknown noise to the data.  Often these auxiliary theories are also not spelled out or understood.
  2. Experimenter error, experimenter bias, or editorial bias is present more often than is generally acknowledged, known, or considered.
  3. Inadequate statistical power.  In general, much more power is needed.  Meehl thought we should often seek statistical power around .9 in order to overcome unknown noise (error) in the data.
  4. Failure to account seriously for the crud factor (the possible effect of ambient correlational noise in the data).
  5. Unconsidered validity concerns.  The foundation of science is measurement, but the validity of measurement tools is often not considered seriously.  Experiments often measure things in new ways, even when they use well-studied instruments, and this requires analysis for validity.
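The crud-factor point can be made concrete with a little arithmetic.  The sketch below is my own illustration, not Meehl’s numbers; it uses the standard normal approximation z ≈ r·√N for testing a correlation against zero to show how quickly trivial “ambient” correlations become statistically significant as samples grow:

```python
from math import ceil

def n_to_detect(r, z_alpha=1.959964):
    """Approximate sample size at which a population correlation of
    size r crosses the two-sided p < .05 threshold, using the
    normal approximation z ~ r * sqrt(N)."""
    return ceil((z_alpha / r) ** 2)

# A substantively meaningful r = .30 is detectable with modest N,
# but a meaningless "crud-level" r = .05 also becomes significant
# once N reaches the low thousands.
for r in (0.30, 0.10, 0.05):
    print(f"r = {r:.2f} -> N ~ {n_to_detect(r)}")
```

The point, following Meehl, is that with large samples a rejected null hypothesis provides almost no corroboration for the substantive theory, because some nonzero correlation was nearly guaranteed in advance.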

What this means is that more methodological care is needed such as:

  1. Seeking predicted point values, which are stronger in terms of falsification and lend more verisimilitude than the often weak corroboration that comes from non-null significance testing.
  2. More power (i.e., around .9) in hypothesis testing to protect against weak auxiliaries, unknown bias, and general crud.
  3. Understanding the difference between statistical significance and evidentiary support.  Observations are evaluated in terms of statistical hypotheses, which concern a statistician’s questions about the probability of the observations.  Theories, however, are evaluated by the accumulation of logical facts; not in terms of probabilities, but in terms of verisimilitude.
  4. Science should seek more complete conceptual understanding of the phenomena under study.
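To put the .9 power recommendation in perspective, here is a back-of-the-envelope calculation (my own sketch, using the usual normal approximation for a two-sided, two-sample test of a standardized mean difference d):

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(d, n_per_group, z_alpha=1.959964):
    """Approximate power of a two-sided two-sample z-test for
    standardized effect size d with n subjects per group."""
    return norm_cdf(d * sqrt(n_per_group / 2.0) - z_alpha)

def n_for_power(d, target=0.9):
    """Smallest n per group that reaches the target power."""
    n = 2
    while power_two_sample(d, n) < target:
        n += 1
    return n

# A "medium" effect (d = 0.5) needs about 85 subjects per group for
# power .9 -- far more than many soft-science studies collect.
print(n_for_power(0.5))
```

Nothing here is specific to Meehl beyond the .9 target; it simply shows that taking his advice seriously implies much larger samples than are typical.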

I believe this last point is similar to Wittgenstein’s concern that in science, problem and method often pass one another by without interacting.  I think this concern is also similar to verisimilitude in theory. Verisimilitude may be considered a fuzzy interpretive concept, but the problems uncovered by the Reproducibility Project show that the hard sciences are not as interpretation-free as is often supposed. I’m also coming to the conclusion that it is in Meehl (and the like-minded Messick) that traditional empirical science and pragmatism can be brought together.  The idea is that a social constructivist approach must account for both the successes and the failures of empirical science if it is to move forward productively.  Meehl and Messick were not pragmatists, but I am saying that in dealing with the problems they saw in empirical science, a critical pragmatic approach can be envisioned.  Meehl, along with Wittgenstein, Popper, and maybe Lakatos, is among the best critics working within the empirical sciences, and building from their critiques seems like an interesting place to explore.



Unpacking Ontologically Responsible Assessment

In this post I want to unpack the term Ontologically Responsible Assessment, mentioned in a previous post.

Why Develop an Ontology:

An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them.  . . .   There is no one correct way to model a domain— there are always viable alternatives. The best solution almost always depends on the application that you have in mind. (Source)

When people say that students need 21st Century skills, what they really mean is that they want to change their ontological commitments as to what students are and what they will become.  When we move from a mechanistic factory model of education to a dialogic networked model, we are really changing our ontological commitments from components in a machine to actors in a network.  Ontologies try to clarify questions about the nature of being and becoming a student in the context of educational practice.  I would add (to the typical information-systems objectivist account) that an ontology in educational practice also involves recognizing that students are constituted by networked relationships and the various domain discourses within which they interact.  The main difference in this ontology is that (in contradistinction to most information-systems ontologies) its organization is not hierarchical and behavioral, but rather contextual, networked, and dialogic.  This doesn’t mean there is no place for hierarchical behavioral objectives, just that they no longer form the core of our educational goals.

Why Responsibility:

Depending on whether one believes that reality is objectively given or subjectively / collectively constituted, the understanding of responsibility will differ. This, in turn, has a serious impact on how individuals and collectives can or should use IS (Information Systems).  . . . Reality is thus not given and open to objective discovery but outcome of the intentional activity of perception and interpersonal communication. This means that the dimensions of responsibility must be discovered through communication. (Stahl, 2007 Available from Research Gate or Ontologies, Integrated Series in Information Systems Volume 14, 2007, pp 143-169)

Education can’t be conceived through objective behavioral description; rather, it is conceived in the context of conversational realities.  Students are not cogs in a machine but people, and the conversational realities where we meet them involve commitments, requirements, privileges, and various other high-level latent traits that defy easy objectification.  To be responsible is to jointly actualize an educational program.

What Do I Mean by Assessment:

What is the purpose of educational assessment?  Wikipedia speaks of documenting knowledge, skills, attitudes, and beliefs.  Merriam-Webster talks about making judgments.  Edutopia talks about assessment as a mechanism for instruction.  I want to focus on another aspect that may seem technical, but that I believe gets to the heart of the matter.

What we measure are, for the most part, latent constructs.  As Michael Kane (2013) frames it:

Test scores are of interest because they are used to support claims that go beyond (often far beyond) the observed performances. We generally do not employ test scores simply to report how a test taker performed on certain tasks on a certain occasion and under certain conditions. Rather, the scores are used to support claims that a test taker has, for example, some level of achievement in some domain, some standing on a trait, or some probability of succeeding in an educational program or other activity. These claims are not generally self-evident and merit evaluation.  (Validating the Interpretations and Uses of Test Scores)

More than anything else, assessment, at its core, is the process of estimating a latent trait and making it visible.  It is the first step in the analytic process of drawing connections; we can’t connect the dots until they are visible to us.  There are two people for whom this is of primary importance: the teacher and the student.  If we observe the educational practices that involve testing, these are often the last two stakeholders given consideration, but they should be the first.


This threefold understanding of educational assessment first includes developing an ontology in which assessment practices recognize a full account of the being and becoming of students, rather than restricting our view to what is easily measured but essentially meaningless in the bigger picture or final analysis.  Secondly, it is responsible, in that assessment is linked to an expectation for engagement that goes beyond behavioral description to recognize the full complexity of each student’s engagement as a dialogic and networked individual.  And finally, it does not use data in a mechanistic fashion, but uses construct measurement to make these joint responsibilities and ontologies visible to teachers and students in everyday educational practice.


Instructionism, Constructionism and Connectivism: Epistemologies and Their Implied Pedagogies

Ryan2.0’s blog recently hosted a discussion on different pedagogies based on instructionist, constructionist, and connectivist theories of learning.  I tend to see these differences on an epistemological / psychological / psychometric level.  (I’m an educational psychologist, not a philosopher.)  I think this line of thinking is helpful for exploring some of my recent thoughts.

First a note; I resist labels on learning theories.  A consensus may be developing, but there are so many sub-positions that if you look at 100 constructivist positions, you’ll find 100 different takes (as evidenced by many of the comments on Ryan’s post).  I just find labels unsatisfying as points of reference for communication in learning theories at this time; they convey too little meaning to me.  Tell me what you don’t like about a learning theory; I probably don’t like it either.

What’s the Point

Ryan’s main point is that all of these pedagogical positions are evident in current educational practice and that we should think in terms of “and”, not “or”.  This fits with my own view that paradigm shifts should proceed by subsuming, or at least accounting for, the successful parts of the previous paradigm, while enabling teachers and scientists to move beyond the problematic aspects of older theories.  To really understand these different theories, it helps to see how pedagogy changes as we move from one to the next.  This post looks at each of these theories in terms of epistemology / psychology / psychometrics, and then discusses a place where the implied pedagogy is relevant to practice today.

Direct Instruction

I’m not familiar with instructivism per se, but it seems similar to direct instruction, a pedagogy associated with positivism / behaviorism.  Direct instruction often uses empirically based task analyses that are easy to measure and easy to employ.  Applied Behavior Analysis, a specialized operant-behavioral pedagogy, is a prime supporter of direct instruction.  Many, if not most, classrooms use direct instruction in some form today.  It seems like common sense, and many teachers may not be aware of the underlying epistemology.

One prominent area where advanced use of direct instruction is growing is computer-based adaptive learning, like the Knewton platform. Students follow scripted instruction sequences. A student’s specific path within the script is determined by assessments that follow Item Response Theory (IRT) protocols.  The assessment estimates a student’s command of a latent trait and provides the next instruction appropriate for the assessed level of that trait.  The best feature of adaptive learning systems is their efficiency in moving students through a large body of curriculum or in making leaps in skill level, like improving reading levels.  Because direct instruction is also easy to measure, it’s possible to use advanced psychometric computer analyses.
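To make the mechanics concrete, here is a minimal sketch of how such a system can work, using a one-parameter (Rasch) IRT model.  This is my own toy illustration, not Knewton’s actual algorithm: ability is estimated by maximum likelihood from scored responses, and the next item chosen is the unanswered one whose difficulty sits closest to the current ability estimate, where Rasch item information is highest:

```python
from math import exp

def p_correct(theta, b):
    """Rasch model: probability of a correct response given
    latent ability theta and item difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def estimate_theta(responses, theta=0.0, lr=0.5, steps=100):
    """Crude maximum-likelihood ability estimate via gradient ascent.
    responses: list of (item_difficulty, score) with score 0 or 1.
    (Diverges for all-correct/all-wrong patterns; real systems
    handle those with Bayesian priors.)"""
    for _ in range(steps):
        gradient = sum(score - p_correct(theta, b) for b, score in responses)
        theta += lr * gradient
    return theta

def next_item(theta, item_bank, answered):
    """Adaptive selection: pick the unanswered item whose
    difficulty is nearest the current ability estimate."""
    remaining = [b for b in item_bank if b not in answered]
    return min(remaining, key=lambda b: abs(b - theta))

# Student got items at difficulty -1 and 0 right, difficulty 1 wrong:
history = [(-1.0, 1), (0.0, 1), (1.0, 0)]
theta_hat = estimate_theta(history)
chosen = next_item(theta_hat, [-2.0, -1.0, 0.0, 1.0, 2.0], {-1.0, 0.0, 1.0})
print(round(theta_hat, 2), chosen)
```

A production system would use a calibrated item bank and a Bayesian (e.g., EAP) estimator rather than this bare MLE, but the estimate-then-select loop is the core of IRT-based adaptivity.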

Critiques of direct instruction can be similar to critiques of behaviorism in general.  Even though test developers are becoming more sophisticated in measuring complex constructs (e.g., the Common Core), the learning that results from direct instruction can still be seen as lacking in conceptual depth and in the ability to transfer to other knowledge domains.  It also doesn’t directly address many important higher-level cognitive skills.


Enter constructivism.  I think of constructionism as beginning with Piaget’s learning through schema development.  Piaget’s individual constructive approach was expanded by social theorists and ends up with embodied theorists, or with ideas similar to Wittgenstein’s: that knowledge and meaning are closely linked with how they are used.  Wittgenstein’s early work was similar to that of the logical positivists.  He eventually found that meaning in everyday activities is inherently circular, and that the only way to break out is not through precision, but by looking for meaning in what people are doing and how they are using knowledge.  In some ways it’s like a return to behaviorism, but from a position more in line with hermeneutics than empiricism.

I recently saw a presentation of an instructional program (MakerState) based on the Maker / Hacker Space movement that functions much like a constructivist approach to education.

MakerState kids learn by doing, by creating, designing, experimenting, building…making. Our makers respond when challenged to think outside the box, to think creatively and critically, to collaborate with their peers, to problem solve, to innovate and even invent solutions to challenges they see around them.

This program can be founded on the same curriculum as that used in direct instruction when developing maker challenge activities, and it can use this curriculum to scaffold maker activities with STEAM principles.  But the outcomes are open-ended, and their complexity is well beyond what is possible through direct instruction.  Learning by doing is more than just an aside.  Making knowledge concrete is actualizing it: taking it from the abstract to make it meaningful, valuable, and productive.  But is this the end of educational objectives?  Does success in life not require even more?


Enter Connectivism.  I associate connectivism with the work of George Siemens and Stephen Downes.  I take this post from George as a good summary of Connectivism:

The big idea is that learning and knowledge are networked, not sequential and hierarchical.  . . . In the short term, hierarchical and structured models may still succeed. In the long term, and I’m thinking in terms of a decade or so, learning systems must be modelled on the attributes of networked information, reflect end user control, take advantage of connective/collective social activity, treat technical systems as co-sensemaking agents to human cognition, make use of data in automated and guided decision making, and serve the creative and innovation needs of a society (actually, human race) facing big problems.

I believe this take on Connectivism is modeled on computer and social media networks.  My own take is to include a more biological approach through another major node in connectivism: M.M. Bakhtin, a Russian literary critic known as a dialogic philosopher.  I want to draw this connection because dialogism is a reasonable way to make sense of everyday collective co-sensemaking activity by an organism interacting with its environment.  I see this as understanding the underlying way networks function when biological organisms (i.e., humans) are involved.

One of Bakhtin’s main ideas is heteroglossia:

(A)ll languages (and knowledges) represent a distinct point of view on the world, characterized by its own meaning and values. In this view, language is “shot through with intentions and accents,” and thus there are no neutral words. Even the most unremarkable statement possesses a taste, whether of a profession, a party, a generation, a place or a time.  . . . Bakhtin goes on to discuss the interconnectedness of conversation. Even a simple dialogue, in his view, is full of quotations and references, often to a general “everyone says” or “I heard that..” Opinion and information are transmitted by way of reference to an indefinite, general source. By way of these references, humans selectively assimilate the discourse of others and make it their own.

Just as water is the medium that allows fish to swim, language is the medium that facilitates networks.  Rather than focusing on words as the base unit, Bakhtin focuses on the utterance as his main unit of analysis.  This is from the main Wikipedia article on Bakhtin:

Utterances are not indifferent to one another, and are not self-sufficient; they are aware of and mutually reflect one another… Every utterance must be regarded as primarily a response to preceding utterances of the given sphere (we understand the word ‘response’ here in the broadest sense). Each utterance refutes, affirms, supplements, and relies upon the others, presupposes them to be known, and somehow takes them into account…

I see this as a detailed account of the Wittgensteinian use argument I made earlier.  From a psych perspective, I take it that the inner psychological world reflects and models the interactions we have with the world.  Because learning is facilitated by social interaction with other people in dialogue, our minds are structured in a dialogical fashion.  This is to see knowledge as existing not only through network nodes, but through nodes that reflect dialogue and interconnected utterances. (This is similar to structuralism, but goes well beyond it in its implications.) Even when we learn through self-study, we structure that study in a dialogical fashion.  When we engage in soliloquy, we posit a general other to whom we address our words.  Transferring knowledge is not just cutting and pasting it to another node in the network.  We must also adjust to new intentions, new references, and often to the tastes of a new profession or discipline.  I don’t know what the neurological correlates of dialogic activity are, but at the conscious level of cognition (and some unconscious levels), I see the mind as structured by its interaction with this complex social / speech world.

I don’t yet have a good example of a pedagogy that reflects this dialogic connective theory.  It would certainly be activity-based and structured more like an open-ended apprenticeship involving some sort of performance.  I’m thinking that some relevant learning objectives would include: higher-order cognition in unstructured situations (e.g., knowledge transfer, problem identification and solving, creative thinking, situated strategic thinking), intrapersonal dispositions (e.g., motivation, persistence, resilience, and metacognition like self-directed learning), and interpersonal skill sets (e.g., collaboration, effective situated communication, relationship development).

I think a key to achieving a higher level of connective pedagogy is valid assessment in an area where assessment has proven difficult.  Assessment in this context must also be ontologically responsible to the student.  The purpose of ontologically responsible assessment is not to rank, rate, or judge either students or teachers; that is a task for other assessments. Instead, ontologically responsible assessment is a way of making ourselves visible, both to ourselves and to others, in a joint student-teacher activity that conveys the student’s history and future horizons.  (A horizon here is a future that I can see only vaguely, but that contains a reasonable route to achievement, given both the student’s and teacher’s joint commitment to each other and to the path.)  This is education as a doable, visible, committed, and ontologically responsible joint activity by student and teacher.

I’m never satisfied with an ending, but this seems like a good jumping-off point for another post and another time.  I feel the need for input before going further in this direction.


Seeing Students Develop: From Objective Data to Subjective Achievement

Even though the personalization / individualization of instruction is being driven by objective data in learning platforms, this data can also be used to facilitate a deeper self-understanding and a stronger commitment between the student and the teacher.

To see the future, students and teachers should focus on their horizons.  Horizons here refer to a point in developmental time that can’t be seen clearly today, but that one can reasonably expect to reach in the future.  Because many aspects of this developmental journey are both precarious and dependent on future actions, this joint vision can’t be wishful thinking; it must be clearly framed in terms of privileges and obligations.  When it is treated this way, assessment is not a picture of student achievement, but a method for making both student and teacher visible to each other in a way that is rational, meaningful, and conducted in an ontologically responsible manner; that is, in a way that is true to who we want to become (Shotter, 1993).

This model of support begins with valid assessments that are clear and explicit about their meaning, the values they imply, and their actual or expected consequences.  The learning process can then be understood from a narrative perspective as well as mathematically.  By referencing empirically supported path models, personalization can include choice, preparing the way for stronger commitment, clearer learning directions, and even experiments with those directions.

The idea is not to suggest that assessment must become less objective, but to recognize that an educational process must contribute to the development of a subject.  Educating a student is not like designing a computer chip.  It is about helping an individual actualize their unique capabilities while finding themselves and their place in society.  The goal of education is intellectual development.  Approaches tethered to a mechanistic model of education will fail in this goal and cannot even be justified on grounds of efficiency.  Assessment may start with objective visions, but its uses must directly translate to the subjective tasks that are central to both teacher and student.

A New Form for Validity

Thinking about new projects.  Here are the general contours of a new way of looking at validity.

  1. There have been criticisms of Samuel Messick’s unified view of construct validity and of Kane’s argument-based approach.  I have yet to accept any logical argument made against either framework, yet I am sympathetic when it is said that these frameworks are not practical administratively.
  2. Consider an argument made by the philosopher Karl Popper.  Popper distinguishes between justification and criticism on the way to his famous idea of falsificationism.  Just as one cannot establish that a theory is true through experimentation (experiments can only show results to be false), so too it is precarious to justify one’s beliefs, but easy to demonstrate when they are false.  Justification can be seen as a next-to-impossible task, but criticism is more likely to be sound.  If we respond to criticism with a desire to improve and adjust our beliefs, then our beliefs will approach a closer version of what you might call truth.  So the best way to justify assessment validity is by being open to criticism: always seeking to improve through critical reasoning.
  3. This does not nullify Messick’s framework (Messick, 1995), but it shifts it from justification to a framework for critique and critical thinking.  Messick’s framework moves from a hopelessly difficult attempt at justification and becomes a critical framework for knowledge transparency.  Recent developments in philosophy have demonstrated the contingent nature of knowledge and how its shape is determined by the form of its production.  Messick’s transparent critical framework for the production of assessment knowledge is the best way to see these underlying contingencies.
  4. Kane’s framing of validity as an argument is more suited to a critical approach than to a justificationist one.  The very nature of argument sets up a two-sided dialogue: every argument presupposes a dialogic counter-argument.  If you enter into an argument, you must be willing to entertain and engage with critical positions.  Kane’s framework is built to respond to criticism rather than to depend on justification.

Rhetoric and Neuroscience

This post is in response to a LinkedIn discussion in the Metacognition Learning to Learn Discussion Group.  I made this statement to a participant:

I do dislike the way some people localize their skills (i.e., saying “I’m a right-brained person”).  All activities use the whole brain, left and right.  People who say they are “right brained” can also excel at many “left brained” activities and vice versa.

That participant responded saying:

Howard, what evidence do you have about the whole brain functioning? Can you please provide the scientific evidence or the anecdotal if there is such evidence?

And I am glad to respond; doing so forces me to elucidate and extend the grounding of my thoughts, and you are right to ask for substantiation.  I will begin such an attempt here and welcome the opportunity to continue the conversation beyond this response.

Brain Systems, not Modules, as the  Basis for Complex Socially Relevant Behavior

#1 The study of the localization of brain function is an important basis for neuropsychology, cognitive neuroscience, and many fMRI studies.  Studying psychology in everyday function, however, implies a different perspective: studying the brain as a system.  First, this is a different metaphoric take on the brain, as stated by Churchland in this Scientific American article.

University of California, San Diego, philosopher of mind Patricia S. Churchland . . . (states) “There are areas of specialization, yes, and networks maybe, but these are not always dedicated to a particular task.” Instead of mental module metaphors, let us use neural networks.

Kevin N. Ochsner and Matthew D. Lieberman similarly state that the study of human functioning requires the interdisciplinary study of brain systems (The Emergence of Social Cognitive Neuroscience, American Psychologist, 2001, 56(9), 717-734).  They advocate combining bottom-up approaches (neuroscience) with top-down approaches (social cognitive psychology).

As the field develops, one can expect a shift in the kinds of studies being conducted.  When little is known about the neural systems involved in a given form of behavior or cognition, initial studies may serve more to identify brain correlates for those phenomena than to test theories about how and why the phenomena occur.  . . . Ultimately, it will be important to move beyond brain-behavior correlations, but this can only happen when researchers in the field have built a baseline of knowledge about the brain systems underlying specific types of social or emotional processing (p.725).

(C)ognitive neuroscientists have historically used minimalist methodologies to study a few basic abilities with  little concern for the personal and situational conditions that elicit and influence them (bottom-up). . . . social psychology has historically been interested in a broad range of complex and socially relevant phenomena (top-down). . . In recent years, there has been increasing appreciation that top-down and bottom-up approaches cannot be researched independently because they are intimately linked to one another (p. 727-728).

The Effects of Pop-Psychology

# 2 Neuropsychological concepts and brain localization theories have entered mainstream pop-psychology, but these are complex topics, and pop-psychology often misunderstands and misappropriates these concepts.  For example, someone recently told me that they aren’t a right-brained type. What they mean is that they are effectively ceding what I view as a pan-human ability to be creative.  (This, in some ways, is similar to Carol Dweck’s growth-versus-fixed intelligence argument.)  Neuropsychological tests may provide insight in clinical situations, but I believe the application of these insights to everyday activity should be made with great care and under clinical supervision.  Similarly, fMRI studies provide great insight about the brain, but it is beyond scientific validity to apply many of these insights to complex social behavior.

Using Neuroscience for Rhetorical Purposes

# 3 I have read a couple of books that claim to be based on brain research; my latest read is Charles Jacobs’s Management Rewired.  I generally like many of the ideas in this book (though most of the hard scholarly work remains to be done), but my regard is based on educational theory, not neuroscience.  When these types of books refer to neuroscience, I see them stretching beyond valid interpretations of the underlying science, and their references to neuroscience seem to be used mostly for rhetorical purposes.  To illustrate through another example: the general population is less skeptical of science than most scientists are.  Hence, many newspaper articles will use the rhetorical device “studies show” to lend the authority of science to their views, rather than allowing those views to stand on their own merit or presenting actual scholarly work in support.  I find that neuroscience is often used in the same way, to lend authority inappropriately.

Instead of Neuroscience, Look to Cognitive Mediation and Vygotsky’s Higher Mental Functions

Traditional psychology speaks to many of these issues.  For myself, I find that mind maps, graphic organizers, and visual design processes help me get ideas out into visual space and overcome the limitations of my short-term memory.  This quote from D. A. Norman’s (1994) Things That Make Us Smart states it well:

Without external aids, memory, thought, and reasoning are all constrained.  But human intelligence is highly flexible and adaptive, superb at inventing procedures and objects that overcome its own limitations.  The real powers come from devising external aids that enhance cognitive abilities (p. 24).

Vygotsky called these aids mediators and their use examples of higher mental functioning.  Vygotsky was interested in human functions that exist on a different level from natural or biological ones.  It is my belief that the complete neurological correlates of human functioning and thinking will not be found, because human functioning is located in cultural settings.  “Individual consciousness is built from the outside through relations with others” (from Alex Kozulin’s introduction to Vygotsky’s (1934) Thought and Language).

Existing educational and psychological theory is adequate for improving human functioning and should serve as the basis of support until social cognitive neuroscience can be extended to functional activity.

Validated Methodological Operationism: Improve Analytics by Validating Your Operations

Many measured processes can be improved by validating your process operations.  This is true whether you are talking about business, experimental, or educational processes.

A New View Of Operationism

An interesting read on operationism is Uljana Feest’s (2005) “Operationism in Psychology: What the Debate Is About, What the Debate Should Be About” [Journal of the History of the Behavioral Sciences, 41(2), 131-149].

The basic gist: psychologists’ historical use of operationalism was methodological rather than positivist (even though they may have referenced positivism for philosophical cover).  So criticizing operationism with positivist arguments is somewhat misguided, but operations can be criticized through validation arguments.

What does Feest mean by a methodological reading of operationism?

. . . I mean that psychologists did not intend to say, generally, what constitutes the meaning of a scientific term.  . . . in offering operational definitions, scientists were partially and temporarily specifying their usage of certain concepts by saying what kind of empirical indicators they took to be indicative of the referents of the concepts (p. 133).

She concludes by saying:

. . . the debate should then be about what are adequate concepts and how (not whether) to operationalize them, and how (not whether) to validate them (p.146).

So any debate about operationism is really about constructs and their validation.  Within this framework, I will list four specific types of operationism.

Positivist, Empiricist Operationism

This idea can be represented by Percy Bridgman’s original conception of operationism:

in general, we mean by any concept nothing more than a set of operations; the concept is synonymous with the corresponding set of operations (Bridgman, P. [1927]. The Logic of Modern Physics. New York: Macmillan, p. 5).

The biggest problem with this approach is that no set of operations can ever be said to exhaust the entire meaning of a construct, a position that is also supported by cognitive psychology’s understanding of the cognitive processes involved in the meaning and use of concepts (Andersen, H., Barker, P., & Chen, X. [2006]. The Cognitive Structure of Scientific Revolutions. Cambridge University Press).

Methodological Operationism

The idea that operations are the empirical indicators of the construct (Feest).

Naive Pragmatic Operationism

However you conceive of a construct, within any measured process — experimental, business, or any other process controlled by measures — the measurement operations methodologically define that construct for the purposes of that process.  If you put a measure in place without determining how and why you are using it, you are operating in the same fashion as an operationist in the positivist, empiricist mode, and you are subject to the same kinds of problems.  Garbage in = garbage out; this is the real potential problem with this approach.  Many business processes fail to meet expectations, and those failures can be traced back to poor-quality measurements whose constructs were not appropriately operationalized.

Validated Methodological Operationism

This represents measured processes whose operations are clear and whose quality and validity have been adequately evaluated.
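One concrete way to begin evaluating a set of operations is a convergent validity check: two independent operationalizations of the same construct should agree.  The sketch below is a minimal, hypothetical Python illustration; the scores and the interpretation are invented for the example, not prescribed by validity theory.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Two hypothetical, independent operationalizations of one construct,
# e.g. a test score and a structured observation rating for 8 students.
measure_a = [72, 65, 88, 91, 54, 77, 83, 60]
measure_b = [70, 62, 90, 89, 58, 75, 80, 66]

r = pearson_r(measure_a, measure_b)
print(f"convergent correlation r = {r:.2f}")  # → convergent correlation r = 0.97
# A low r would signal that at least one set of operations is not
# indicating the construct we think it is.
```

A single correlation is only one piece of a validation argument, of course; the point is that the quality of the operations is something we examine, not something we assume.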


Feest references the gap between qualitative and quantitative research as being about operationism.  I believe this is incorrect.  Operationism is about construct validity (the unified theory).  Criticism of qualitative research is usually about research validity (a different kind of validity) and about the value of different research purposes.

Managing Relationships, Supporting Performance

An interesting NY Times opinion piece today extends my recent posts on validity and performance management.  It is entitled “Why Your Boss Is Wrong About You,” by Samuel Culbert.  In that article he states:

In my years studying (performance) reviews, I’ve learned that they are subjective evaluations that measure how “comfortable” a boss is with an employee, not how much an employee contributes to overall results.

Culbert’s statement leads us to believe that the problem with performance reviews is their subjective nature.  From a measurement perspective, I believe this is incorrectly stated.  The problem is one of validity: performance measurements typically measure the boss’s level of comfort with an employee, not the employee’s performance.  Greater objectivity will not help us if comfort is still the construct being measured.  Instead, we must look at validity.
Culbert proposes a performance preview process as an alternative:

It’s something I call the performance preview. Instead of top-down reviews, both boss and subordinate are held responsible for setting goals and achieving results. . . . bosses are taught how to truly manage, and learn that it’s in their interest to listen to their subordinates to get the results . . . “Tell me your problems as they happen; we’re in it together and it’s my job to ensure results.” . . . . It encouraged supervisors to act as coaches and mentors.  . . . But understand that the performance review makes it nearly impossible to have the kind of trusting relationships in the workplace that make improvement possible.

This preview process may be a good idea in and of itself, but it does not logically get at the root problem.  Measurement within this new process can have just as many validity problems as the old process.  This is why validity is important.
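To see why greater objectivity alone cannot fix an invalid measure, consider a small, entirely hypothetical numerical sketch.  The review score below is a perfectly objective, perfectly repeatable function of the boss’s comfort, yet it tells us almost nothing about performance.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical data for 8 employees (all numbers invented for illustration).
comfort     = [9, 3, 7, 5, 8, 2, 6, 4]   # how comfortable the boss feels
performance = [4, 6, 8, 2, 7, 5, 3, 9]   # actual contribution to results
review      = [10 * c for c in comfort]  # "objective", perfectly reliable rating

print(f"review vs comfort:     r = {pearson_r(review, comfort):.2f}")
print(f"review vs performance: r = {pearson_r(review, performance):.2f}")
# The review score is objective and repeatable, but it tracks comfort,
# not performance: the wrong construct, however objectively scored.
```

Objectivity here is a property of the scoring procedure; validity is a property of what the score actually indicates, and no amount of the former substitutes for the latter.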
Two additional things I’m thinking:

Culbert doesn’t quite get there, but I sense he is looking at something like Action Analytics: measurement tied to real-time feedback that can support performance while performance is still in formation.  Instead of measurement in the service of performance review, it is measurement in the service of performance support.

The second thought is this.  The central point in Culbert’s process is trust between an employee and a boss, that is, a management relationship.  This is the central construct in successful management, and it may be important to measure.

This leads me to an interesting final conclusion, and maybe a management axiom:

In managing people, we manage relationships but support performance.

Avoiding Naive Operationalism: More on Lee Cronbach and Improving Analytics


Consider again Cronbach and Meehl’s (1955) quote from my last post:

We do believe that it is imperative that psychologists make a place for (construct validity) in their methodological thinking, so that its rationale, its scientific legitimacy, and its dangers may become explicit and familiar. This would be preferable to the widespread current tendency to engage in what actually amounts to construct validation research and use of constructs in practical testing, while talking an “operational” methodology which, if adopted, would force research into a mold it does not fit.  (Emphasis added)

What was widespread in 1955 has not substantially changed today.  Construct measures are routinely developed without regard to their construct or consequential validity, to the detriment of our practices.  I will name this state naive operationalism: measuring constructs with what amounts to an operational methodology.  I will also show why it is a problem.

Operational Methodology: Its Origins as a Philosophical Concept

What do Cronbach and Meehl mean by an operational methodology?  Early in my psychological studies I heard the definition of intelligence stated as “that which is measured by an intelligence test.”  It was an example of operationalism (or operationism).  Originally conceived by the physicist Percy Bridgman, operationalism states that the meaning of a term is wholly defined by its method of measurement.  It became popular as a way to replace metaphysical terms (e.g., desire or anger) with a radically empirical definition.  It was briefly adopted by the logical positivist school of philosophy because of its similarity to the verification theory of meaning.  It also remained popular for a longer period in psychology and the social sciences.  Neither use stood up to scrutiny, as noted in Mark Bickhard’s paper:

Positivism failed, and it lies behind many of the reasons that operationalism is so pernicious: the radical empiricism of operationalism makes it difficult to understand how science does, in fact, involve theoretical and metaphysical assumptions, and must involve them, and thereby makes it difficult to think about and to critique those assumptions.
Not only does the creation of any measurement contain many underlying assumptions; the meaning of any measurement is also a by-product of the uses to which the measurement is put.  The heart of validity theory in the work of Cronbach (and also of Samuel Messick) is in analyzing these measurement assumptions and measurement uses through the concepts of construct and consequential validity.  Modern validity theory stands opposed to operationalism.

Operational Definition as a Pragmatic Psychometric Concept

Specifying an operational definition of a measure is operationalism backwards.  Our measurements operationalize how we are defining a term, not in the abstract, but in actual practice.  When we implement a measurement in practice, that measurement effectively becomes the construct definition in any process that involves that measure.  If the process contains multiple measures, it is only a partial definition; if it is the sole measure, it becomes the sole construct definition.  Any measure serves as an operational definition of the measured construct in practice, but we do not believe (as in operationalism) that the measure subsumes the full meaning of the construct.  Our operational definition is no more than a partial definition, and that is why consequential and construct validity are needed in our methodological thinking.  Validity research tells us when our operational definitions are problematic and may give us an indication of how to improve our measures.  Validity research studies the difference between our operational definitions and the construct being measured.
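As a small illustration of what examining our operations can look like in practice, the sketch below computes Cronbach’s alpha, a standard psychometric check on whether a set of items functions as a single measure.  (Alpha speaks to reliability, which is only a precondition for validity, not validity itself.)  The item scores are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """Cronbach's alpha for a persons-by-items matrix of scores."""
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # one tuple of scores per item
    item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical scores: 5 people answering 3 items meant to tap one construct.
scores = [
    [2, 3, 2],
    [4, 4, 5],
    [3, 3, 3],
    [5, 5, 4],
    [1, 2, 1],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")  # → alpha = 0.95
# High alpha: the items behave as one measure.  It says nothing yet
# about whether that measure indicates the construct we intend.
```

A check like this examines the operations themselves; the gap between the operational definition and the intended construct still has to be studied through construct and consequential validation.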

Naive Operationalism

For most of us, operationalization outside the larger issue of a research question and conceptual framework is just not very interesting.

I could not disagree more!  Not including validity in our methodological thinking means that our operationalized processes will result in what I will call naive operationalism.  If we devise and implement measures in practice without regard for their validity, we will fail to understand their underlying assumptions and will be unable to address any validity problems.  In effect, this is just like philosophical operationalism and sets us up for the same problems.  Let’s consider a concrete example to see how it can become a problem.

An Example of Naive Operationalism

Richard Nantel and Andy Porter both suggest that we do away with performance measurement, which they consider “a complete waste of time.”  These are the reasons given for scrapping performance measurement:
  1. Short-term or semiannual performance reviews prevent big-picture thinking, long-term risk taking, and innovation.  We want employees to fail early and often.
  2. Performance systems encourage less frequent feedback and interfere with real-time learning.
  3. Compensation and reward systems are based on faulty incentive premises and undermine intrinsic motivation.
  4. There is no evidence that performance rating systems improve performance.
Consider each reason in turn:
  1. This critique advocates for a different set of constructs.  True, the constructs they imply may not be common in most performance measurement systems, but there is no reason to stay with standard constructs if they are not a good fit.
  2. There is no reason why formative assessments, like action analytics and other more appropriate feedback structures, could not be part of a performance improvement system.
  3. This is another instance where the wrong constructs, based on out-of-date motivational theories, are being measured.  They are the wrong constructs and therefore the wrong measures.
  4. The consequences of any measurement system are the most important question to ask.  Anyone who doesn’t ask this question should not be managing measurement processes.


What is the bottom line?  Nothing Richard or Andy point out makes the concept of performance measurement wrong.  The measurement systems they describe are guilty of naive operationalism: they treat a specific measure of performance as the sole operational definition needed, even if they are unaware that they are doing so.  No!  We should assess the validity of any measurement system and adjust it according to an integrated view of validity within an appropriate theoretical and propositional network, as advocated by Cronbach and Meehl.  Measurement systems of any kind should be based on construct and consequential validity, not on an operational methodology, whether philosophical or naive.