Evidence-based Testing (Assessment)

I confess: I love the 35,000-foot view, an article by an old pro who gives us an overview of the future of their field.  This one is Howard Wainer’s “14 Conversations About Three Things.”  His intended audience is researchers of the 21st century.  His three things are: what skills they will need (see #1 below), what problems are worth investigating (see #2), and what topics are not (see #3).

What jumped out at me was the topic of Evidence-Based Test Design (EBTD) and the premise behind his recommendation.  (I have more study to do, but EBTD seems to be testing designed with validity in mind.)  His premise is that statistical analysis has been very well researched, and that we can get more bang for the buck by focusing on improvements in test design.  We have done a better job improving data analysis than we have improving data collection.  I think this premise holds true across society (education, business, science, etc.).  We are generally better at analysis than we are at data collection; in many cases it is garbage in, garbage out.  It’s not that analysis is unimportant, it’s just that the easiest way to improve analysis is to improve the data and information that form its basis.  How do we do this?  By designing measures with greater validity.

  1. Six skills needed by 21st-century researchers: Bayesian Methods, (Modeling) Causal Inference, (Dealing with) Missing Data, (Graphic Representation for) Picturing Data, Writing Clear Prose, and a Deep Understanding of Type I and Type II Errors.
  2. Important topics for 21st-century investigation: Evidence-Based Test Design, Value-Added (Statistical) Models, New Kinds of Data (mostly made possible by computer networks), and the Integration of Computerized Adaptive Testing, Diagnostic Testing, and Individualized Instruction.
  3. Topics that can be given a rest: Differential Item Functioning, the Rasch Model, Factor Analysis / Path Models, and New Measures of Reliability.
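Wainer’s last skill, a deep understanding of Type I and Type II errors, is one that simulation makes concrete.  Here is a minimal Python sketch (my own illustration, not from the article; the critical value of 2.0 is a rough two-sided 5% cutoff): under a true null, the rejection rate estimates the Type I error rate, and under a real effect, the miss rate estimates the Type II error rate.

```python
import random
import statistics

def t_stat(a, b):
    """Welch-style t statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

def rejection_rate(effect, n=30, trials=2000, crit=2.0):
    """Fraction of simulated experiments where |t| exceeds the cutoff.
    With effect=0 this estimates the Type I error rate; with a real
    effect, 1 minus this value estimates the Type II error rate."""
    rejections = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(effect, 1) for _ in range(n)]
        if abs(t_stat(a, b)) > crit:
            rejections += 1
    return rejections / trials

random.seed(1)
alpha = rejection_rate(effect=0.0)       # hovers near the nominal 0.05
beta = 1 - rejection_rate(effect=0.5)    # Type II rate for a medium effect
print(alpha, beta)
```

Seeing beta come out around one-half for a medium effect with 30 cases per group is, I think, exactly the kind of understanding Wainer has in mind.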


      Wainer, H. (2010). 14 Conversations About Three Things. Journal of Educational and Behavioral Statistics, 35(1), 5–25.

      Channeling Pandora: Ideas from a 2007 Interview with Management Professor Bill Starbuck

      Reading through documents from the Stanford Evidence-based Management Blog Site, I came across an interesting article, “(Un)Learning and (Mis)Education Through the Eyes of Bill Starbuck: An Interview with Pandora’s Playmate” (Michael Barnett (2007), Academy of Management Learning and Education, 6(1), 114–127).

      Starbuck seems to be concerned with two things: (1) methodological problems in research and (2) un-learning or fossilized behavior in organizations.
      On methodology:  You can’t just apply standard statistical protocols in research and expect to get good answers.  You must painstakingly build a reasonable methodology, fitting your methods to contexts and tasks, much as you would fit together the pieces of a puzzle.  A personal example: I consulted practically every textbook I had when developing methods for my dissertation, but the ones I turned to most were introductory statistical texts.  I kept asking myself: What am I doing?  What are the core statistical concepts I need?  How can I shape my methods so that my tasks fit these core concepts?  Almost all advanced statistical techniques are extrapolations of the concepts found in introductory statistics, and you can’t really understand how to use these advanced procedures until you understand their core and how they fit your methodological circumstances.  As Starbuck points out, the concept of statistical significance is the beginning of results reporting, not the end.  You must go on to build a case for substantive importance.  He points out that mistakes are common in reporting effect sizes.  I believe this often happens because people simply apply a statistical protocol instead of understanding what their statistics are doing.
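Starbuck’s point that significance is only the beginning can be shown in a few lines.  A sketch in Python (Cohen’s d is my choice of effect-size measure here, not something the interview specifies, and the group sizes and tiny true difference are made-up numbers): with a large enough sample, a trivially small difference will clear the significance bar, so the effect size has to carry the substantive argument.

```python
import math
import random

def cohens_d(a, b):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

random.seed(0)
# Two groups with a tiny true difference (0.05 SD) but huge samples:
a = [random.gauss(0.05, 1) for _ in range(50000)]
b = [random.gauss(0.00, 1) for _ in range(50000)]
d = cohens_d(a, b)
# For equal groups, t is roughly d * sqrt(n/2):
t = d * math.sqrt(len(a) * len(b) / (len(a) + len(b)))
print(round(d, 3), round(t, 1))  # trivial effect, yet t is far past any cutoff
```

The statistical protocol says “significant”; only the effect size tells you the difference is substantively negligible.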

      A favorite issue of mine (one that Starbuck implies but does not address directly) is the lack of a theoretical framework.  Without theory, you are flying empirically blind.  Think of the four blind men empirically describing an elephant by holding its trunk, a leg, the body, and the tail.  Vision (or collaboration) would have allowed the men to “see” how their individual observations fit together as a whole.  You must begin with the empirical, but reality is always larger than your empirical study, and you need the “vision” of a theoretical framework to understand the larger picture and how things fit together.  Theory is thus an important part of your overall methodological tack.

      On (Un)Learning: Starbuck discusses the need to unlearn, to change organizational processes in response to a changing environment.  It is a problem where past successes cloud your vision, obscuring the fact that what worked before is no longer working.  The problem is not that people can’t think of what to do to be successful; it’s that they already know what to do, and that belief keeps them from seeing that there even is a problem, or from seeing the correct problem.  Starbuck talks about problem solving in courses he taught.  He found that the problem people needed to solve was often not the problem they initially had in mind.  Their biggest need was to change beliefs and perspectives.

      The psychologist Vygotsky spoke of something very similar as fossilized behavior.  When someone is presented with a novel problem, they must work out the solution process externally.  Later the process is internalized to some extent and becomes somewhat automated, requiring much less cognitive load.  With more time this can become fossilized, that is, behavior that is no longer tied to a process or reason but continues as a sort of tradition or habit.  This applies at the organizational level as well as the individual psychological level.  I would like to investigate the concept of organizational resilience as a possible response to fossilized organizational behavior, as well as a way of responding to extreme events.  This would emphasize an ability to change in response to environmental demands.  Starbuck thinks that constant change is too disruptive to organizations, but I believe there may be a combination of processes, capabilities, and diversity that enables organizations to sense and respond, not necessarily constantly, but reasonably on an ongoing basis as well as when the inevitable black swan happens.

      Beware the Statistical “Dragon King”

      A power law describes a relationship between two quantities, typically between event frequency and event size, where the larger the event, the lower its frequency.  (Think of a long-tailed distribution.)  Recently, large events have been referred to as black swans: rare, improbable events with large effects (Taleb, 2007).  Predicting black swans is difficult because there are too many variables and unknowns, but their effect sizes make them too problematic to ignore.
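The frequency–size relationship is easy to see in simulation.  A quick Python sketch (my own, with an arbitrary tail exponent of 1.5): for Pareto-distributed event sizes, the fraction of events exceeding a threshold t falls off as t^(-alpha), so events ten times larger are dramatically rarer, yet never quite impossible.

```python
import random

random.seed(42)
# Draw event sizes from a Pareto (power-law) distribution with tail exponent alpha.
alpha = 1.5
events = [random.paretovariate(alpha) for _ in range(100000)]

# The larger the event, the lower its frequency: count events above each threshold.
for threshold in (1, 10, 100):
    frequency = sum(s > threshold for s in events) / len(events)
    print(threshold, frequency)
```

The counts shrink by roughly a factor of 10^1.5 (about 32) for each tenfold jump in size, which is the long tail in action.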

      The Physics arXiv Blog recently discussed a proposition by Didier Sornette at the Swiss Federal Institute of Technology.  Sornette says that outsized outliers are more common than they should be under power-law distributions because they are subject to feedback loops.  He terms these outliers dragon kings.  In real-life examples (at least it seems to me), these feedback loops are often social, a case of jumping on the bandwagon.  This is another reason that black swans are much more common than power laws would predict.
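To see why feedback matters, here is a toy Python simulation (entirely my own construction, not Sornette’s model; the trigger threshold and the tenfold amplification are made-up numbers): events above a certain size set off a bandwagon loop that multiplies them, and the resulting count of extreme events far exceeds what the underlying power law predicts.

```python
import random

random.seed(7)
alpha = 1.5
n = 100000
sizes = []
for _ in range(n):
    s = random.paretovariate(alpha)
    # Hypothetical feedback: once an event crosses a threshold, a
    # bandwagon loop multiplies its size, producing dragon-king outliers.
    if s > 50:
        s *= 10
    sizes.append(s)

expected = n * 500 ** -alpha           # count above 500 that a pure power law predicts
observed = sum(s > 500 for s in sizes)  # count above 500 with feedback in play
print(expected, observed)
```

In runs like this the observed count of huge events comes out more than an order of magnitude above the power-law expectation, which is the dragon-king signature.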

      This is very relevant for risk-management calculations.  If you are preparing for potential risks, beware not only of black swans (rare events with large effects that are hard to predict because you don’t know the future of many variables) but also of dragon kings (feedback loops that increase the effect size of somewhat rare events, making them more common than a power-law distribution would predict).  It provides a rationale for developing resilient organizations, with the ability to change quickly in response to environmental events, instead of relying on cost-probability decision matrices.