Category: book

Notes from the ebook: The Science of Evaluation: A Realist Manifesto

This is the external link to the e-book.

Chapter 1

As Bhaskar puts it, ‘Theory without experiment is empty. Experiment without theory is blind’ (1978, 191).

Society is made by, but never under the control of, human intentions.

Evaluation has traditionally been asked to pronounce on whether a programme makes a difference ‘beyond that which would have happened anyway’. We always need to keep in mind that what would have happened anyway is change – unavoidable, unplanned, self-generated, morphogenetic change.

Realist evaluation is a form of theory-driven evaluation. But its theories are not the highfalutin’ theories of sociology, psychology and political science. Indeed, the term ‘realistic’ evaluation is sometimes substituted out of the desire to convey the idea that the fate of a programme lies in the everyday reasoning of its stakeholders. Good evaluations gain power for the simple reason that they capture the manner in which an awful lot of participants think. One might say that the basic currency is common-sense theory.

However, this should only be the starting point. The full explanatory sequence needs to be rooted in but not identical to everyday reasoning. In trying to describe the precise elbow room between social science and common sense one can do no better than to follow Elster’s thinking. He has much else to say on the nuts and bolts of social explanation, but here we concentrate on that vital distinction, as mooted in the following:

Much of science, including social science, tries to explain things we all know, but science can make a contribution by establishing that some of the things we all think we know simply are not so. In that case, social science may also explain why we think we know things that are not so, adding as it were a piece of knowledge to replace the one that has been taken away. (2007: 16)

Evidence-based policy has become associated with systematic review methods for the soundest of reasons. Social research is supremely difficult and prone to all kinds of error, mishap and bias. One consequence of this in the field of evaluation is the increasingly strident call for hierarchies of evidence, protocolised procedures, professional standards, quality appraisal systems and so forth. What this quest for technical purity forgets is that all scientific data is hedged with uncertainty, a point which is at the root of Popperian philosophy of science.

What is good enough for natural science is good enough for evidence-based policy, which comes with a frightening array of unanticipated swans – white, black and all shades of grey. Here too, ‘evidence’ does not come in finite chunks offering certainty and security to policy decisions. Programmes and interventions spring into life as ideas about how to change the world for the better. These ideas are complex and consist of whole chains of main and subsidiary propositions. The task of evaluation research is to articulate and refine those theories. The task of systematic review is to refine those refinements. But the process is continuous – for in a ‘self-transforming’ world there is always an emerging angle, a downturn in programme fortunes, a fresh policy challenge. Evidence-based policy will only mature when it is understood that it is a continuous, accumulative process in which the data pursues, but never quite draws level with, unfolding policy problems. Enlightened policies, like bridges over swampy waters, only hold ‘for the time being’.

Chapter 2

It has always been stressed that realism is a general research strategy rather than a strict technical procedure (Pawson and Tilley, 1997b: Chapter 9). It has always been stressed that innovation in realist research design will be required to tackle a widening array of policies and programmes (Pawson, 2006a: 93–99). It has always been stressed that this version of realism is Popperian and Campbellian in its philosophy of science and thus relishes the use of the brave conjecture and the application of judgement (Pawson et al., 2011a).


Notes from book: Guide to health informatics

In the information sciences the definitions below are the very foundation of informatics: p. 13

Data consists of facts. Facts are observations or measurements about the world. For example, ‘today is Tuesday’

Knowledge defines relationships between data. The rule ‘tobacco smoking causes lung cancer’ is an example of knowledge. Such knowledge is created by identifying recurring patterns in data, for example across many different patients. We learn that events usually occur in a certain sequence, or that an action typically has a specific effect. Through the process of model abstraction, these observations are then codified into general rules about how the world works.

As well as learning such generalized ‘truths’ about the world, one can also learn knowledge that is specific to a particular circumstance. For example, we can create patient specific knowledge by observing a patient’s state over time. By abstracting away patterns in what is observed, one can arrive at specific knowledge such as ‘following treatment with anti-hypertensive medication, there has been no decrease in patient’s blood pressure over the last 2 months’.

Information is obtained by the application of knowledge to data. Thus, the datum that ‘the patient’s blood pressure is 125/70 mmHg’ yields information if it tells us something new. In the context of managing a patient’s high blood pressure, using our general knowledge of medicine, and patient specific knowledge, the datum may allow us to draw the inference that the patient’s blood pressure is now under control.
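
The relationship between data, knowledge and information can be sketched in a few lines of Python. This is only an illustration of the definitions above: the names `BloodPressure`, `is_controlled` and `interpret`, and the 140/90 mmHg threshold, are assumptions made for the example and do not come from the book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BloodPressure:
    """A datum: a single observation about the world, in mmHg."""
    systolic: int
    diastolic: int


def is_controlled(bp: BloodPressure, systolic_limit: int = 140,
                  diastolic_limit: int = 90) -> bool:
    """Knowledge: a general rule relating data to a conclusion.

    The default limits are assumed values used for illustration only.
    """
    return bp.systolic < systolic_limit and bp.diastolic < diastolic_limit


def interpret(bp: BloodPressure,
              previously_controlled: Optional[bool]) -> Optional[str]:
    """Apply knowledge to a datum; return information only if it is new."""
    controlled = is_controlled(bp)
    if controlled == previously_controlled:
        return None  # the datum tells us nothing new, so it yields no information
    if controlled:
        return "blood pressure is now under control"
    return "blood pressure is not under control"


reading = BloodPressure(systolic=125, diastolic=70)
# Patient specific knowledge: the pressure was not controlled before.
message = interpret(reading, previously_controlled=False)
print(message)
```

The same reading yields no information when the patient was already known to be controlled, which mirrors the point that a datum is informative only if it tells us something new.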

How variations in the structure of clinical messages affect the way in which they are interpreted: p.36-43

What a message is meant to say when it is created, and what the receiver of a message understands, may not be the same. This is because what we humans understand is profoundly shaped by the way data are presented to us, and by the way we react to different data presentations. Thus it is probably as important to structure data in a way so that they can be best understood, as it is to ensure that the data are correct in the first place. What a clinician understands after seeing the data in a patient record and what the data actually show are very different things.

When sending a message, we have to make assumptions about the knowledge that the receiver has, and use that to shape our message. There is no point in explaining what is already known, but it is equally important not to miss out important details that the receiver should know to draw the right conclusions. The knowledge shared between individuals is sometimes called common ground.

The structure of a message determines how it will be understood. The way clinical data are structured can alter the conclusions a clinician will draw from data.

The message that is sent may not be the message that is received. The effectiveness of communication between two agents is dependent upon:

  • the communication channel which will vary in capacity to carry data and noise which distorts the message
  • the knowledge possessed by the agents, and the common ground between them
  • the resource limitations of agents including cognitive limits on memory and attention
  • the context within which the agents find themselves which dictate which resources are available and the competing tasks at hand.

Grice’s conversational maxims provide a set of rules for conducting message exchanges:

  • maxim of quantity: say only what is needed.
  • maxim of quality: make your contribution one that is true.
  • maxim of relevance: say only what is pertinent to the context of the conversation at the moment.
  • maxim of manner: avoid obscurity of expression, ambiguity, be brief and orderly.

Medical record’s basic functions: p.112

  1. provides means of communicating between staff who are actively managing a patient.
  2. during the period of active management of a patient’s illness, the record strives to be the single data access point for workers managing a patient. All test results, observations and so forth should be accessible through it.
  3. the record offers an informal ‘working space’ to record ideas and impressions that help build up a consensus view, over the period of care, of what is going on with the patient.
  4. once an episode of care has been completed, the record ultimately forms the single point at which all clinical data are archived, for long-term use.

The traditional role of the EMR in care is to be a passive supporter of clinical activity. An active EMR may suggest what patient information needs to be collected, or it might assemble clinical data in a way that assists a clinician in the visualization of a patient’s clinical condition. p.119

There are two quite separate aspects to record systems:

  • the physical nature of the way individuals interact with it
  • the way information is structured when entered into or retrieved from the system.

A summative evaluation can be made in three broad categories:

  1. a user’s satisfaction with the service
  2. clinical outcome changes resulting from using the service
  3. any economic benefit of the service

Technology can be applied to a problem in a technology-driven or a problem-driven manner. Information systems should be created in a problem-driven way, starting with an understanding of user information problems. Only then is it appropriate to identify if and how technology should be used.

Providing access methods that are optimized to local needs can enlarge the range of clinical contexts in which evidence is used. p.177


AI systems are limited by the data they have access to, and the quality of the knowledge captured within their knowledge base.

An expert system is a program that captures elements of human expertise and performs reasoning tasks that normally rely on specialist knowledge. Expert systems perform best in straightforward tasks, which have a predefined and relatively narrow scope, and perform poorly on ill-defined tasks that rely on general or common sense knowledge.

An expert system consists of:

  1. a knowledge base, which contains the rules necessary for the completion of its task
  2. a working memory in which data and conclusions can be stored
  3. an inference engine, which matches rules to data to derive its conclusions.
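
The three components listed above can be sketched as a toy forward-chaining program. This is a minimal illustration, not how any particular expert system is built: the rules in `KNOWLEDGE_BASE` are invented for the example and carry no clinical authority, and the function name `infer` is hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule pairs the conditions that must all hold with the conclusion to add.
Rule = Tuple[FrozenSet[str], str]

# Knowledge base: hypothetical toy rules, invented for illustration only.
KNOWLEDGE_BASE: List[Rule] = [
    (frozenset({"fever", "cough"}), "suspect respiratory infection"),
    (frozenset({"suspect respiratory infection", "low oxygen saturation"}),
     "recommend chest x-ray"),
]


def infer(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Inference engine: match rules to data until nothing new is derived."""
    working_memory = set(facts)  # working memory stores data and conclusions
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all its conditions are already in working memory.
            if conditions <= working_memory and conclusion not in working_memory:
                working_memory.add(conclusion)
                changed = True
    return working_memory


result = infer({"fever", "cough", "low oxygen saturation"}, KNOWLEDGE_BASE)
print(sorted(result))
```

The narrow scope is visible in the sketch: the engine can only conclude what its rules anticipate, so input outside the knowledge base simply produces no new conclusions.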

Notes from the book: Evidence-Based Health Informatics

Over past decades, the question of “can we computerize that process” has been inverted to “how can we optimize the business (or social) process?” p.4

All health IT systems affect patients. Some applications do so in a very direct way, such as decision support systems; others, such as computerized physician order entry or electronic prescribing, do so by being a key part of the clinical process; but even scheduling systems and recall systems have patient effects through being tools which are depended upon to organise care, and which if malfunctioning or incorrectly operated will deprive patients of intended clinical interventions. P.5

EBHI (evidence based health informatics) means that the people designing, developing and implementing health information systems should be able to rely on an explicit evidence base derived from rigorous studies on what makes systems clinically acceptable, safe and effective. P.15

Table 1 Types of evaluation study p.17

Study type | Motive for carrying out study | Typical questions
1. Formative evaluation | How to improve an information system? | Is it accurate? Is it safe? Will people use it? How to improve it?
2. Summative evaluation | Can the finished system solve a specific problem? | Does this system work? How much does it cost? Will people use it?
3. Principle-based evaluation | Can this generic principle contribute to system design and EBHI? | Does this general design principle make systems more usable, effective, safer, less expensive or more maintainable?

One dilemma is that while many design principles are generic, some other principles may be bound up in the context of the specific users, data items or the task they support. The concept of ecological user interface design supports this: for each work domain or environment we design a user interface that supports this, with all the relevant information formatted in the optimum way to support the task in hand. Realist approaches to evaluation and realist synthesis may have a place here to uncover what works, when, for whom and why. P.21

IT systems are integral to healthcare delivery and have a tremendous potential to bring about an overall improvement to patient safety. IT broadly includes all computer software used by health professionals and patients to support care. At the same time, use of IT, just like any other technology, can introduce new, often unforeseen, errors that can affect care delivery and can lead to patient harm. It is now widely recognized that problems with IT and their use can pose risks to patient safety. P.25

IT systems like EHR (electronic health record) facilitate access to patient information in a distributed manner. Using an EHR, patient information such as diagnoses, medications, and test results can be consolidated into a single system that can be accessed any time, in different localities, and by different team members. Wireless technology coupled with portable handheld devices allows clinicians to retrieve the most up-to-date patient information while on the move. This has the potential to significantly improve information sharing across the continuum of care, enhancing patient safety and coordination of care. P.26

The incidence of IT-related medication errors has been explored in several studies. P.28

The knowledge and skills of users are fundamental to safe use of IT. Training programs are thus essential and need to be appropriately tailored to the needs of different clinical seniorities and roles to ensure safe operation of systems. For example, training for a prescribing system that will be used by physicians, pharmacists and nurses will need to be tailored to the needs of each group respectively. Errors can also be generated when cognitive resources devoted to using a system are inadequate. A clinician’s workload plus environmental influences like distractions and interruptions can lead to errors. For example, when interrupted by a phone call a physician wrote a prescription for the wrong patient because they returned to the wrong record at the end of the call. P.30-31

I think the last paragraph can be linked with the “Fundamental Theorem” of Biomedical Informatics. It amplifies the idea that cognition and IT should work hand in hand and that IT cannot replace cognition.



Notes from the book: Information and medicine, The nature of medical descriptions

In some way, information alters knowledge. We have some information about something and, after some time, the “Eureka” phenomenon takes place: the information is transformed into knowledge, and this knowledge is what understanding consists of.

It is a common-sense notion that in our brains there is a collection or a record of all our experiences. This record can be regarded as a “cognitive map” of the world, which is our representation of our world, or more accurately a representation of our knowledge. The primary concept of the connection between our world and our minds is the concept of perception. Not all our knowledge of the world is acquired through direct experience, since we learn of things vicariously from the reported experiences of other people.

The greater part of our knowledge of the world is acquired by direct experience. Knowledge by description is what makes possible the organized and cooperative activities of science and education.

The medical enterprise is commonly regarded as one that is “information-intensive”. The observations that physicians carry out, the decisions they make and the acts they perform depend upon information processes in an important way. The distinction between “knowing how” and “knowing that” has a particular relevance to medicine.

When knowledge is once made objective by its disclosure in public, it becomes an object of study for other minds. Procedures of verification become possible, methods for distinguishing belief from true knowledge emerge, the notion of consensus appears, and the possibility of science comes into being.

He adopts the concept of “information” as a “thing” which, upon receipt, alters our knowledge; that is, it alters the mental representation we have, which is our map of the world. P.19-22

Necessary attributes are those which a nominal must have in order to qualify for class membership and to be entitled to a certain name and if we can show that a particular necessary attribute is not to be found in some particular object, we can disqualify it for membership or suggest it for a potential membership. There may be other sufficient attributes that have such properties that our finding of even one in an object will guarantee us that the object is a member of a particular class. The possession of necessary attributes confers upon an object the possibility of its membership in the class in question; a sufficient attribute confirms this. Deciding whether an attribute is sufficient requires a much greater amount of knowledge than deciding whether an attribute is necessary. P.75-77

Whenever we remove or eliminate an attribute we perform a generalization. We must take great care in deciding which attributes can be omitted if the intended meaning or the truth value of the statement is to be retained. Abstraction is undertaken in an attempt to simplify a situation or in order to describe or model some complex object or process. P.80-81

We commonly find that we use the word tall in describing a person. Yet there is no specification as to when a person is tall or medium height or short. Such fuzzy terms are easily understood in everyday usage. As with ambiguity, we can use fuzziness to our advantage because of our common sense of the world. Fuzziness appears to enter into our descriptions of everyday objects not through some accidental looseness of language but because of the intrinsic structure of the world itself. P.86-88

The medical record, which earlier served as the private diary of the attending physician and later as a “bulletin board” for the intercommunication of medical professionals, has now become an institutionalized and semi-public document.

“A medical information system is a set of formal arrangements by which facts concerning the health or health care of individual patients are stored and processed in computers”.

The cost-benefit analyses of complex computer-based information systems are particularly sensitive to the definitions employed for “cost” and for “benefit”.  On the cost side, there are a number of items that may be difficult to evaluate; for example the uncompensated time and effort of the medical and hospital staffs during planning and implementation. P.223-226

“It is no easy matter for a computer to replace a clerk”. P.227

To connect this with the “fundamental theorem” of biomedical informatics: the computer amplifies the knowledge of the person (here, the clerk) rather than replacing that person.

When we wish to computerise information processes in a particular application area, the first step is usually to carry out what is called a systems analysis. This procedure has two goals: the process in question must first be identified and isolated, and its boundaries fixed. There is always this conceptual gap between the real processes of medicine, between what actually goes on and the descriptions of them which we may hope to store or process. We can speak of this as a gap in formalisation.  This discontinuity in formalisation between a manual medical information process and the machine code necessary to accomplish comparable ends begins at a very high descriptive level and it is not itself a concern of computer science. If this concern is to be given a name at all, it must be regarded as concerning medical applications and it is increasingly being referred to as “medical information science” in USA and as “medical informatics” in Europe. P.231-234


Notes from the book: Biomedical informatics: Computer applications in health care and biomedicine

Chapter 1:

There is a historical background with the conclusion that healthcare is slow to adapt to informatics.

“the inefficiencies and the frustrations associated with the use of paper-based medical records are now well accepted…” p. 4

EHR: Electronic health record

“EHR is best viewed not as an object or a product, but rather as a set of processes that an organisation must put into place supported by technology…” and “clinicians are horizontal users of information technology…” p. 5

EHR makes data collection simpler, faster and easier.

We need better methods for delivering the decision logic to the point of care.

“the goal is to create an information-management infrastructure that will allow all clinicians, regardless of practice setting to use EHRs…..” p. 11

Issues that need addressing: p.12

  • Encryption of data
  • HIPAA-compliant policies
  • Standards for data transmission and sharing
  • Standards for data definitions
  • Quality control and error checking
  • Regional and national surveillance databases

Terminology history p.20-21

“Since 2006 biomedical informatics has become the most widely accepted term….” p. 21

“The term medical informatics is no longer used to refer to the discipline as a whole and is now reserved for those applied research and practice topics that focus on disease and the role of physicians.” p. 28

Next: Pharmacogenomics chapter 25

“aspects of biomedical information include an essence of uncertainty – we can never know all about a physiological process – and this results in inevitable variability among individuals” p. 32

Chapter 4:

Cognitive science is a multidisciplinary domain of inquiry devoted to the study of cognition and its role in intelligent agency. From the perspective of informatics, cognitive science can provide a framework for the analysis and modelling of complex human performance in technology-mediated settings. P.110

Chapter 7:

Standards are generally required when excessive diversity creates inefficiencies or impedes effectiveness. The health care environment has traditionally consisted of a set of loosely connected, organizationally independent units. Patients receive care across primary, secondary, and tertiary care settings, with little bidirectional communication and coordination between services. There is little coordination and sharing of data between inpatient care and outpatient care. P.212

A standard for coding patient data is nontrivial when one considers the need for agreed-on definitions, use of qualifiers, differing (application-specific) levels of granularity in the data, and synonymy, not to mention the breadth and depth such a standard would need to have. P. 214


The formal book reference is:

Shortliffe, E. H., & Cimino, J. J. (Eds.). (2013). Biomedical informatics: Computer applications in health care and biomedicine (4th ed.). United Kingdom: Springer London.


Notes from book: Evaluation methods in biomedical informatics

Chapter 1:

The term evaluation describes a range of data-collection activities designed to answer general or focused questions. P.1

Evaluation methods in biomedical informatics must address not only a range of different types of information resources, but also a range of questions about them from the technical characteristics of specific systems to their effects on people and organizations. P.2

According to him we use scholarly reason for performing evaluation. P.3

Question: he is using the word complexity; does he mean complexity or complicatedness?

What makes evaluation so difficult?

  • Problems deriving from biomedicine as a field of human endeavour, aka doctor-patient and diagnosis-treatment relationship p.6-11
  • Problems deriving from the complexity of computer-based information resources. If we want to verify a program by brute force trials, we need n! tries, where n is the number of input data items. Most information resources are deliberately configured or tailored to a given institution as a necessary part of their development; hence how can we compare results from multiple locations? P. 11-13
  • Problems of the evaluation process itself. The multiplicity of possible questions creates challenges for the designers of evaluation studies. P.14
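
The n! figure quoted above grows very quickly, which is why brute-force verification is impractical. A short calculation makes this concrete; the function name `brute_force_trials` is hypothetical and the input counts are chosen only for illustration.

```python
import math


def brute_force_trials(n_inputs: int) -> int:
    """Number of trials under the n! estimate, for n input data items."""
    return math.factorial(n_inputs)


# Print how the number of required trials grows with the number of inputs.
for n in (5, 10, 20):
    print(f"{n} input items -> {brute_force_trials(n)} trials")
```

Even a resource with only 20 input data items would need more than 10^18 trials under this estimate.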

Biomedical informatics is a complex, derivative field, which draws its methods from many fields, such as computer science, cognitive science, information science, statistics, linguistics and decision science. P.15-16

Chapter 2:

There is no single acceptable definition of evaluation. P.24

Evaluation is an empirical process. Data of varying shapes and sizes are always collected. Evaluation is viewed as an applied or service activity. Evaluations are tied to and shaped by the specific information resources under study.  Evaluation is useful to the degree that it sheds light on issues such as the need for, function of and utility of those information resources. P.25

Research vs evaluation p.27


Evaluation approach is a broader term than method, connoting the strategy directing the design and execution of an entire study. P.32

Are we using a decision-facilitation approach? P.33 “resolve issues important to developers and administrators so these individuals can make decisions about the future of the resource.” Or are we using an objectives-based approach? “if a resource meets its designer’s objectives” Or are we using the responsive/illuminative approach? “goal is understanding not judgment” p.35

Chapter 3:

Resource effect and impact can be studied; the focus switches from the resource itself to its effects on users, patients and healthcare organizations. P.52

Field user effect study, aspect: resource effect and impact, broad study question: does the resource change actual user behaviour in ways that are positive? Stakeholders: resource users and their clients, resource purchasers and funders. – the emphasis is on the behaviours and actions of the users, and not the consequences of these behaviours. – Version of the resource: released, field study setting, sampled users: real users, real sampled tasks, what is observed: extent and nature of resource use. Impact on user knowledge, real decisions, real actions.

Problem impact study, aspect: resource effect and impact, broad study question: does the resource have a positive impact on the original problem? Stakeholders: the universe of stakeholders. – Whether the original problem or need that motivated creation or deployment of the information resource has been addressed in a satisfactory way. – Version of the resource: released, field study setting, sampled users: real users, real sampled tasks, what is observed: care processes, costs, team functions, cost effectiveness.

Opinion appraisal

Chapter 4:

There is no “gold standard” in practice. Given two or more scenarios of professional practice, independent observers who are measuring the quality of this practice will not be in perfect agreement as to which scenario represents better practice. P.99

He believes that “gold standards even if unattainable, are worth approximating”. P.100

Demonstration studies p.102:

They aim to say something meaningful about a biomedical information resource or answer some other question of substantive interest in informatics. Demonstration studies are concerned with determining the actual magnitude of that attribute in a group of objects, or determining if groups of objects differ in the magnitude of that attribute.

  • The object of measurement in a measurement study is typically referred to as a subject or participant in a demonstration study.
  • An attribute in a measurement study is typically referred to as a variable in a demonstration study.

Investigators conduct correlational studies that explore the hypothesized relationships among a set of variables the researcher measures but does not manipulate in any way. Correlational studies are guided by the researchers’ hypotheses, which direct the choice of variables included in the study. P.105

Three stages of demonstration study planning: p.108


Chapter 5:

Reliability of a measurement, in classical theory, is the degree to which the measurement is consistent or reproducible. A measurement that is reasonably reliable is measuring something. Validity is the degree to which that something is what the investigator wants to measure. P.114

The internal consistency approach with co-occurring observations is the proposed method for estimating reliability in measurements involving human subjects. The best estimate of the true value of the attribute for each object, regardless of the method we use, is the average of the independent observations. P.119

Any measurement process consisting of multiple observations can reveal the magnitude of its own reliability. P.120
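
One common internal-consistency estimate in classical theory is Cronbach's alpha, computed from an objects-by-observations matrix. The notes do not name a specific coefficient here, so using alpha is an assumption of this sketch, and the scores in `ratings` are invented for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(matrix: Sequence[Sequence[float]]) -> float:
    """Estimate reliability from rows = objects, columns = observations."""
    k = len(matrix[0])  # number of independent observations per object
    if k < 2 or len(matrix) < 2:
        raise ValueError("need at least two objects and two observations")
    columns = list(zip(*matrix))
    # Sum of the variances of each observation across objects.
    observation_variance_sum = sum(variance(col) for col in columns)
    # Variance of each object's total score across objects.
    total_variance = variance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - observation_variance_sum / total_variance)


# Hypothetical data: four objects, each with three independent observations.
ratings = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

When the observations for each object agree closely, the summed observation variances are small relative to the variance of the totals and alpha approaches 1.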

Increasing the number of observations typically increases the magnitude of the estimated reliability. P.125
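
In classical test theory this effect is usually described by the Spearman-Brown formula, which predicts the reliability after the number of observations is multiplied by a given factor. The notes do not name the formula, so its use here is an assumption, and `spearman_brown` is a hypothetical helper name.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when the number of observations is multiplied
    by length_factor, under the classical Spearman-Brown formula."""
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Starting from a reliability of 0.5, add more observations step by step.
for factor in (1, 2, 4):
    print(factor, round(spearman_brown(0.5, factor), 3))
```

Doubling the observations from a starting reliability of 0.5 gives a predicted value of about 0.667, and quadrupling them gives 0.8, so the gains shrink as observations are added.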

For a measurement study design where an almost unlimited number of objects is available, at least 100 objects should be employed for stable estimates of reliability. P.126

In correlation demonstration studies, we are interested in the correlations between two attributes in a single sample. P. 128

Reliability estimates indicate the degree of random, unsystematic “noise” in a measurement. The validity indicates the degree of “misdirection” in the measurement. To the extent that a measurement is reliable, the results have meaning. To the extent that a measurement is valid, the results mean what the investigator believes them to mean. P.130

Content validity: known as “face validity”, driving question: does it look valid? Strengths: relatively easy to use; weaknesses: makes a weak (subjective) case for validity. P.132

Criterion-related validity: known as “predictive validation or concurrent validation”, driving question: does it correlate with an external standard? Strengths: can be compelling, when a standard exists; weakness: standards may not exist or may be controversial. P.133

Construct validation: known as “convergent validation, divergent or discriminant validation”, driving question: does it reproduce a hypothesized pattern of correlations with other measures? Strengths: makes the strongest case for validity; weakness: requires much additional data collection. P.134

In general, concern with validity increases when the attributes are more abstract and only indirectly observable, but this concern never vanishes. In the realm of human behaviour and its associated attributes, when attitudes and other states of mind such as “satisfaction” become the focus of measurement, the need for formal measurement studies that address validity becomes self-evident. P.135

Chapter 6:

Steps to conduct a measurement study: p.146-147

  1. Design the measurement process to be studied.
  2. Decide from which hypothetical population the objects in the measurement study will be sampled.
  3. Decide how many objects and how many independent observations will be included in the measurement study.
  4. Collect data using the measurement procedures as designed and any additional data that may be used to explore validity.
  5. Analyse the objects-by-observations matrix to estimate reliability.
  6. Conduct any content, criterion-related or construct validity studies that are part of the measurement study.
  7. If the reliability or validity proves to be too low, attempt to diagnose the problem.
  8. Decide whether the results of the measurement study are sufficiently favourable to proceed directly to a demonstration study, or if a repeat of the measurement study, with revised measurement procedures, is needed.

Steps to improve measurement: p.148

  • Modify the number of independent observations in the measurement process (Affects reliability)
  • Modify in more substantive ways the mechanics of the measurement.
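The effect of the first option on reliability can be approximated with the Spearman-Brown prophecy formula. This is a standard psychometric result rather than something quoted in these notes, and it assumes the added observations behave like the existing ones. Both function names are hypothetical.

```python
def spearman_brown(reliability, factor):
    """Predicted reliability when the number of independent
    observations is multiplied by `factor`."""
    return factor * reliability / (1 + (factor - 1) * reliability)

def factor_needed(reliability, target):
    """Multiplier on the number of observations needed to move
    from the current reliability to the target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))

# Doubling the observations when current reliability is 0.5.
doubled = spearman_brown(0.5, 2)

# How many times more observations are needed to reach 0.8 from 0.5.
multiplier = factor_needed(0.5, 0.8)
```

Under these assumptions, doubling the observations raises a reliability of 0.5 to about 0.67, and reaching 0.8 would take four times as many observations.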

Each aspect of the measurement process that is purposefully explored in a measurement study is called a facet. Each facet has a number of levels corresponding to the number of independent observations it contributes to the measurement process. P.156

The more similar the tasks that comprise a set, the higher are the performance intercorrelations between them and thus the higher the reliability of a measurement process comprising a given number of tasks. In general, the choice of the test set of tasks should follow logically from the ultimate purposes of the demonstration study anticipated by the investigator. P.163 For studies of information resource performance (where one or more resources themselves are the objects of measurement), it is difficult to give an analogous figure for the required number of tasks, because few measurement studies have been performed. P.164

Chapter 7:

Objectivist demonstration studies can be divided into three kinds: descriptive, correlational and comparative. P.190-191

  • Participants are the entities about whom data are collected.
  • Variables are specific characteristics of participants that are purposefully measured by the investigator, or are self-evident properties of the participants that do not require measurement.
  • Levels of variables: a categorical variable can be said to have a discrete set of levels corresponding to each of the measured values that the variable can have.
  • Dependent variables: the dependent variables are a subset of the variables in the study that capture the outcomes of interest to the investigator.
  • Independent variables are those included in a study to explain the measured values of the dependent variables.

The participants or resource users selected for demonstration studies must resemble those to whom the evaluator and others responsible for the study wish to apply the results. For example, when attempting to quantify the likely impact of an information resource on clinicians at large, there is no point in studying its effects on the clinicians who helped develop it, or even built it, as they are likely to be technology enthusiasts and more familiar with the resource than average practitioners. P.195

A common bias in the selection of participants is the use of volunteers. P.196 They considerably reduce the generality of findings.

External validity means that the conclusions can be generalised from the specific setting, participants and intervention studied to the broader range of settings others encounter. P.207-208

Study power is an important consideration in study design as it is closely linked to the number of participants needed. A study with insufficient participants is unlikely to detect the minimum worthwhile difference between the groups, and will make poor use of the participant time and the investigators’ resources. P.209
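The link between power and the number of participants can be made concrete with a rough calculation. The sketch below assumes a two-group comparison of means, equal group sizes, a common standard deviation, a two-sided test and the normal approximation; none of these specifics come from the notes, and the function name `participants_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def participants_per_group(min_difference, sd, alpha=0.05, power=0.80):
    """Approximate participants needed in each of two groups to detect
    `min_difference` between group means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / min_difference) ** 2
    return ceil(n)

# Hypothetical example: minimum worthwhile difference of 5 points, SD of 10.
n_needed = participants_per_group(min_difference=5, sd=10)

# Halving the minimum worthwhile difference roughly quadruples the sample.
n_smaller_effect = participants_per_group(min_difference=2.5, sd=10)
```

With these invented figures the approximation gives 63 participants per group, and about four times as many when the minimum worthwhile difference is halved. An exact calculation based on the t distribution would give slightly larger numbers.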

Assessment bias: we need to ensure that no one involved in a demonstration study can allow his or her own feelings and beliefs about an information resource, whether positive or negative, to affect the results. P.210

Allocation and recruitment bias: in clinical studies, investigators may, subconsciously perhaps, but still systematically, allocate easier (or more difficult) cases to the information resource group (allocation bias), or they may avoid recruiting easy (or difficult) cases to the study if they know in advance that the next patient will be allocated to the control group (recruitment bias). P.211

The Hawthorne effect, the tendency for humans to improve their performance if they know it is being studied, was discovered by psychologists measuring the effect of ambient lighting on workers’ productivity at the Hawthorne factory in Chicago. P.211

The checklist effect is the improvement observed in performance due to more complete and better-structured data collection about a case or problem when paper- or computer-based forms are used. P.212

Chapter 8:

Receiver operating characteristic (ROC) analysis is a technique commonly used in biomedical informatics. ROC analysis allows the investigator to explore the relationship of two variables, one dependent and discrete and the other independent and continuous, in the study across a range of choices of threshold. However, if a suboptimal threshold is chosen, the information resource’s accuracy may be lower than can actually be attained and in practice the resource will be less useful than it can be. Thus the ROC curve is a useful tool to assess variation in the usefulness of the resource’s advice as an internal threshold is adjusted. P.231
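A threshold sweep of this kind can be sketched as follows. This is a minimal illustration, assuming the resource outputs a continuous score per case and the true state is coded 1 (present) or 0 (absent). The function names, the scores and the labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs,
    one per candidate threshold, from strictest to most lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical output of an information resource for six cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
```

Each point shows the trade-off at one threshold, so a suboptimal threshold appears as a point far from the top-left corner. For this invented data the area under the curve is 8/9, about 0.89.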

A complete factorial study is one of the designs that can be used to explore the effects of one or more independent variables on the dependent variable. Factorial means that each group of participants is exposed to a unique set of conditions where each condition is a specified combination of the levels of each independent variable. Complete means that all possible conditions are included in the design. Factorial designs may work well in laboratory settings, but investigators conducting field studies may find these designs unsuited to their needs because the real world presents situations where the independent variables are hierarchically related. This situation typically occurs when participants in a study are part of groups inherent to the setting in which the study is conducted and thus cannot be disaggregated; thus we use a nested (hierarchical) design. P.232-234.
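The conditions of a complete factorial design can be enumerated mechanically as every combination of levels. The sketch below is only an illustration: the function name `complete_factorial`, the two independent variables and their levels are hypothetical and do not come from the book.

```python
from itertools import product

def complete_factorial(factors):
    """List every condition of a complete factorial design.

    factors: dict mapping each independent variable to its levels.
    Returns one dict per condition (one combination of levels).
    """
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in product(*(factors[name] for name in names))]

# Hypothetical independent variables and levels.
design = complete_factorial({
    "resource": ["available", "not available"],
    "experience": ["novice", "intermediate", "expert"],
})

for condition in design:
    print(condition)
```

Two levels crossed with three levels give six conditions, each needing its own group of participants. In a nested design some of these combinations cannot occur, because a level of one variable exists only within a particular level of another.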

Regression analysis is better matched to the logic of correlational studies where all variables tend to be continuous. P.243

The formal book reference is:

Friedman, C. P., & Wyatt, J. C. (2005). Evaluation methods in biomedical informatics (2nd ed.). New York, NY: Springer-Verlag New York.