Let’s begin with an example. A golfer attempts to hit a ball into a hole in as few strokes as possible. After each stroke, the golfer hopes, the ball will be nearer the hole, until (at last) on the final shot, the ball disappears with a satisfying rattle into the plastic cup liner. The goal has been met. But what does that mean? How well was the goal met? Was it met in exemplary fashion or merely in a satisfactory manner? One measure is the number of strokes required.
Assuming another player is competing, scores can be compared in order to evaluate relative performance. Without a norming score, however, goal attainment remains somewhat undefined in terms of level of achievement. Fortunately, golf has a norming score, par, so even a single golfer can be evaluated against expected results. Golf can be taught and played in several ways. Which is the best method? How can one evaluate these methods? Perhaps comparing the performance of those who adhere to each method will provide a relative measure of which is most productive.
Evaluation of these results can help golfers and instructors make informed decisions about which method to employ. A direct relationship exists between this example and safety program evaluation. A safety program is individually measured using a variety of tools. These measures, such as injury frequency rates, can be used in comparison with normalized (or group) measures to evaluate how a program is progressing toward a prescribed level of performance. Furthermore, program activities used to improve performance can be evaluated by comparing various measures of those activities.
Hence, the evaluation process can be used to make informed decisions about safety program effectiveness. Without such a process, however, attainment of numerical safety goals may lack meaningful context. Hopkins and Antes describe traditional uses of the results of educational measurement and evaluation. “Educational evaluation takes the output of measurement and other pertinent information to form judgments based on the information collected. These judgments are the basis for decisions about students as individuals, and decisions about the effectiveness of school programs” (Hopkins and Antes 34).
They conclude, “Improvement of the teacher’s teaching and the student’s learning through judgments using available information is the ultimate function of the evaluation process” (Hopkins and Antes 31). Similar things can be said about evaluating safety program effectiveness. That is, information collected about the various activities associated with a safety program should form the basis for decisions made to improve safety performance. Evaluation is based on information collected. Data collection can be achieved via many methods.
Observation is one. Observations may be recorded or unrecorded. Unrecorded observations are usually taken and interpreted quickly, and may be acted on immediately or mentally noted for future use. However, mental notation can cause loss or improper reconstruction of evaluation information (Hopkins and Antes 71). Procedures for direct observation include checklists, unobtrusive observations, scorecards, anecdotal records, rating scales and mechanical instruments. Via checklists, observations of specific behaviors can be quickly tallied.
Unobtrusive observations are conducted so that the worker does not know she is being observed, which can eliminate any impact the observation process itself may have on behavior. Scorecards are similar to checklists, but apply a weighting scheme to the behaviors being observed. Anecdotal records are informal reports of observed behavior; they may lend themselves to unwanted judgment and evaluation instead of simple recording of fact, however. Rating scales can be used to collect information about intensity or degree in relation to the observation (Hopkins and Antes 78-96).
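The difference between a checklist and a scorecard can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from Hopkins and Antes: the behavior names and weights below are hypothetical and would have to be defined for the actual workplace.

```python
from collections import Counter

# Hypothetical behaviors and weights, chosen only for illustration.
WEIGHTS = {
    "wears_eye_protection": 3,
    "uses_lockout_tagout": 5,
    "keeps_aisle_clear": 1,
}


def checklist_tally(observations):
    """Checklist: count how often each behavior was observed (no weighting)."""
    return Counter(observations)


def scorecard_total(observations, weights=WEIGHTS):
    """Scorecard: same tally, but each behavior contributes its weight."""
    tally = checklist_tally(observations)
    # A behavior missing from `weights` raises KeyError, which flags an
    # observation that was never defined for this scorecard.
    return sum(weights[behavior] * count for behavior, count in tally.items())


observed = [
    "wears_eye_protection",
    "keeps_aisle_clear",
    "wears_eye_protection",
    "uses_lockout_tagout",
]
print(checklist_tally(observed))
print(scorecard_total(observed))  # 3 + 1 + 3 + 5 = 12
```

Because both functions work from the same recorded observations, the evaluator's judgment enters only once, when the behaviors and weights are defined, rather than each time a result is recorded.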
Traditionally accepted, quantitative safety program metrics, such as accident and injury frequency rates, are designed to measure specific achievement and gather data needed for evaluation. Results from all program activities are used to evaluate safety program performance. Bottom line: Information-collection techniques must be designed to prevent an evaluator’s personal biases from influencing how results are recorded or considered.

FLAWS IN THE

The following discussion examines potential pitfalls for the safety program evaluator.
Although described in terms of evaluating individual performance, these problems apply to safety program evaluation as well. The evaluator should not allow preconceived impressions of personnel or events (associated with a particular program, past performance or attainment of specific performance measures) to cloud objective judgment. Safety program evaluation takes time and resources. Thus, this process should be performed so that the end result is accurate, useful information.

The Halo Effect

The Halo Effect is one potential evaluation pitfall.
According to Kirkpatrick, who describes the effect as it relates to the workplace and employee performance evaluation, the Halo Effect is a tendency to overrate the person being observed. This concept can be applied to safety program evaluation as well. Kirkpatrick lists seven reasons why this effect occurs.
1. A person’s past good performance leads one to expect continued good performance, and the assumption of good performance carries over to future evaluations (Effect of Past Record).
2. An evaluator tends to rate a person who is pleasing in personality and character, agreeable and otherwise compatible higher than performance may justify.
3. Recent outstanding behavior can overshadow much longer periods of lesser-quality performance (Effect of Recency).
4. A person with an asset deemed important by the observer, although it may be irrelevant, may receive a higher-than-justifiable rating.
5. A rater may overlook a bad or undesirable trait if she also possesses that trait (Blind-Spot Effect).
6. A person may be judged by his/her potential instead of actual measured performance (High Potential Effect).
7. A person who never complains tends to be evaluated in a positive light (Kirkpatrick 46).

The Horns Effect
The Horns Effect is the reverse of the Halo Effect in that evaluations tend to be lower than deserved. Kirkpatrick offers eight causes for this effect.
1. The evaluator may have high expectations that are not easily met.
2. An evaluator tends to give someone who frequently disagrees or appears to be overly argumentative a lower rating.
3. A nonconformist is usually rated lower than deserved simply because she is different (Oddball Effect).
4. Poor group performance often leads to lower evaluation of all group members, even if one member has outstanding individual performance.
5. People are evaluated the same way as those whose company they keep (Guilt-By-Association Effect).
6. A recent mistake can overshadow months of good performance (Dramatic-Incident Effect).
7. An evaluator may associate some character trait (e.g., aggressiveness, arrogance, passivity) with poor performance and give a lower-than-justified rating to someone who has that trait (Personality-Trait Effect).
8. An evaluator may give a lower-than-justified rating to a person who performs a task differently than the evaluator would (Self-Comparison Effect).
Controlling These Effects

Kirkpatrick attributes these flaws to vague standards and maintains that effectively established standards of performance can reduce or eliminate their impact (Kirkpatrick 46-47). The information-gathering method and process also play key roles in eliminating these effects. As stated, anecdotal records that rely on memory can easily lead to inappropriate evaluation. Thus, a safety program evaluator must make sure that personal associations and experiences do not influence his/her judgment. Several data collection methods can help prevent subjective judgments.
For example, the critical incident method is a three-step process that involves data collection, data summary and analysis, and feedback. Developed by J. C. Flanagan, this technique uses recorded observations of specific behaviors that are judged to be critical to good or poor performance. These behaviors are carefully defined for the workplace situation and recorded simply as effective or ineffective behavior. Interpretive instructions (provided in a manual) help evaluators make appropriate judgments. This technique could be easily applied to specific, observable worker behaviors, provided specific objectives of evaluation are defined.
Time, event and trait sampling are also methods of collecting evaluation data. Time sampling involves specifically timed observations that, over time, might be expected to provide a good representation of total performance. Event sampling is like Flanagan’s critical incident method in that events deemed to represent specific performance characteristics are recorded as they are observed. Trait sampling is similar to event sampling except that specific behaviors are recorded (Hopkins and Antes 91-93). Such techniques can help ensure collection of objective data.
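Time sampling in particular lends itself to a short sketch. The Python below is an illustrative assumption rather than a procedure from Hopkins and Antes: it draws observation times at random within a shift so the evaluator does not choose when to look, then summarizes the recorded results as a proportion. The shift length, number of observations and recorded outcomes are all hypothetical.

```python
import random


def time_sample_schedule(shift_minutes, n_observations, seed=None):
    """Pick distinct random minute offsets within a shift, in chronological order."""
    if n_observations > shift_minutes:
        raise ValueError("cannot schedule more observations than minutes in the shift")
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    return sorted(rng.sample(range(shift_minutes), n_observations))


def proportion_safe(records):
    """Fraction of timed observations in which the behavior was recorded as safe."""
    if not records:
        raise ValueError("no observations recorded")
    return sum(records) / len(records)


# Eight observations spread over a hypothetical 480-minute shift.
schedule = time_sample_schedule(shift_minutes=480, n_observations=8, seed=1)
print(schedule)

# Hypothetical outcomes, one per scheduled observation (True = safe behavior seen).
records = [True, True, False, True, True, True, False, True]
print(proportion_safe(records))  # 6 of 8 observations -> 0.75
```

Fixing the schedule before the shift begins is the point of the exercise: the observation times cannot drift toward moments when a favored or disfavored worker happens to be visible.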
An evaluator’s knowledge of the various facets of the Halo and Horns effects can also help alleviate these problems. The evaluator must learn to ask whether either of these effects is influencing his/her judgment and make adjustments if necessary. An evaluator who uses objective data-collection techniques and consciously avoids making Halo or Horns judgments will produce more accurate evaluations that, in turn, will lead to better decisions. Hopkins and Antes suggest teaching evaluators about these effects in order to improve objectivity.
The Illumination Experiments

In the 1920s, a group of engineers at Western Electric examined the effect of illumination on work performance. The researchers established an experiment room and a control room, controlled various conditions and introduced changes one at a time. Much to the engineers’ dismay, the results were confounding. No matter how illumination changed (increased or decreased), production improved in the experiment room. Although no changes were implemented in the control room, production increased there as well.
These results indicated the need to record not only the details of the physical changes made, but also the physiological, medical and social changes occurring (Mayo 80). Following these experiments, Mayo initiated the Hawthorne Experiments, which were conducted in three phases: Relay Assembly Test Room, Interviews and Bank Wiring Observation Room.

Relay Assembly Test Room

In the Relay Assembly Test Room, various regimes of workday length, payment schemes, break length and scheduling, work week and return to non-experimental conditions were evaluated. In all cases, productivity increased from previous levels.
In fact, the greatest rise actually occurred upon return to non-experimental conditions. Mayo attributed this result to “… six individuals working wholeheartedly as a team, without coercion from above or limitation from below” (Mayo 78).

Interviews

In the Interviews phase, company officers attempted to learn things (possibly) missed during the previous experiments. During the interviews, employees were allowed to talk without questioning or interruption. Some 20,000 employees were interviewed over several years. The result was a feeling of well-being among employees; the interview process had been a sort of emotional release.
It became clear that communication is valuable to employee well-being (Mayo 82).

Bank Wiring Room

The third phase was conducted in the Bank Wiring Observation Room. Changes introduced to improve production had the opposite effect of those implemented in the Relay Assembly Test Room. Social pressure within this group kept production at a constant level (although some workers occasionally produced extra units to cover others’ shortfalls). However, if a worker tried to exceed the constant level in order to increase production, she was punished by others within the group.
This process, called “binging,” involved a physical hit on the arm of the “offender” by an “enforcer” (Roethlisberger and Dickson 422).

The Hawthorne Effect

Thanks to these experiments, the term “Hawthorne Effect” was coined. Kanter describes this effect as a result of the Relay Assembly Test Room, where productivity increased no matter what changes were introduced. “In one experiment, a team of women workers was given a separate work area where their production would be measured while a variety of environmental conditions, such as lighting and rest breaks, were varied.
Productivity tended to [increase] regardless of the changes that were made to physical conditions. “One conclusion was that being singled out to be in a high-visibility experiment was highly motivating in and of itself; calling this the Hawthorne Effect was, in part, a way of dismissing the claims made by new ‘human relations’ programs, arguing instead that any change involving [some] increased management attention and special treatment would have positive effects for a little while” (Kanter 409).
Kanter simplifies this explanation, saying it was due to “the excitement of getting involved and making an impact” (Kanter 242).

Controlling the Hawthorne Effect

The key message is that, when evaluating a safety program, one must make sure the mere process of being evaluated is not the reason a measured characteristic changes from baseline measurements. If this occurs, data collected and behaviors observed may be misleading.
For example, if several workers are told they have been chosen to test a new safety-related process, will the process itself lead to better performance, or will the workers be motivated to perform simply because they are participating in the experiment? To minimize this effect, control groups should be established. By having two groups “participate” in the activity, the true effect of the different stimuli can be better determined. Latham and Locke, for instance, discuss an experiment in which a wood products company attempted to examine the value of goal setting as it relates to increased production.
One work crew was selected to strive toward specific production goals, while another crew, a control group, was told the experiment was designed to assess the effect of absenteeism on production (Latham and Locke 400-401). “To control for the Hawthorne Effect, we made an equal number of visits to the control group and the training group” (Latham and Locke 401). In other words, both groups received equal attention, so both had similar reason to be motivated by participation. Result: The test group was more successful than the control group.
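The logic of a control group can be expressed as simple arithmetic: subtract the control group's change from baseline from the test group's change, so that any improvement both groups share (such as the motivation of being observed) cancels out. The Python sketch below assumes this difference-of-changes comparison for illustration; it is not the analysis Latham and Locke report, and every figure in it is hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def change_from_baseline(baseline, followup):
    """How much the group's average moved between the two measurement periods."""
    return mean(followup) - mean(baseline)


def net_effect(test_baseline, test_followup, control_baseline, control_followup):
    """Test-group change minus control-group change.

    Whatever both groups gained from receiving attention appears in both
    changes and cancels, leaving the part attributable to the intervention.
    """
    return (change_from_baseline(test_baseline, test_followup)
            - change_from_baseline(control_baseline, control_followup))


# Hypothetical weekly percentages of observed safe behaviors.
test_before = [70, 72, 68]      # mean 70
test_after = [84, 86, 82]       # mean 84
control_before = [71, 69, 70]   # mean 70
control_after = [75, 77, 76]    # mean 76

effect = net_effect(test_before, test_after, control_before, control_after)
print(effect)  # (84 - 70) - (76 - 70) = 8.0
```

In this made-up example the control group also improved by six points, which is the kind of gain the Hawthorne Effect would predict from attention alone; only the remaining eight points would be credited to the new process.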
Flaws of the Hawthorne Effect. (2017, Feb 24). Retrieved from https://studymoose.com/flaws-of-the-hawthorne-effect-essay