Elise Cope

What is the optimum feedback scale?

This guide is written in response to the many queries received from both L&D and HR professionals who use data to help manage their talent retention and development strategies.


It is impossible to recommend a single perfect response scale, as the optimum scale should meet the specific needs of the survey in question. When defining a survey scale to meet specific requirements, there are a number of things to consider, for example:


  • whether the scale is odd or even numbered

  • number of points on the scale

  • labelling of the points

  • general reliability and validity of the scale.


There is much debate as to whether odd- or even-numbered scales are the most effective. Odd-numbered scales are generally regarded as allowing for a ‘neutral’ option such as ‘neither agree nor disagree’. Supporters of the neutral point argue that offering a ‘don’t know’ option ensures that respondents are not forced to manufacture opinions on the spot.


However, advocates of even-numbered scales argue that in reality people are never neutral on issues and always have an opinion, even if they had not previously conceived of it. Moser and Kalton, in their book “Survey Methods in Social Investigation”, argue that ‘there is clearly a risk in suggesting a non-committal answer to the respondent,’ as they believe that a mid-point allows respondents to ‘opt out’, which in turn provides uninformative data. The advantage of a scale which forces a view is that, when pooled with all other responses, it provides a much-needed benchmark.


The challenge with ‘forced choice’ scales is that they tend to skew the overall results positively. In a forced-choice situation respondents prefer to ‘be nice,’ rating items positively rather than negatively, i.e. choosing ‘somewhat agree’ rather than ‘somewhat disagree.’ It is possible, however, to counteract this ‘halo effect’ (where respondents with an overall feeling of like or dislike give high or low ratings to all features) by alternating the direction of successive ratings. A second way of reducing positive bias is to word points positively so that criticism seems less negative.
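As a rough sketch of what alternating the direction of successive ratings means in practice (an illustration, not something prescribed in this guide), reverse-worded items are re-coded back onto a common direction before scores are aggregated. The item names and responses below are hypothetical.

```python
# A minimal sketch of re-coding reverse-worded items onto a common direction
# before aggregation; the item names and responses are hypothetical.

def reverse_code(score: int, scale_max: int = 4) -> int:
    """Map a rating on a reverse-worded item back onto the positive direction.

    On a 4-point forced-choice scale, 1 becomes 4, 2 becomes 3, and so on.
    """
    return scale_max + 1 - score

# Hypothetical responses to four items; items 2 and 4 were worded negatively,
# so a high raw score on them actually indicates disagreement.
responses = {"item1": 3, "item2": 1, "item3": 4, "item4": 2}
reverse_keyed = {"item2", "item4"}

adjusted = {
    item: reverse_code(score) if item in reverse_keyed else score
    for item, score in responses.items()
}
print(adjusted)  # {'item1': 3, 'item2': 4, 'item3': 4, 'item4': 3}
```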


Of equal importance to the problem of positive response bias is the ethical concern that forced-choice responses raise. Researchers have contended that it is somewhat unethical to force responses in this way. Rugg and Cantril (https://psycnet.apa.org/record/1943-00907-001) argued for the middle alternative ‘in that it provides for an additional graduation of opinion.’ Indeed, surveys with a neutral option have been found to achieve a higher response rate, possibly indicating that respondents feel more comfortable using them. Ultimately, whether even- or odd-numbered scales are used depends on the requirements of the research; for example, if it is necessary to delineate between satisfied and dissatisfied customers, then an even-numbered scale would be more suitable as it would show whether responses are largely positive or negative.


In addition to the above debate, there is also a methodological problem concerning the central point of a Likert-type scale (https://en.wikipedia.org/wiki/Likert_scale) in that it can be ambiguous. It may imply a neutral position, i.e. that the respondent has no opinion, or it may be that the respondent is torn between feelings in both directions. Partly as a consequence of this, overall scores central to the distribution are quite ambiguous: a central score could be composed of a large number of ‘undecided’ answers, or it could be a combination of ‘strongly for’ and ‘strongly against’ answers. Also, analysis of results which include a mid-point may not expose the fact that respondents were answering in a ‘devil may care’ fashion.


A second decision to make in defining an optimal response scale is the number of points to use. The number of response options affects the scale’s reliability (the ability to provide the same feedback regardless of which sample of the population you select) and discriminability (‘the ability to discriminate between degrees of the respondents’ perceptions of an item’). Cohen (1983) concluded that a minimum of three points is necessary, whilst a maximum of nine points can be used effectively (Bass, Cascio & O’Connor, https://psycnet.apa.org/record/1974-32365-001).

Ten-point (or longer) scales tend to be employed less frequently, as it is usually difficult to make distinctions finer than a 10-point scale requires, and the larger the number of choices offered, the harder the scale is for respondents to use. Although a higher number of points may seem to gather more discriminating data, there is some debate as to whether respondents actually discriminate carefully enough to make these scales valuable. Overall, the extreme categories are found to be under-used. It is, however, possible to counteract this by making end points sound less extreme or (particularly for a 10-point scale) by pooling responses from a group of end categories.

It is common to find that 10-point scales are condensed into three- or five-point scales for reporting purposes. It would therefore seem simpler to use a four- or five-point scale in the first place, especially as five points fit neatly with the five statements on the semantic scale ranging from ‘very good’ to ‘very poor’; such a scale yields a good distribution of responses and enables researchers to pick out differences in opinion easily. At the other end of the spectrum, two- and three-point scales have little discriminative value and are therefore rarely recommended for satisfaction research.
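For illustration, the sketch below pools 10-point ratings into five reporting bands of the kind described above. The cut-offs and band labels are assumptions made for the example, not figures taken from any particular survey.

```python
# A rough sketch of pooling 10-point ratings into five reporting bands.
# The cut-offs and band labels are illustrative assumptions only.

BANDS = [
    (9, "very good"),   # 9-10
    (7, "good"),        # 7-8
    (5, "fair"),        # 5-6
    (3, "poor"),        # 3-4
    (1, "very poor"),   # 1-2
]

def to_band(score: int) -> str:
    """Return the reporting band for a rating on a 1-10 scale."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise ValueError(f"score out of range: {score}")

raw_scores = [10, 8, 8, 7, 5, 2]
print([to_band(s) for s in raw_scores])
# ['very good', 'good', 'good', 'good', 'fair', 'very poor']
```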

Chang’s (https://journals.sagepub.com/doi/abs/10.1177/014662169401800302?journal) overview of previous research found that various conclusions had been drawn concerning the issue of reliability: that reliability is independent of the number of points on the scale, or that it is maximised by a 7-point, a 5-point, a 4-point or even a 3-point scale! In terms of reliability, Chang argues that there are two issues to consider: respondents’ knowledge of the subject and the similarity of their frames of reference. He believes that the higher the number of response options available, the greater the likelihood of error, as respondents’ frames of reference are likely to differ on the various meanings of the points. There are almost always problems in defining the end points of scales relating to, for instance, ‘honesty’, as different respondents may use different frames of reference unless they are informed of the purpose of the rating procedure. Similarly, users of 360-degree feedback will have different views of what constitutes ‘excellent’ behaviour depending on the individual’s role within the organisation. In the same way, an Olympic athlete will have a better idea of using stretching performance targets than most employees, because Olympic performance depends upon being the best.

It is often the case in 360-degree and employee feedback that respondents do not have the necessary information to comment on other people’s behaviour, so a ‘not able to rate/insufficient evidence’ category must be included. Chang suggests that if respondents lack knowledge about the subject being surveyed then they will overuse the end points of a longer scale.

Research into a 3-point scale (consisting of ‘strength,’ ‘adequate,’ and ‘development needed’) has been undertaken at GFB. Feedback from respondents indicated that they felt heavily constrained with only three categories to choose from and, in line with the positive response bias, were reluctant to use ‘development needed’ on more than a few occasions. The 3-point scale was employed on a questionnaire measuring 5 competences with 7 questions for each competence. Results showed a high degree of positive skew: 65% of responses were ‘adequate,’ 25% were ‘strength,’ whilst only 10% were ‘development needed.’ This exemplifies the power of label descriptions; the tendency to respond positively in this instance made the 3-point scale an inadequate measuring tool.
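The sort of tally behind those figures can be reproduced in a few lines. The raw ratings below are invented purely to match the reported split; only the 65% / 25% / 10% figures come from the GFB research described above.

```python
from collections import Counter

# Hypothetical raw ratings constructed to match the reported split; only the
# 65% / 25% / 10% figures come from the GFB research described above.
ratings = ["adequate"] * 65 + ["strength"] * 25 + ["development needed"] * 10

counts = Counter(ratings)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n / total:.0%}")
# adequate: 65%
# strength: 25%
# development needed: 10%
```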

Further research at GFB has shown that highly detailed descriptions of response options are more effective than less specific ones. For example, a scale such as:


5 Consistently exhibits exceptional behaviour. Is an inspiration to colleagues.

4 Always exhibits behaviour and is at times exceptional

3 Almost always exhibits behaviour with an effective outcome

2 Sometimes exhibits behaviour effectively — development would improve consistency of the behaviour

1 Rarely/never exhibits behaviour — significant development required

n/a Not able to rate

is more useful to respondents than a less specific scale such as ‘1 - True, 2 - Inclined to be true, 3 - Inclined to be false, 4 - False.’
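As a minimal sketch (an assumption about implementation, not something prescribed in this guide), such a fully labelled scale can be held as a score-to-label mapping in which ‘not able to rate’ carries no score and is excluded from averages rather than being counted as zero.

```python
# A minimal sketch (an assumption, not the guide's own tooling): a fully
# labelled 5-point scale plus a 'not able to rate' option that is excluded
# from averages rather than being scored as zero.

SCALE_LABELS = {
    5: "Consistently exhibits exceptional behaviour; an inspiration to colleagues",
    4: "Always exhibits behaviour and is at times exceptional",
    3: "Almost always exhibits behaviour with an effective outcome",
    2: "Sometimes exhibits behaviour effectively; development would improve consistency",
    1: "Rarely/never exhibits behaviour; significant development required",
}
NOT_ABLE_TO_RATE = "n/a"

def average_rating(responses: list) -> float | None:
    """Average the numeric responses, ignoring 'n/a' (not able to rate)."""
    scores = [r for r in responses if r != NOT_ABLE_TO_RATE]
    return sum(scores) / len(scores) if scores else None

feedback = [4, "n/a", 3, 5]      # one rater could not comment on this item
print(average_rating(feedback))  # 4.0
```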

The labels of each point can influence the reliability and discriminability of a scale. For example, definitions can be written to offer more positive than negative options, resulting in skewed data, so care must be taken to avoid bias of this kind. Problems which arise from differing frames of reference can be reduced by using only two labels to anchor the end points, resulting in a numerical scale with an equal number of intervals (represented by digits) between the labelled end points. While this may in fact be the case, it is of greater importance to ensure that respondents understand the meaning of each point in the scale, which seems only possible if each point is labelled. Not only does this avoid misinterpretation of numerical points, but it also allows the report to be written in concrete, pre-determined terms.

Labels affect the validity of the survey regardless of the number of labels employed. In order to determine the validity of a scale (whether or not the questions are relevant to what is being tested and the objectives of your survey) the survey must be tested and re-tested. A well-established scale that has been in use for many years will provide more valid data. Thus when defining an optimum response scale it is important to consider scales which are frequently used. The Mayflower organisation, which regularly implements surveys, has recommended four five-point scales found to be especially effective:

1) Far too much, too much, about right, too little, far too little.

2) Much higher, higher, about the same, lower, much lower.

3) One of the best, above average, average, below average, one of the worst.

4) Very good, good, fair, poor, very poor.

In ‘Choosing the Right Scale’ Pearson Inc. recommend two scales for the measurement of requirement and expectation (used in Mail surveys). The four-point requirement scale is as follows:

4 Exceeded

3 Met

2 Nearly Met

1 Missed


This is recommended for reliability and discriminability and is particularly suitable for dissatisfied respondents, who prefer to use more positive terms such as the ‘nearly met’ response. Similarly reliable and discriminating is the five-point expectations scale:


5 Significantly Above

4 Above

3 Met

2 Below

1 Significantly Below
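Expressed as simple label-to-score mappings of the kind a survey tool might store, the two Pearson scales look something like the sketch below; the helper function, its name and its error handling are assumptions for illustration.

```python
# The two Pearson scales above expressed as simple label-to-score mappings;
# the helper function and its error handling are assumptions for illustration.

REQUIREMENT_SCALE = {"Exceeded": 4, "Met": 3, "Nearly Met": 2, "Missed": 1}

EXPECTATION_SCALE = {
    "Significantly Above": 5,
    "Above": 4,
    "Met": 3,
    "Below": 2,
    "Significantly Below": 1,
}

def score_response(label: str, scale: dict[str, int]) -> int:
    """Convert a labelled response into its numeric score."""
    try:
        return scale[label]
    except KeyError:
        raise ValueError(f"'{label}' is not a point on this scale") from None

print(score_response("Nearly Met", REQUIREMENT_SCALE))           # 2
print(score_response("Significantly Above", EXPECTATION_SCALE))  # 5
```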

In determining which scale is best for 360-degree and employee feedback surveys, the important issues to consider are:

  • Reliability: Lissitz and Green (1975) suggest that reliability starts to level off after 5 points; Likert-type scales are therefore the most reliable.

  • Discriminability: again, Likert-type scales are highly recommended as they offer enough information to discriminate between participants’ differing viewpoints.

  • Validity: evidence has shown that the most valid scales are those that have been employed effectively for some time.

  • The even vs. odd numbered scales debate: evidence indicates that ‘forced choice’ scales are really only suitable in certain circumstances (such as customer satisfaction surveys). It is important to have an equal number of positive and negative points to choose from, as well as a neutral option for respondents to select if they do not feel that they can make an informed decision.

  • Labelling: all points should be labelled to avoid confusion and minimise error due to differing frames of reference. The power of labels must not be underestimated; concise detail and positive wording (e.g. ‘nearly met’ instead of ‘poor’) are paramount to the success of the survey.

Finally, although a 5-point Likert-type scale seems the optimum response scale for 360-degree and employee feedback, whatever scale is employed must relate to what is being surveyed and be as free from bias as possible. Only when all of these points have been considered will the end result be the best response scale for your survey.


Please email elise.cope@gfbgroup.com if you would like us to send you our Guide to 360 Surveys or if you would like to set up a call.




