Prof John Mingers
The research excellence framework – or REF – has become such an established part of the higher education landscape that it’s hard to imagine a world without it. Since 1985 it has provided the navigation points in a turbulent and changing landscape, marking progress and, like a modern-day Lachesis, measuring the threads of individual research in readiness for Atropos’ shears.
As we gear up for REF2021 and wait to hear from HEFCE about what form it will take, it seems an appropriate time to take stock and ask whether the game is worth the candle, and whether the measurement of research should be undertaken differently.
This is a perennial question, and one that follows the tidal rhythm of the REF itself. As the REF flows towards high tide, and a disproportionate amount of academic and administrative time is taken up with judging outputs and deciding which staff to submit, more voices are raised in protest. As it ebbs and people return to their day jobs, the protest dies down.
Before REF2014 Derek Sayer of Lancaster University questioned the whole framework of the REF in his book Rank Hypocrisies, and even went so far as to appeal against his inclusion in Lancaster’s submission. Similarly, Dorothy Bishop of Oxford looked at alternatives, suggesting that the use of bibliometrics, which was rejected before the last exercise, should be re-examined.
At the heart of their argument is the claim that peer review is not fit for purpose. The sheer scale of the exercise does not allow for an informed assessment of outputs. ‘The REF,’ wrote Sayer, ‘is a system in which overburdened assessors assign vaguely defined grades in fields that are frequently not their own while (within many panels) ignoring all external indicators of the academic influence of the publications they are appraising, then shred all records of their deliberations.’
Whilst many might concur with this, most see no alternative. And yet peer review is a relative infant in the world of academia. I’ve written before about the surprisingly short history of this apparent gold standard. Its current prevalence and dominance are the result, essentially, of a confluence of the baby boomers coming of age and the photocopier becoming readily available. Indeed, the term ‘peer review’ was only coined in 1969.
A couple of weeks ago the Kent Business School convened a ‘debate’ to reopen the wound. ‘The Future of Research Assessment’ heard from both sceptics and believers, from those who were involved in the last REF and those who questioned it, and looked again at the potential worth of Dorothy Bishop’s bibliometric solution.
Prof John Mingers began by laying his cards on the table. ‘Debate is the cornerstone of academic progress but there is not enough of it when it comes to peer review and possible alternatives,’ he said.
Taking aim at the REF, he suggested that, whilst it was intended to evaluate and improve research, it actually has the opposite effect. It fails to evaluate research effectively, and can have ‘disastrous’ effects on the research culture. For him, the REF peer review was fundamentally subjective and open to conscious and unconscious biases. The process by which panel members were appointed was secretive and opaque, and the final membership came to represent an ‘established pecking order’, and might lack the necessary expertise to properly evaluate all the areas submitted.
Moreover, even accepting the good faith of the members in assessing outputs objectively and fairly, the sheer workload (in the order of 600 papers) meant that ‘whatever the rhetoric, the evaluation of an individual paper was sketchy.’
For Mingers, the exercise wasn’t fit for purpose, and didn’t justify the huge expense and time involved. Instead, he suggested that a nuanced analysis of citations offered a simpler, far cheaper and more immediate solution. After all, citations are ‘peer review by the world’.
He accepted the potential problems inherent in them – such as differences in citation practice between disciplines – but suggested these could be allowed for through normalisation. In return, they offer an objectivity that peer review can never hope to achieve, however well intentioned.
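To make the idea concrete, here is a minimal sketch of what field normalisation of citations looks like in practice – a Python illustration of the general approach (dividing a paper’s citations by the average for its field and year), with invented figures rather than anything from Mingers’ own analysis:

```python
# Sketch of field-normalised citation scores (illustrative figures only).
# Each paper's raw citation count is divided by the expected (average)
# citations for its field and publication year, so that papers from
# high- and low-citing disciplines can be compared on the same scale.

# Hypothetical world-average citations per paper, by field and year
expected_cites = {
    ("oncology", 2012): 25.0,
    ("sociology", 2012): 6.0,
}

papers = [
    {"title": "Paper A", "field": "oncology", "year": 2012, "cites": 50},
    {"title": "Paper B", "field": "sociology", "year": 2012, "cites": 12},
]

for p in papers:
    baseline = expected_cites[(p["field"], p["year"])]
    p["normalised"] = p["cites"] / baseline
    print(f'{p["title"]}: {p["cites"]} raw cites -> {p["normalised"]:.2f}x field average')
```

Both hypothetical papers come out at 2.0 – twice their own field’s average – despite very different raw counts, which is precisely the point of normalising.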
Mingers compared the REF rankings to an analysis he had done using the data underlying Google Scholar. The REF table produced an odd result, with the Institute of Cancer Research topping the ranking, Cardiff coming seventh, and a solid research institution such as Brunel sliding far below its natural position.
Much of this is a result of the game-playing that the Stern Review sought to remedy. However, whatever solution is finally agreed by HEFCE, such a peer-review-based system will always drive behaviour in a negative way.
His alternative used Google Scholar institutional-level data, focussing on the top 50 academics by total citations. He examined a variety of metrics – mean cites, median cites, and h-index – before selecting median five-year cites as the measure which offered the most accurate reflection of actual research excellence.
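For readers curious how these indicators differ, here is a rough Python sketch of the three metrics mentioned, computed over made-up five-year citation totals for a hypothetical group of academics (the numbers and names are illustrative, not Mingers’ data):

```python
from statistics import mean, median

def h_index(citation_counts):
    """Largest h such that h items each have at least h citations."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Made-up five-year citation totals for a group's most-cited academics
# (in Mingers' analysis these would come from Google Scholar profiles).
five_year_cites = [5200, 3100, 2400, 900, 450, 300, 120, 80]

print("mean cites:  ", mean(five_year_cites))    # 1568.75
print("median cites:", median(five_year_cites))  # 675.0
print("h-index:     ", h_index(five_year_cites)) # 8
```

The gap between the mean and the median in this toy example hints at why a median-based measure might appeal: a handful of superstar academics can drag the mean far above what is typical for the group, whereas the median is robust to such outliers.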
The resulting table is a less surprising one than that produced by the REF. Oxford and Cambridge are at the top, followed closely by Imperial and UCL, with Cardiff down to 34 and Brunel up to 33. Interestingly, LSE – a social science and humanities institution hosting disciplines that don’t traditionally deal in citations – came 12th.
It was an interesting exercise, and seemed plausible. Whilst The Metric Tide had cautioned against over-reliance on metrics, it did suggest that there should be ‘a more sophisticated and nuanced approach to the contribution and limitations of quantitative indicators.’ Bishop and Mingers seem to provide such sophistication and nuance, and demonstrate that a careful, normalised analysis of bibliometrics can produce an equally robust and accurate result for a fraction of the cost – and agony – of the REF.