By Tim Cahill

My time with Excellence in Research for Australia (ERA) lasted from 2010 to 2014. I joined just as the 2009 trial was wrapping up and analysis was under way on how to scale to a full exercise in 2010.

The early design of the bibliometric indicators had been done by the time I arrived, but there was still lots of ground to break – ERA was the first national metrics-driven evaluation to systematically employ bibliometric approaches. And we had great support from the likes of Jonathan Adams, Linda Butler and Jonathan Grant, among many others.

But even in my time as Director of ERA, I knew that there was still a lot to learn about how to best employ metrics to inform research evaluation.

While this was before the Metric Tide report and before DORA, I am proud to say that ERA set the standard that these documents recommend – the primacy of expert judgement, multiple converging partial indicators, no measures of individuals, transparent, reproducible metrics etc. But, as with all disciplines, the field of bibliometrics and research evaluation has progressed in the decade since ERA was first designed. Here are three pieces of research that have been completed since 2009.

Citation windows and delayed recognition

The idea of cumulative advantage has been the basis of citation analysis since almost the beginning, already described by 1976, and informing much of the development of the field of bibliometrics. But there was plenty of evidence that so-called ‘sleeping beauties’ were commonplace (here and here). Just how important this might be to citation-based research evaluation has only really been understood in more recent times.

In the first comprehensive study, in 2013, Jian Wang from iFQ in Germany found that, based on a five-year time window, “if we look at the top 10 per cent most cited papers, more than 30 per cent of the papers recognised as elite in year five will not be elite in year 31”.

In other words, the selection of short time windows for research evaluation may disadvantage a large proportion of the most important papers in a field, whose importance only becomes apparent much later.

There are obvious limitations to the study, not least of which are the availability of reliable longitudinal datasets and changing citation practices over time, especially since the advent of digital dissemination. However, even in shorter and more recent time windows the general trend seems to hold.

Gender biases underpin citation metrics

While it is likely that there is no difference in citation rates between male- and female-authored papers, especially in developed scientific systems, there are still underlying biases in citation metrics that need to be considered in research evaluation. For example, high citations are linked to high productivity, and in this respect, female academics are at a well-documented disadvantage. (Interestingly, it appears male authors are substantially more likely to self-cite.)

At the same time, there are clear gender biases in the peer review process that all citation metrics are essentially built upon (see here, here, and here, among many more studies).

In other words, if there are biases in the system of peer review, it is reasonable to suspect that they flow into downstream citation metrics.

Metrics shape academics’ behaviours and decision-making

It is no surprise research evaluation shapes researchers’ behaviours. Indeed, in Australia we had a crystal-clear cautionary example with the introduction of the research quantum in the 1990s.

More recently, though, there has been a concerted research effort to develop the empirical evidence to support this. Led by Sarah de Rijcke out of Leiden, this body of work (here, here and here, among others) shows how the traditional focus on peer review and citation metrics limits the types of work that academic researchers are willing to undertake, creating the so-called “evaluation gap” between the narrow focus on what is measured, on the one hand, and the broad missions of universities on the other. Missions, for example, like knowledge translation, community engagement, policy development etc.


…what would I do differently now, knowing what I know? In reality, very little.

The ERA metrics are still far and away the best we have available, and the ERA evaluation process is designed around the principle of expert review. The prominence of expert review means that such updates to our understanding can be easily incorporated into the approach. All that is required is that committee members be made aware of such nuances to the indicators, and take appropriate measures to incorporate them into their decision-making, i.e. exercise their expert judgement!

The same would not be true if ERA was simply a case of ‘running the metrics’ – bibliometrics in the absence of judgement is a dangerous game.

Quite some time ago, Jonathan Adams advised me to avoid giving the impression of precision of measurement when developing metrics. The three developments above show the peril of developing monolithic indicators (like the h-index) and rigid, metrics-driven evaluation approaches, like simply comparing FWCI – metrics are representations, and as such need to be understood in context. They are not objective, and do not replace the unavoidable role of judgement when it comes to allocating research resources.

Dr Cahill is Director, R&D Advisory at KPMG Australia, higher education advisor.

This essay originally appeared on his LinkedIn page. Reproduced with the author’s permission.

