by MERLIN CROSSLEY
I can’t remember where I learnt them, but over time people taught me two golden rules related to selection committees, be they for grants, fellowships or academic appointments.
It’s critical to rank, not score: saying I ranked person/application X second out of twenty has meaning, but saying I gave a score of B+ or 7 out of 10 isn’t helpful, because no one knows whether you are a tough marker or a more lenient one – the calibration problem.
Sometimes this issue is hidden, because when every member of a committee scores everyone, each assessor’s scores implicitly form a ranking and the problem solves itself. But if assessors see only some of the applications, their scores will tell you whether they are tough or easy markers, yet give no reliable information about the things you actually need to choose between.
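The calibration problem is easy to demonstrate with a toy example. The sketch below (in Python, with entirely made-up scores) shows two assessors whose raw scores are incomparable – one is a tough marker, one lenient – yet whose rankings of the same five applications are identical:

```python
# Hypothetical illustration of the calibration problem (made-up scores, not real data).
# Two assessors judge the same five applications. Their raw scores differ wildly,
# but converting scores to ranks recovers the same ordering from both.

scores_tough = {"App1": 5.5, "App2": 7.0, "App3": 4.0, "App4": 6.0, "App5": 3.0}
scores_lenient = {"App1": 8.0, "App2": 9.5, "App3": 7.0, "App4": 8.5, "App5": 6.5}

def to_ranks(scores):
    """Convert raw scores to ranks (1 = best)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {app: rank for rank, app in enumerate(ordered, start=1)}

print(to_ranks(scores_tough))
print(to_ranks(scores_lenient))
```

Averaging the raw scores would penalise whoever happened to draw the tough assessor; the ranks carry the comparative information without needing the two scales to be calibrated against each other.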
There are two obvious corollaries from deciding to rank. Firstly, you can only rank effectively if you have a decent number of applications/people to rank. Secondly, ranking only really works when you are comparing like with like.
This means the first thing people do is attempt to set up good categories.
Sometimes it is easy. Obviously, research committees will compare molecular biologists with molecular biologists, rather than with historians.
Happily, one can improve things further by having equity sub-categories. So that, for instance, early career researchers, or under-represented groups, are compared with others in similar situations, rather than with established researchers from the majority status quo group.
Often, of course, it is not possible to arrive at neat categories – for example, if you are asked who should be the Time Person of the Year, you have to consider all living people and that’s pretty broad. But happily, it doesn’t matter too much if the wrong choice is made there, provided the person is above a bar of credibility.
There is always a trade-off surrounding getting the right categories. You just have to try hard and do your best.
There’s only one other rule.
No one can predict the future, but past performance is the best indicator of future success.
This means judging track records is important. One should do this taking into account performance relative to opportunity. It is also vital to acknowledge that people do grow and improve, and we need to support and encourage people to do that, so sometimes appointments are made on promise or trajectory. The key point is that ranking past performance is much easier to agree on than ranking ideas. No one can rank ideas reliably.
If you don’t believe that, you only have to think of the anecdotes about failed attempts to predict which ideas are good. I recall the genome scientist Craig Venter showing assessor reports and a grant rejection letter that said his project was impossible. On the next slide he showed that he had published the genome in Science before the grant review process had even been completed. The very existence of the stock exchange should be enough to convince people that one just cannot predict which ideas will be winners.
I’m not sure whether the two golden rules above are generally accepted, but recently a colleague sent me an interesting article that covers some of these points. Herbert Marsh (Oxford), Upali Jayasinghe (UNSW) and Nigel Bond (WSU) published their paper back in 2008 in American Psychologist. It provides impressive evidence supporting the advantages of judging research-team track records rather than projects, and of having a few expert reviewers rank large numbers of applications, to avoid the calibration problem.
Over the years I have seen various systems move towards these rules, and I think it has helped: the right resource allocations can be agreed on and justified, workloads have been reduced, there is some stability, and visible goal posts have been set that drive the right sort of behaviours.
But surprisingly, I have also seen some organisations drift away from these two rules, and for interesting reasons.
One of the problems with having a small panel ranking a large number of people or grants is that it centralises power. This is concerning because power corrupts. In the civilised world one is seldom talking about evil power but rather the other type: the unconscious favouritism that comes from people being overly – and often genuinely – supportive of those within their own networks, networks often made up of people like them.
But I wonder whether the answer is not to throw away the concentration of power that proper ranking requires, but rather to carefully control and constantly refresh the membership of panels so that the power is temporary. I prefer this to fragmenting the judging between multiple small panels or assessors, because I worry about the calibration problem.
The other recent anxiety revolves around track record. I have discussed the problem that junior researchers should not be judged against established professors and that can be dealt with by having separate categories. There is also the concern that historic disadvantage will be perpetuated as some people, who may have astonishing ideas, may never get a chance to put them into practice.
This issue – the unrecognised genius – is much harder to solve and I acknowledge its importance. But I feel it is much more likely to be solved by local institutional support, by people who can see the detail, rather than by distant selection committees who have very little more than a short application to go on. I doubt many external assessors or selection committees will be able to agree on who the Van Goghs are, or ever to rank them above more prolific and engaging Picassos.
The main problem with moving away from ranking to assessor scoring, and from track records to ideas, is that the calibration problem and disparate views about ideas make things more random, and that will cause a lot of start/stop research.
This will affect the type of research we do. I think of zebrafish or mouse researchers who need to maintain colonies of genetically modified animals, or others in the midst of long-term field projects. There will be a drift towards shorter-term in vitro or lower-cost theoretical work, unless, of course, institutions can provide bridging support. With the current financial crisis that just became harder.
Any increase in randomness will also hit early career researchers harder than others. Senior people often have three or four funding sources going, as well as various collaborators who can help, and are better placed to survive some volatility. Those with just one grant, who have done very well and are expecting renewal, won’t be well served if there is randomness in the system.
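The volatility worry can be made concrete with a toy Monte Carlo sketch in Python. All the numbers here – the applicant’s true percentile, the funding cutoff, the noise levels – are assumptions chosen for illustration, not real funding data. It models a single-grant researcher whose true quality comfortably clears the funding line, and estimates how often assessor noise still produces at least one unfunded round over a ten-round career:

```python
import random

random.seed(1)

# Toy model (assumed numbers, purely illustrative): an applicant whose true
# quality sits at the 85th percentile applies in 10 rounds where only the
# top 20% are funded. Assessor noise is modelled as Gaussian jitter on the
# observed percentile; more noise means more random funding gaps.

def gap_probability(noise_sd, true_percentile=0.85, cutoff=0.80,
                    rounds=10, trials=10_000):
    """Estimate the chance of at least one unfunded round across a career."""
    gaps = 0
    for _ in range(trials):
        for _ in range(rounds):
            observed = true_percentile + random.gauss(0, noise_sd)
            if observed < cutoff:  # falls below the funding line this round
                gaps += 1
                break
    return gaps / trials

for sd in (0.0, 0.05, 0.15):
    print(f"noise sd={sd:.2f}: P(at least one gap) ~ {gap_probability(sd):.2f}")
```

In this toy model, even modest assessor noise makes at least one funding gap over a decade close to certain for someone relying on a single grant – the stop/start effect described above, which a senior researcher with several concurrent funding sources can simply ride out.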
There will also be another side effect. Researchers will drift away from individual work towards large collaborative teams. Some people celebrate this as they see it as strategic, but it can also stifle creativity, reduce diversity of research, and it also makes it more challenging for junior researchers to emerge independently and make their mark.
While I’ve been in administration I’ve been asked for many things – more money, more space, more equipment, and even for more time. It’s rare that I have reserves available to satisfy any of these requests. But what I can try to do is to provide stability and certainty. To me this is very important, so I look across the world to try to identify the systems that are most predictable, which incentivise the right behaviours, and support the greatest productivity. I also look at our Australian systems and ask myself whether we could do better.
Another thing I’ve also learnt is that the tasks that selectors face are never easy. Players, here academics, may receive cheers, but umpires mostly only get boos. So, I’m not too critical of those who are acting in good faith as umpires. Whether you are on the selection committee for a position, a grant allocation, or whether you are responsible for choosing the Archibald or Booker Prize winner, or the Time person of the year, I applaud you and wish you luck – although luck, of course, shouldn’t really come into it, and I guess that is the message here.
The Crossley Lab appears in CMM every Friday