Tuesday, 17 December 2013

The Start of a Journey

Last week a new paper, “Policy challenges of clinical genome sequencing,” led by Caroline Wright and Helen Firth and on which I am a co-author, was published in the British Medical Journal. It lays out the challenges of making more widespread use of genetic information in clinical practice, in particular around ‘incidental findings’. Caroline and I have a joint blog post on this paper on Genomes Unzipped.
This paper also marks an important watershed in my own career, as it is my first paper in an outright clinical journal. Like many other genomicists and bioinformaticians I have started to interweave my work more tightly with clinical research, as molecular biology, previously mainly a basic-research world, gravitates towards clinical practice.

Worlds apart

Clinical research and basic research are profoundly different, both in terms of scientific approach and culture. Clinical researchers who keep a hand in clinical practice are nearly always time pressured (e.g. hospital meetings, clinics, inflexible public responsibilities) and their research has to be squeezed to fit around their practice. The language of clinical research is also distinctly different from that of genomics. For example, I used to use the word ‘doctor’ interchangeably with ‘clinician,’ until a generous clinician took me aside and patiently explained that ‘doctor’ is not the word clinicians use, as it does not capture the myriad disciplines in the healthcare system. They use the word… clinician.
But the differences run deeper than terminology and schedules. Clinical practice involves seeing patients, each of whom presents a different constellation of symptoms, tolerance to treatment and set of personal circumstances – it’s a far cry from the nice, uniform draws from statistical distributions that one hopes to see in designed experiments. A clinician has to work out the true underlying problem – often different from the one described by the patient – and find a way to make it better, often under pressure to act quickly and contain costs.
In theory, molecular measurements – from genotypes to metabolomics – should be informative and useful to the clinician. In practice, there is a wide gulf between any given molecular approach (usually from a retrospective study) and the uptake of molecular information into clinical practice.
Hanging out with more clinicians has given me a deeper appreciation of the difficulty of achieving this, and of why clinicians make such a sharp distinction between people who are part of medical practice and those who are not. I, for one, have never had the responsibility of making a clinical decision (I’m rather glad other people have taken that on, and appreciate the amount of training and mental balance it takes), so I know I haven't grasped all the crucial details and interactions that make up the whole process.

Different perspectives

Medicine is also quite diverse, and rightly so. A clinical geneticist might be dealing with a family with a suspected genetic disorder in which a number of family members are currently healthy. Meanwhile, a pancreatic cancer specialist might be helping a new patient whose chances of living another five years are around 2% - and who is therefore a lot more willing to look into experimental treatments than the clinical geneticist’s family.
Even within a discipline, it is not so obvious where the new molecular information is best used. I had the pleasure of being the examiner for Dr James Ware, a young clinician doing PhD research on cardiac arrhythmias (a subset of inherited cardiac diseases) with Dr Stuart Cook. He presented excellent work on genetically ‘dissecting’ out some new arrhythmia mutations from families. He also revealed a passion not just for using genetics but for finding practical ways to do so. From his perspective, in this particular medical area, the bigger impact for genetics would be after a phenotype-led diagnosis, rather than for diagnosis itself.

Discussions leading to insight

Our recent paper in the BMJ is a good example of how much I have learned in recent years simply by discussing things with clinicians in detail. I have long advocated a more open and collaborative approach to sharing information about variants with ‘known’ pathogenic impact, even considering the daunting complexity of variant reporting and phenotypic definition (progress is steady in this area, e.g. the LRG project), and this seemed to be aligned with the discussion about making a definitive list of variants for “incidental findings”. So I was somewhat taken aback to find that many clinicians did not share my enthusiasm about incidental findings.
After a workshop organised with Helen and with strong input from Dr Caroline Wright, both passionate, open-minded clinical researchers, I fundamentally changed my mind about the utility of ‘incidental findings’ (better described as ‘opportunistic genetic screening’). For the vast majority of known variants we either have poor estimates of penetrance or – at best – estimates driven by ‘cascade screening’ in affected families (i.e., an initial case presents to a clinical geneticist, triggering exploration around the family).
While this is a really important aspect to consider, my passion about more open sharing of knowledge around specific variants remains firmly in place. Caroline, Helen and I remain positive about the growing utility of genome information in clinical research and in targeted diagnostic scenarios, but not for incidental findings until more systematic research is performed (see our ‘Genomes Unzipped’ blog post).

Bridging the gulf

Working with clinicians has given me deeper insights into my own work, and in this particular instance changed my opinion. I hope that these interactions have also been positive for the clinicians, perhaps changing their minds about the utility of bioinformatics and genomics and giving a new perspective on the possibilities and pitfalls of the technology.

More broadly, the coming decade is expected to be characterised by basic researchers delving deeper into other areas of science, in particular applied science: areas of medicine, public health, epidemiology, agricultural and ecological research. This is a fascinating, if daunting, challenge for us all. New people to meet, new terminology and language to navigate, new science and applications to wrap our heads around… These are all good things, and I’m sure we will get used to it. We have to.

Friday, 13 December 2013

Making decisions: metrics and judgement

The conversation around impact factors and the assessment of research outputs, amplified by the recent 'splash' boycott by Randy Schekman, is turning my mind to a different aspect of science - and indeed society - and that is the use of metrics.

We are becoming better and better at producing metrics: more of the things we do are digitised, and by coordinating what we do more carefully we can 'instrument' our lives better. Familiar examples might be monitoring household electricity meters to improve energy consumption, analysing traffic patterns to control traffic flow, or even tracking the movement of people in stores to improve sales. 

At the workplace it's more about how many citations we have, how much grant funding we obtain, how many conferences we participate in, how much disk space we use... even how often we tweet. All these things usually have fairly 'low friction' instrumentation (with notable exceptions). 

This means there is a lot more quantitative data about us as scientists out there than ever before, particularly our 'outputs' and related citations, and mostly with an emphasis on the traditional (often maligned) Impact Factor of journals and increasingly on "altmetrics". This is only going to intensify in the future.

Data driven... to a point

At one level this is great. I'm a big believer in data-driven decisions in science, and logically this should be extended to other arenas. But on another level, metrics can be dangerous.

Four dangers of metrics

  1. Metrics are low-dimensional rankings of high-dimensional spaces;
  2. Metrics are horribly confounded and correlated;
  3. A few metrics are more easily 'gamed' than a broad array of metrics;
  4. There is a shift towards arguments that are supported by available metrics.

The tangle of multidimensional metrics

A metric, by definition, provides a single dimension along which to place people or things (in this case scientists). The big downside is that science can only be judged as "good" after evaluating it on many levels; it cannot be usefully ranked along any single, linear metric. On a big-picture, strategic level, one has to consider things within the context of different disciplines. Then there is the aspect of 'science community' - successful science needs both people who are excellent mentors and community drivers, and the 'lone cats' who tend to keep to themselves. Even at the smallest level, you have to have a diversity of thinking patterns (even within the same discipline, even with the same modus operandi) for science to be really good. It would be a disaster if scientists were too homogeneous. Metrics implicitly assume low dimensionality (in the most extreme case, a single dimension), and by their very definition cannot capture this multi-dimensional space.
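The information lost in that collapse can be made concrete with a toy sketch. Everything here - the profile dimensions, the numbers and the choice of scoring formula - is invented purely for illustration, not taken from any real assessment scheme:

```python
# Toy sketch: two very different multi-dimensional scientist profiles
# collapse to almost the same one-dimensional score. All numbers made up.

profiles = {
    "mentor":   {"papers": 12, "citations": 400, "mentoring": 9, "community": 8},
    "lone_cat": {"papers": 30, "citations": 900, "mentoring": 2, "community": 1},
}

def scalar_metric(profile):
    """A one-dimensional 'impact' score (citations per paper) that
    silently discards the mentoring and community dimensions."""
    return profile["citations"] / profile["papers"]

for name, profile in profiles.items():
    print(f"{name}: {scalar_metric(profile):.1f}")
```

Both profiles score around 30-33 citations per paper, yet they describe people contributing to science in entirely different ways - exactly the information a single dimension throws away.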

Clearly, there are going to be a lot of factors blending into metrics, and a lot of those will be unwanted confounders and/or correlation structures that confuse the picture. Some of this is well known: for example, different subfields have very different citation rates; parents who take career breaks to raise children (the majority being women) will often have a different readout of their career through this period. Perhaps less widely considered is that institutions in less well-resourced countries often have poorer access to the 'hidden' channels of meetings and workshops in science.
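The subfield effect can be shown with a toy simulation. The field names, base citation rates and noise model below are all invented for illustration - the point is only that a confounder can dominate the metric even when the underlying 'quality' is identical:

```python
# Toy simulation: citation counts confounded by subfield citation rates.
# 'Quality' is set identical across fields, yet the raw citation metric
# separates the two fields almost completely.
import random

random.seed(0)

FIELD_RATE = {"genomics": 50, "taxonomy": 5}  # made-up per-field base rates

def simulate_citations(field, quality):
    """Citations = field base rate scaled by quality, plus a little noise."""
    return FIELD_RATE[field] * quality + random.gauss(0, 2)

genomics = [simulate_citations("genomics", 1.0) for _ in range(100)]
taxonomy = [simulate_citations("taxonomy", 1.0) for _ in range(100)]

mean = lambda xs: sum(xs) / len(xs)
print(mean(genomics), mean(taxonomy))  # same 'quality', very different metric
```

By construction the two sets of papers are equally 'good', but with these made-up rates the citation metric would rank essentially every genomics paper above every taxonomy paper - the confounder, not the quality, drives the ranking.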

Some of the correlations are hard to untangle. Currently, many good scientists like to publish in Science, Nature and Cell, and so ... judging people by their Science, Nature and Cell papers is (again, currently) an 'informative proxy'. But this confounding goes way deeper than one or two factors; rather, it is a really crazy series of things: a 'fashion' in a particular discipline, a 'momentum' effect in a particular field, attendance at certain conferences, the tweeting and blogging of papers... 

Because of the complex correlation between these factors, people can use a whole series of implicit or explicit proxies for success to get a reasonable estimation of where someone might be placed in this broad correlation structure. The harder question is: why is this scientist - or this project proposed by this scientist - in this position in the correlation structure? What happens next if we fund this project/scientist/scheme?

Gaming the system

I've observed that developing metrics, even when one is transparent about their use, encourages more narrow thinking and opens up the ability to game systems more. This gaming is done by people, communities and institutions alike, often in quite an unconscious way. So... when journal impact factors become the metric, there is a bias - across the board - to shift towards fitting the science to the journal. When paper citation numbers (or h-indexes) become the measure by which one's work is judged, communities that are generous in their authorship benefit relative to others. When 'excellent individuals' are the commodity by which departments are assessed, complex cross-holdings of individuals between institutions begin to emerge. And so on.

In some sense there is a desire to keep metrics more closed (consider NICE, who have a methodology but are deliberately fuzzy about the precise details, making it hard to game the system). But this is completely at odds with transparency and the notion of providing a level playing field. I think transparency trumps any efficiency here, and so the push has to be towards a broader array of metrics.

Making the judgement call

One unconscious aspect of using metrics is the way it affects the whole judgement process. I've seen committees - and myself sometimes when I catch myself at it - shift towards making arguments based on available metrics, rather than stepping back and saying, "These metrics are one of a number of inputs, including my own judgement of their work". 

One needs to almost read past the numbers - even if they are poor - and ask, "Is the science worth it?" In the worst case, the person or committee making that judgement call will be asked to justify the decision entirely on metrics, in order to present a sort of watertight argument. But there are real dangers in believing - against all evidence - that metrics are adequate measures. That said, relying on judgement is itself the counter-argument to 'using objective evidence' and 'removing establishment bias' - the very things that using metrics helps achieve. There has to be a balance.

So what is to be done here? I don't believe there is an easy solution. Getting rid of metrics exposes us to the risk of sticking with the people we already know and other equally bad processes. 

I would argue that:

  • We need more, not fewer, metrics, and to have a diversity of metrics presented to us when we make judgements. This might make interpretation seem more complicated, and therefore harder to judge. And that is, in many cases, correct - it is more complicated and it is hard to judge these things.
  • We need good research on metrics and confounders. At the very least this will help expose their strengths and weaknesses; even better, it will potentially make it possible to adjust for (perhaps unexpected) major influencing factors.
  • We should collectively accept that, even with a large number of somewhat un-confounded metrics, there will still be confounders we have not thought about. And even if there were perfect, unconfounded metrics, we would still have to decide which aspects of this high-dimensional space we want to select; after all, selecting for just one area of 'science' is, well, not going to be good.
  • We should trust the judgement of committees, in particular when they 're-rank' against metrics. Indeed, if there is a committee whose results can be accurately predicted by its input metrics, what's the point of that grouping?


My thinking on this subject has been influenced by two great books. One is Daniel Kahneman's "Thinking, Fast and Slow", which I've blogged about previously. The other is Nate Silver's excellent "The Signal and the Noise". Both are seriously worth reading for any scientist.