Here are four articles on data sharing that I’ve been reading for my covid-19 article.

Data sharing and reanalysis of randomized controlled trials in leading biomedical journals with a full data sharing policy: survey of studies published in The BMJ and PLOS Medicine

Florian Naudet, Charlotte Sakarovitch, Perrine Janiaud, Ioana Cristea, Daniele Fanelli, David Moher, John P A Ioannidis DOI:

  • When the study could reproduce a result, the conclusions were generally the same. That seemed nice until the authors pointed out in the discussion that reanalyses usually start from pre-processed data; starting from 100% raw data might give different results.
    • Example: my covid-19 data sharing study is all interviews. I “code” my interviews, tagging passages with something like “this sentence represents a desire to balance ethical data sharing with patient privacy.” I can publish my codes safely, but not the raw interview transcripts. People reproducing my work from the codes might reach similar results, but might reach different ones if they had the raw transcripts and had to do the coding themselves. (That’s why I get a second coder to review my codes and produce an inter-coder reliability rating.)
  • Data management plans (DMPs) are good, but may not be sufficient to bring about actual data sharing (rather than plans to share…): people don’t necessarily follow through on their DMPs.
  • Reproducing analyses can be challenging, as data are rarely homogeneous and need cleaning, analysis and standardisation before use.
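To make the inter-coder reliability idea a bit more concrete, here is a minimal sketch assuming the rating is Cohen’s kappa, one common measure for two coders (percentage agreement or Krippendorff’s alpha are alternatives, so treat the choice as an assumption, not necessarily the rating used in my study). Kappa compares the observed agreement p_o with the agreement p_e expected by chance from each coder’s code frequencies: kappa = (p_o − p_e) / (1 − p_e). The code labels below are hypothetical, not real codes from my interviews.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per excerpt."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of excerpts")
    n = len(coder_a)
    # Observed agreement: share of excerpts where both coders chose the same code.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: for each code, P(coder A uses it) * P(coder B uses it), summed.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_e = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_e == 1:
        # Degenerate case: both coders used one identical code for every excerpt.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes assigned by two coders to the same ten interview excerpts.
coder_a = ["balance", "balance", "privacy", "privacy", "balance",
           "infra", "privacy", "balance", "infra", "privacy"]
coder_b = ["balance", "privacy", "privacy", "privacy", "balance",
           "infra", "privacy", "balance", "balance", "privacy"]

print(round(cohens_kappa(coder_a, coder_b), 2))  # prints 0.68
```

Here the coders agree on 8 of 10 excerpts (p_o = 0.8), but chance alone would give p_e = 0.38, so kappa is lower than the raw 80% agreement suggests. That correction for chance is the reason a kappa-style rating is often preferred over simple percentage agreement.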

Open-access policy and data-sharing practice in UK academia

Yimei Zhu DOI:

  • One in five UK respondents had shared data. Given that medical and social scientists (and possibly others?) have good reasons to keep data private in some cases, that doesn’t worry me much. Interestingly, though, more people had re-used data than shared it…
  • I’d have expected younger researchers to be more likely to share data, but fewer of them had. Maybe that’s because more senior researchers have had more chances to do so, having been working longer?
  • I think it’s fair to say here that experience of making publications open access (OA) or using open data may make people more likely to make their own work open later on. That is, familiarity with some open practices may lead people to take up other open research practices?

Factors influencing the data sharing behavior of researchers in sociology and political science

DOI: Wolfgang Zenk-Möltgen, Esra Akdeniz, Alexia Katsanidou, Verena Naßhoven, Ebru Balaban

A classic case of an article about open research being paywalled 😵

So, this article was nice because it focuses on the social sciences, which seem under-studied with respect to openness compared to many other domains. Social studies often need to keep their data closed due to confidentiality concerns (this is very much the case with one of my current studies, for example: people won’t reveal vulnerable things to me if they think I’ll share them around the web!), whereas political science has less of this concern. They compare the two disciplines a bit, which doesn’t really interest me much; I’m interested in the broader strokes.

It reconfirms much of what the other data-sharing articles I’ve been reading have said, namely:

  • almost everyone thinks data sharing is good, but
  • the effort of data sharing (preparing data to be shared), its risks (misinterpretation, being scooped) and the lack of incentive often outweigh an individual’s reasons to share data.
  • It takes time and effort to prepare data for sharing. Data availability statements are often not true: a paper says “you can access the data via mechanism x”, but x often doesn’t actually work when people try to get the data.
  • where people have infrastructure, they’re more likely to share. Biomedical domains in particular tend to have data-sharing infrastructure; the humanities and social sciences often don’t.
  • authors who share at some point tend to share things again later.
  • making it easy to share is important if you want people to do it.

Patient privacy in the COVID-19 era: Data access, transparency, rights, regulation and the case for retaining the status quo

DOI: Joan Henderson

Has useful notes on balancing privacy vs transparency.

“Sudden and rapid changes to services […] were introduced well ahead of any considered legal protections for patient privacy and governance of these processes.”

Is there an argument that the public good requires overriding patient privacy? This article presents the arguments and ultimately concludes that the benefits probably do not outweigh the downsides. Phew. A large part of the reasoning is that commercial interests may benefit from the data without giving anything back.