Things to track

  • sources of data useful to biologists and bioinformaticians
    • licence - may need several questions, and the answers may change over time
    • attitudes and struggles
    • machine readability
  • streams of information to glean from users.
  • responses to attitudes and struggles
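
A minimal sketch of how one tracked data source might be recorded. Every class and field name here is hypothetical, nothing in these notes has settled on a schema, and the example values are made up. Licence answers are kept as dated entries since they may change over time.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class LicenceCheck:
    """One dated observation of a source's licence (hypothetical fields)."""
    checked_on: date
    licence_name: Optional[str]            # None if no licence is stated
    allows_reuse: Optional[bool]           # None means "unclear"
    allows_redistribution: Optional[bool]  # None means "unclear"


@dataclass
class DataSource:
    """A data source useful to biologists and bioinformaticians."""
    name: str
    url: str
    machine_readable: Optional[bool] = None
    licence_history: List[LicenceCheck] = field(default_factory=list)
    attitudes_and_struggles: List[str] = field(default_factory=list)

    def latest_licence(self) -> Optional[LicenceCheck]:
        """Return the most recent licence observation, if any."""
        if not self.licence_history:
            return None
        return max(self.licence_history, key=lambda c: c.checked_on)


# Made-up example: the licence was unstated at first, then clarified later.
source = DataSource(name="Example source", url="https://example.org/data")
source.licence_history.append(LicenceCheck(date(2020, 3, 1), None, None, None))
source.licence_history.append(
    LicenceCheck(date(2020, 4, 1), "CC-BY-4.0", True, True)
)
print(source.latest_licence().licence_name)
```

Keeping the full history rather than only the current licence means a change in terms during the outbreak is itself something that can be reported.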

Things to do

  • Ethics! (need to know scope first)
  • sign up to virtual biohackathon slack and group ☑️
    • contact them to ask if they mind being studied! (just the organisers) ☑️
  • NF-core covid group ☑️
  • RDA group
  • https://docs.google.com/document/d/1ExyphyMfvUTlPj7vZ3wvbIIEqv4nhBpyLcV0_n1H5e8/edit
  • ask (twitter?) for experiences // wait for ethics
    • slight possibility of it being really difficult to manage responses.
    • possible mitigation?
      • Form? Pro: easy to submit. Con: possible duplicate reports, though that may be a good thing, since an issue reported more often is probably more widespread.
      • GitHub PR? Pro: less duplication and less manual work on my side. Con: may not represent the strength / magnitude of the issues, and may be a tech barrier, although many bioinformaticians would have the skill to do this or the willingness to figure it out.
      • maybe both. PR if you can, form if you can’t. Ideally a Google Form or some other way of collecting data that facilitates collaboration.
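
If duplicate form submissions are kept rather than merged, a simple tally could stand in for how widespread an issue is. A rough sketch, assuming responses arrive as short free-text strings; the function name and the example strings are made up for illustration, and real responses would need more careful grouping than this.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tally_reports(reports: Iterable[str]) -> List[Tuple[str, int]]:
    """Count reports after light normalisation, most frequent first."""
    # Lower-case and collapse whitespace so trivially different
    # spellings of the same report are counted together.
    normalised = (" ".join(r.lower().split()) for r in reports)
    counts = Counter(n for n in normalised if n)  # skip empty responses
    return counts.most_common()


# Made-up responses, not real survey data.
example = ["No bulk download", "no bulk  download ", "Licence unclear", ""]
tally = tally_reports(example)
for issue, count in tally:
    print(f"{count} x {issue}")
```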

Literature

  • Look up literature regarding previous Ebola and Zika outbreaks - e.g. Nick Loman’s data sharing stores.
  • Other stuff. 👈👈👈 Become less vague about what this is 😆

Research questions to answer

In times of a pandemic or epidemic when rapid response is required, what are attitudes towards pathogen-related data sharing and data access? In particular:

  • Are these data licensed in a way that permits re-use and redistribution?
  • Are they made available in ways that are easy to download and re-use, e.g. API or bulk download, machine-readable with relevant metadata?
  • How do various communities respond to restrictions on licensing and access?

Scope

  • In scope: biology- and bioinformatics-oriented data sources - genetic sequence data, protein data, viral strains, and statistical data relating to infections (infected, recovered, deaths, locations).
    • there are likely to be too many data streams to be comprehensive about monitoring them all, but we can probably find most of the data sources out there.
  • biomedical / personal data? - possibly out of scope. Maybe we should avoid strongly personal data, such as mobile tracking app data; there is good reason to be cautious about sharing this. Don’t disregard it completely: write about concerns such as anonymising safely, state surveillance, and gathering data securely.
  • imaging?