The question I asked The Inventor of the Web

I owe massive thanks to Dr. Steven Manos, who (in managing the visit of Sir Tim Berners-Lee, TBL, to Melbourne University) gave me the honor of asking the final question at TBL’s public lecture.  I began the question by asking all the researchers in the audience to raise their hands; naturally, well over half the audience did so.  I then asked Sir Tim:

“To the researchers who have just raised their hands, what would your advice to them be for sharing their research data?”

As Sir Tim’s lecture had primarily been on the ‘New Momentum for Open’, and as TBL invented the Web while at a research organisation (CERN), I felt we might get some nice insight.

David F. Flanders asks Sir Tim, “what would be your advice to those in the audience for sharing their research data?” (Dr. Steven Manos, Director of Research IT at Melbourne Uni, in the background)

The only trouble with Sir Tim’s insight is that he has obviously thought about this at a level that most of us will not yet have encountered.  Accordingly, I want to take TBL’s answer and break it down (paraphrasing it) for any researcher who won’t have found his technical, literal answer easy to understand (NB I’ve also included some related quotes from the main lecture):

[paraphrasing Sir Tim Berners-Lee, bold-italics are mine]

“Researchers should use the tools they are used to using in their day-to-day work so they can continue to produce their datasets and resulting research as they always have done…

…those tools should be Open Source where available (easier to adapt to open research), but regardless, researchers must get on with their research and use the tools they want to use…

…However, in using those tools researchers should actively be working with one another in their discipline and subject communities to create other simple tools/scripts/agents [NB TBL called these ‘software shims’] that can monitor the tools & data they work with day-to-day…

…As these tools [s/w shims] monitor the work of the researcher they can also begin to collect, back up and organise the data & research on behalf of the researcher…

…In this way, the ‘software shims’ can act as a lab assistant to the researcher, making it easier for them to manage their data.  These ‘software shims’ can also then enable the data to be published with an embargo timeline, e.g. once the research has been published and the accolades have been awarded, the data can then better support the research by being opened up…

…the way we enable this data to be shared is for the ‘software shims’ to use a lightweight data format like RDFa Lite or one of the other easy ways to make sure data is self-describing…

…Key groups that need to be involved in helping manage this data must be the library (data archivists) and developer communities, who can write these ‘software shims’ and then provide a trusted service that won’t release the data until the researcher is ready to do so…

…the library (as part of the University and as part of the wider global scholarly community, e.g. arXiv for Physics) can make the data available on the Web…”

To state it lightly, TBL dropped a ‘knowledge bomb’ on the audience, giving a level of detail that demonstrated how deeply he has thought about this exact question.

For those researchers not versed in the politics of open or the technology of linked data, I’d like to quickly break down the sentiments that TBL expressed in this Q&A session by highlighting how we are starting to do exactly what Sir Tim is suggesting here in Australia:

How is Australia making TBL’s vision for Research Data a reality?

  1. The National Research Cloud, as provided by the NeCTAR Digital Infrastructure project (and as led by the first node at the University of Melbourne), provides the first real-world opportunity to build ‘software shims’ over the most common software that researchers use.  For example, there is a host of tools that researchers are utilising on the nine beam-lines at the Australian Synchrotron; the ‘software shim’ built atop these beam-lines is a tool called MyTardis, which does exactly what Sir Tim describes above.  Also worth mentioning is a tool called R, which is increasingly used in everything from mathematics & statistics to economics (QuantMod), the social sciences and other qualitative & quantitative data collection fields.  Because this tool can now be used in the Cloud (as opposed to being downloaded onto a local laptop), it is easy for us to build a ‘software shim’ over any and all research groups utilising R.
  2. In addition, there are projects like AARNet, which is starting to experiment with tools like ownCloud that provide Dropbox-like functionality to researchers, so that it is easy for them to dump their Excel files up into a folder for sharing and long-term archiving.  Again, the shared Cloud functionality will enable ‘software shims’ to be added atop the file folders so that these data files can be saved in the long term.
  3. Finally, there are infrastructure projects like ANDS, which are engaging at the political and institutional levels to ensure that librarians, developers, senior research scientists and other influential budget holders recognise the need for long-term funding, so that whenever a ‘software shim’ (aka data capture tool) is built there will be an inherent trust associated with the tool.  Again, it is essential that the researcher trust that the data will not be released before she is ready to have her data become part of the scholarly record.
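
To make the ‘software shim’ idea a little more concrete, here is a minimal Python sketch of what such a data capture tool might do: walk a researcher’s working folder, fingerprint each file so it can be backed up and organised, and attach an embargo date that a trusted service could check before releasing anything.  This is purely illustrative and rests on my own assumptions: the names (DatasetRecord, capture, releasable, to_json) are hypothetical, it is not how MyTardis or any of the projects above actually work, and it writes plain JSON as a stand-in for a self-describing format such as RDFa Lite.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import date
from pathlib import Path
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata a shim might keep for one data file."""
    path: str            # file path relative to the watched folder
    sha256: str          # content fingerprint, useful for back-up checks
    size_bytes: int
    embargo_until: str   # ISO date; data stays private before this day


def capture(folder: Path, embargo_until: date) -> List[DatasetRecord]:
    """Scan a folder and describe every file found in it."""
    records = []
    for f in sorted(folder.rglob("*")):
        if f.is_file():
            data = f.read_bytes()
            records.append(DatasetRecord(
                path=str(f.relative_to(folder)),
                sha256=hashlib.sha256(data).hexdigest(),
                size_bytes=len(data),
                embargo_until=embargo_until.isoformat(),
            ))
    return records


def releasable(record: DatasetRecord, today: Optional[date] = None) -> bool:
    """True only once the embargo date has been reached."""
    today = today or date.today()
    return today >= date.fromisoformat(record.embargo_until)


def to_json(records: List[DatasetRecord]) -> str:
    """Serialise the records (JSON here, as a stand-in for RDFa Lite)."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

A research group could run something like capture(Path("my_experiment"), date(2014, 1, 1)) on a schedule, hand the resulting records to the library, and let the library’s service call releasable before putting anything on the Web.  The point is not the code itself but the division of labour: the researcher keeps using her own tools, and the shim quietly does the housekeeping.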

In short, for me The Inventor of the Web has validated the work we are striving to do as we continue to establish the digital infrastructure necessary to make TBL’s advice come true for all Australian researchers.  We just need to roll these infrastructure tools out to more developers working with researchers!  More anon, as always… :)

February 6, 2013

2 Responses to “The question I asked The Inventor of the Web”

  1. Two takeaways I took from @TBL:
    1. Get open data raised in importance to tenure track (great message given academic heavy-hitting audience)
    2. Get researchers to open their data with a SPARQL endpoint (open, describe and connect their data)/…rf

  2. As someone at the start of their research career, I think the most influential thing for me is that, in the department I work in, a lot of what TBL discussed has become a normalised part of doing research. I don’t feel that it’s onerous or challenging, and now that it’s a fundamental part of my research agenda to work openly I can start influencing those around me to work in the same paradigm.

