My favorite talks at OKCon (Geneva)
Most of the sessions at OKCon were nice, but there was one session that made the entire trip worthwhile for me. In short, the following is my recommendation for the people you should go and google right now.
First up was @floppy, whose tag line is going to be my next t-shirt: “we need to bring the world of Open Source to Open Data”. He followed it up with the reminder that the Open Data world hasn’t even got a SourceForge, let alone a GitHub ecosystem and all the tools that Open Source developers have at their beck and call to work with code. I love this analogy, as data is its own kind of code and deserves as much attention to tooling as code does. Best of all, @floppy is actively working on enabling data spreadsheets to be forked via GitHub/GitLab in the same way that code is, which means all the tools that GitHub provides could be applied to tabular data as its own kind of code base. In short, watch this space, as the future of data is going to come from developers playing with it in active real-world communities like this.
Next up was @maxogden, one of those developers creating what looks like the future of tabular data through his new file format .dat (short for data). The simple code library that @maxogden is creating does the transformations that data developers spend hours doing, e.g. getting Excel spreadsheets into databases and vice versa. In short, data developers waste a lot of time on ‘code glue’, moving spreadsheets and small tabular databases (Access/SQLite) into more powerful developer tools like MySQL or, even better, JSON databases like CouchDB. The dat tool is alive and kicking and saving developers hours of their time; it scratches an itch and should be used by everyone today, making .dat files the de facto MIME type for data!
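To make the ‘code glue’ problem concrete, here is a rough sketch of the kind of boilerplate data developers end up writing by hand. To be clear, this is not the dat tool or its API; it is plain Python standard library, and the sample data, table name and function names are all made up for illustration.

```python
import csv
import io
import json
import sqlite3

# Made-up sample standing in for a spreadsheet exported as CSV.
# The figures are illustrative only, not real statistics.
SAMPLE_CSV = """city,country,population
Geneva,Switzerland,195000
Melbourne,Australia,4250000
"""


def csv_to_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_to_sqlite(rows, conn, table="cities"):
    """Create a table named after `table` and insert every row as text."""
    columns = list(rows[0].keys())
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()


def rows_to_json(rows):
    """Serialise the rows as a JSON array, one object per spreadsheet row."""
    return json.dumps(rows, indent=2)


rows = csv_to_rows(SAMPLE_CSV)

# Spreadsheet -> relational database (in-memory SQLite for the demo).
conn = sqlite3.connect(":memory:")
rows_to_sqlite(rows, conn)
count = conn.execute('SELECT COUNT(*) FROM "cities"').fetchone()[0]

# Spreadsheet -> JSON documents, the shape a document store would expect.
json_docs = rows_to_json(rows)
print(count)
print(json_docs)
```

Even this toy version has to make decisions about column types, quoting and table names, which is exactly the repetitive work a shared tool and format could take off everyone’s hands.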
Finally, my personal favorite: Karthik @_inundata, who is leading the data revolution for scientists via R (#RStat) through his project @rOpenSci. If you are in academia and have not used R, you should get involved today (or you are living in a cave!), as “R is for Research!” (see below). Over the next year, @_inundata’s project (funded by Sloan) is dedicated to both: a.) building an international community, and b.) building CRAN repositories which enable the research experimentation and publication process, such as:
- pulling in data from phylogenetic trees, or
- augmenting personal data with larger datasets like the World Bank data, or
- more easily formatting publications with markdown so that LaTeX no longer need be mastered to publish a thesis (see: knitr+markdown+Rstat <– this is brilliant), or
- adding metadata to your research data and publications so that they easily cite your toolchain without having to go through a librarian.
I’d like to pontificate a little more on Karthik’s work, as it is something we plan to actively roll out via Melbourne’s Postgraduate Programming Club, and ideally we will participate more in the community that the @rOpenSci project is building!
Why R aka #RStat aka #RStudio?
R is for ‘Researcher’ IMHO, because it is a fully fledged programming language for researchers. Here are some of the things I’ve discovered about it over the past six months as I’ve gotten involved with the quickly growing community!
First off I should state that I believe every researcher from Humanities to Social Science to Physics to Mathematics will need to learn how to code on some level.
Writing code will be as important (if not more important) as knowing how to write a research paper. Here are some of the things that researchers of the future will say about why they are using R:
- We need a programming language whose results can be reproduced, R is for Research.
- We need a programming language whose analyses are as easy to repeat as sharing and opening up a file, R is for Research.
- We need a programming language that we are able to cite, R is for Research.
- We need a programming language that allows us to write tools for subject-specific activities, R is for Research.
- We need a programming language that is lab book like, R is for Research.
- We need a programming language that integrates with the publishing process, R is for Research.
- We need a programming language that allows us to easily change our mind and amend via review, R is for Research.
- We need a programming language that shows research data and research code next to each other, R is for Research.
- We need a programming language whose syntax and notation are akin to academic scientific notation, R is for Research.
- We need a way to track any and all research scripts, assuring that they are archived for future consideration by researchers both in that discipline and outside that discipline, R is for Research.
But, do you think R is for Research? Why or why not? <– Please either leave a comment or tweet me on @dfflanders