A new era for open source! Free, open and EASILY REUSABLE software for Academia

This is a quick post to celebrate what some may feel is just a ‘small victory’, but in fact this small battle won is the first in the larger War of sustainable software! Last night, at about midnight (Melbourne time), @spetnatz got the first ‘public image’ of MyTardis working in NeCTAR’s OpenStack cloud. <– That sentence might have meant nothing to you, so let me explain in human terms why this could mean more reusable software for us all.

What problem did these small victory tweets *really* solve? In a sentence, they announced that “the final barrier to reusing open source software has been removed”. Academic projects worldwide produce software: small and big projects produce code that meets the needs of different Academic users (and often those users are a very small group of scientists scattered around the globe with a very particular problem). The problem isn’t whether the software actually meets their needs (most of the time it does[1]). The problem is that these small (or large) groups of Academics can’t get the software to work (despite it sitting in an open source code repository).

The scenario goes something like this:

A Researcher at a conference: “Oh wow, that looks like cool software that would be perfect for __[my niche subject or topic area]__.” The Researcher goes back to their home institution, very excited that such a specific piece of software exists, and finds the nearest developer to have a look at said cool software and see if they can get their own version working.

The developer, finding that the software is in an open source repository like Code.Google or Github (thanks to funder mandates), attempts to get the right ‘environment variables’ in place to get the software to work, e.g. Operating System, Database, the right version of Java <ugh!>, etc. It is here, in trying to get the ‘environment variables’ (aka dependencies) to work, that the developer hits the single most significant hurdle to reusing more open source software.
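To make that hurdle concrete, here is a minimal sketch of the kind of ‘preflight check’ a developer ends up doing by hand. It only looks for command-line tools on the PATH. The tool list (`java`, `psql`, `git`) and the function name `missing_tools` are made up for illustration and are not taken from MyTardis or any other project mentioned here.

```python
import shutil

# Hypothetical list of tools a project might need; not taken from any real project.
REQUIRED_TOOLS = ["java", "psql", "git"]


def missing_tools(required, which=shutil.which):
    """Return the names in `required` that cannot be found on the PATH.

    `which` defaults to shutil.which and can be swapped out for testing.
    """
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All listed dependencies were found on the PATH.")
```

A check like this only tells you that something is missing, not which version is needed or how to configure it, which is exactly why a ready-made image saves so much time.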

The problem is that most projects in Academia run the project marathon but then don’t run the last mile; that is, they get the code to work on their own box but they don’t take the time to allow anyone else to get it working with the click of a button. Having more *easily reusable* software, where the developer (or even better, the researcher) doesn’t have to go and figure out which dependencies in the form of ‘environment variables’ are needed, is essential if the project wants its software reused by others. Despite trying and trying, the failure rate for getting someone else’s OSS code to work is significant [2]. <– I used to think this was just me back in the day, until I conducted an experiment where I had the lead developers of each of the publications repository systems (ePrints, DSpace & FedoraCommons) try to launch each other’s repositories: they all failed despite trying to help one another for over two hours.

The point is simple: funders (ANDS, NeCTAR, RDSI, JISC, EU, Mellon, NSF, etc) have done a fantastic job of making sure that projects that produce software MUST publish it as Open Source. We now must run the final mile of RE-USABILITY and REQUIRE that the software is not only Open Source, but that a Virtual Machine Image is made *publicly* available via the likes of OpenStack and/or Amazon (or both, since OpenStack offers an EC2-compatible API!) so that anyone can EASILY RE-USE it.

To take it one step further (and remove another small but significant hurdle), funders need to provide OpenStack cloud platforms where the projects can leave their VMs for the long term after their funding ends. This is where software will start to become long-term reusable infrastructure!!! <– This has the potential to solve not only the re-usability problem but also the SUSTAINABILITY problem. Quite simply, we could start to address the sustainability problem if funders dedicated themselves to not only keeping the ‘Virtual Machine’ in the long term for all their funded projects, but also keeping the ‘Environment Variables’ around so we can find a machine in the Cloud which can ‘spin it up’ for actual FREE REUSE (one of the biggest problems with Academic projects is that they are just ahead of their time and need to wait for the users to catch up!).

In summary, I am a massive advocate of and participant in the Open Source movement (especially in Academia, which is where its true home is), but we have had a major flaw over the years: we have not run the final mile of the marathon! We produce the code and get it working once (throwing it into a code repository), but we don’t then put it into a Virtual Machine that will guarantee that *anyone* can come along and launch the thing without having to be a developer (this is the problem with the ‘Open Source Code Repository’).

Furthermore, we must make sure to remove the hurdle of needing a credit card to launch the software on the Cloud, e.g. Amazon, et al. By making sure government funds OpenStack instances for Academia we get the guarantee of software sustainability Beyond Life Of Project (BLOP!). And better yet we get the guarantee that it will be free for the individual (poor) Academic to have a go (notice, I’m just saying ‘have a go’, as I completely support the idea that if the software needs to be used by thousands then it should be moved over to the likes of Amazon because they can scale better than an Academic Cloud can, right now). And of course, now that we have solved the Open Source “EASILY REUSABLE” problem, we need to look forward to the next big challenge – A Global Academic Cloud (“Mind the GAC”) <– you heard it here first 😉

Well done Steve, you rock (as usual). Congratulations on being the first FREE REUSABLE OPEN SOURCE software system in the Cloud (Worldwide from the looks of it <– that is pretty damn cool)!!! It is an achievement I’ve been waiting on for almost five years now, since we did the #Fedorazon project. Great work.

Also, thanks to the unsung technical support heroes who worked with @spetnatz late into the night to make it happen: Clint, Sean and Steve M. <– You guys rock and it is a privilege to watch what you are trying to achieve. Not least, thanks to Glen for being bold and shielding us hackers from the political hurdles that could quickly stop all this bottom-up innovation occurring.

Also a quick disclaimer: be patient with OpenStack right now. It is one of the biggest open source projects in the world and so there are lots of bear traps waiting to clamp onto your brain and drag you down. Ask for help, as we need to do this together in Academia: we are ALL responsible for making the Cloud work (this isn’t just NeCTAR’s job). It is something we all want as developers, so let’s make it happen and roll with the punches.

If you want to get trained up on how to use OpenStack, check out the Developer Dojos we are putting on (#nadojo):

http://nectar-ands.eventbrite.com/

[1]= By saying “most” projects meet the user need, I’m being a little bit ‘tongue in cheek’; there is still a very real problem around how projects build in usability testing methods to ensure that the software does actually meet their users’ needs.

[2]= I’ve personally looked over 200+ Academic project code repositories over the past five years and I can say that (roughly) only a little over 20% of them have I got working by personally compiling the code. By contacting the developer (if they are still around) I can bump that number up to 30-40%. Otherwise, the biggest hurdle for re-use is quite simply just getting the code to compile and spin up.

~ by dfflanders on June 7, 2012.

7 Responses to “A new era for open source! Free, open and EASILY REUSABLE software for Academia”

  1. This is a brilliant idea, my only concern is that it adds to the overheads of keeping OSS available.

  2. Congratulations on a very positive step forwards. However, I think it’s only the first step.

    Making binaries available enables reuse in an environment for which the original software is well suited and has no bugs. Hmmm…

    The power of open source is not in the provision of binaries. Freeware does that, but freeware hasn’t taken off like open source has, why?

    Open source is about the availability of the source code. It’s about the ability to fix that bug, add that feature, optimise that routine etc.

    Providing binaries is a convenience to end-users. This is very important and I congratulate the team on making this happen. I fully agree that far too many people don’t focus on this. However, it doesn’t solve the problem of an end-user saying to their friendly IT team “please fix this bug.”

    So, yes, make binaries available to ease the end-users’ engagement with the software. Now that this is done, make sure that a developer can get it up and running just as easily.

    As a case study, when we took Wookie from an EU funded project into the Apache Incubator the first thing we did was make sure that a new developer could be up and running with the software within 10 minutes. This is a combination of clear documentation on how to set up the environment and effective build tools that will work on a new install.

    So I put it to you, can I get MyTardis running from source in 10 minutes on a clean VM? If not then I suggest this should be your next goal, because making those binaries available could well result in an influx of end-users, which is likely to result in an influx of potential contributors – but only if they can get it running quickly.

  3. Interesting…

    Sharing machine images is one way forward here, and is certainly something we are trying to encourage on our Education Cloud but don’t forget that such things need quite a lot of ongoing maintenance, not just of the application itself but of the underlying stack of stuff it is built on (including the OS).

    An alternative (and possibly complementary) approach would be to make Linux ‘packages’ available (though, of course, there are a variety of formats to choose from 😦 ).

    It’s also interesting to think about whether the underlying cloud platform has to be OSS (as per OpenStack) or whether it just needs to support open standards (such as OVF)?

  4. @Ross MyTardis can go from source to build in about 10 minutes on a ‘clean VM’. It’s basically a matter of running 3 commands. The concept of ‘binaries’ is a bit moot here as even the image that currently exists checks out the latest version of its Python codebase as part of its install so technically one can fork the codebase and have their changes to the code automatically pulled down to their own instance. They can also make a pull-request to the MyTardis Github repo to have the particular change/fix incorporated into the ‘base-install’.

    @Andy The image that’s in the cloud right now will be for example/demo purposes only and will be maintained by a single entity for a single distro. This was done in the name of having something there quickly that people can play around with / hack up. I’m aware that for actual server deployments, keeping the distro, software and its dependencies up to date is of huge importance. The MyTardis team are working on developing Chef configuration management ‘cookbooks’ to handle the continual automatic updates of instances deployed in the cloud. This is a priority for most of the Australian NeCTAR-funded tools and so should generate a lot of discussion in the coming months.

  5. Hi Dave
    Thanks for inviting me to “ask some tough questions” about your post.
    My initial reaction was that the language “A new era for open source! Free, open and EASILY REUSABLE software for Academia” grated. Then I remembered that although the IT community in the UK tends to be cynical of such marketing language, you are American and prone to such enthusiasm – so I forgave the rhetoric and read the post in more detail.
    Last night, reading the post on my iPod Touch, I saw the value in this work. However I am speaking as someone who is happy to use open source software if it provides value to me, but is also happy to use closed source software – I’m writing this on a Macbook Air but also use Android phone and tablet and a PC at work. My interest is in OSS which is easy to use and deploy – hence my interest in this work. However I remember several years ago when I was on the JISC OSSWatch Advisory Group predicting that Cloud services would challenge the open source vs proprietary divide (a point which Richard Stallman makes). As you know my blog is hosted on WordPress.com – running on open source software, but I have no control over the software, can’t install plugins, etc. For me, I am willing to put up with such limitations as WordPress.com does the job. But as Ross Gardler has pointed out “Open source is about the availability of the source code. It’s about the ability to fix that bug, add that feature, optimise that routine etc.”. That may be the case for developers but as a non-developer, this is of no interest to me.
    To conclude, I’m very interested in such developments – but will open source developers share such enthusiasm?

  6. Let’s say there is a series of steps to go through in the adoption of an existing open source product:

    1. Evaluation (does it look like it does what I need)
    2. Proof of concept/demonstrator
    3. Limited production
    4. Production
    5. Extension

    What cloud hosted images do is let you scream straight through steps 1 and 2 without wasting hours and days on what might be a dead end. You might even sneak through step 3, and in some cases, 4.

    As an employee of VeRSI (a sort of eResearch consultancy), I know we sometimes spend a loooong time on steps 2 and 3. We need to increase our efficiency by at least 10x, and this is the kind of mechanism that will help us do that.

  7. […] The busiest day of the year was June 8th with 186 views. The most popular post that day was A new era for open source! Free, open and EASILY REUSABLE software for Academia. […]
