Repositories in the Cloud!

So what is it going to take to convince people that using the cloud for your repository is the best (if not the most viable long-term) option for your hardware architecture?

Is it that people don’t understand what “the cloud” is and how it offers true scalability and robustness?

Or perhaps it is that you don’t trust companies like Google and Amazon Web Services?

Maybe it is an even more basic question: is your repository intended for preservation or showcasing?

Well here is some more information for you, in case you really think that your local IT department is the best option for hosting your repository:


~ by dfflanders on February 15, 2008.

3 Responses to “Repositories in the Cloud!”

  1. From a listserv where the above questions were posted:

    > > Hypothetically, what would people think of a JISC cloud service (common
    > > storage layer), based off of a service model like Janet?
    > >
    > > Especially for SMIs (Small-Medium-Institutions) could this provide a much
    > > needed solution?
    > If this were an issue, would they not be using a similar service such as
    > Amazon’s S3 at present? If JANET were to offer a storage cloud like this, I
    > imagine they would be likely to just re-sell a service like they have with
    > recent new services they provide such as SSL certificates (Globalsign), SMS
    > messaging service (PageOne) or IPTV (OSTN / INUK).
    > > The situation may have changed but I don’t think that cloud providers
    > > make any guarantees about the preservation or location of data,
    > > however, cloud technology clearly offers fantastic resilience (which
    > > is why I spent the time looking at it, and asking questions).
    > Amazon S3 does now offer European or US hosting options.
    > > Though can we trust our own local systems (especially with
    > > preservation)?
    > But a storage cloud doesn’t offer any sort of preservation service, other
    > than (hopefully) good backups!

    Yes, not “preservation” in the DCC sense (which I have yet to see implemented for repositories, except for AHDS; oops, there goes that example), but preservation in the sense of “disaster management”: our library basement with the repository servers doesn’t get flooded and destroyed. I think I meant to say “the cloud can provide better disaster fail-safes for the ones and zeros”.

    > And thinking about restoring data – presumably you can’t do this? So if you
    > mess up a file for one reason or another, Amazon offers no in-built restore
    > mechanism. This isn’t a problem in itself, it just means repository software
    > will need to be rewritten to make sure it never deletes anything, but makes
    > backups of all changes as it goes.
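
The "never deletes anything, but makes backups of all changes as it goes" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `AppendOnlyStore`, `Version` and their methods are hypothetical names, the history lives in memory, and nothing here calls Amazon S3 or any real repository software.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of an object's bytes (hypothetical type)."""
    data: bytes
    sha256: str
    timestamp: float
    deleted: bool = False  # tombstone marker; earlier versions are still kept


class AppendOnlyStore:
    """Hypothetical in-memory store that records every change and discards none."""

    def __init__(self):
        self._history = {}  # key -> list of Version, oldest first

    def _append(self, key, data, deleted=False):
        digest = hashlib.sha256(data).hexdigest()
        versions = self._history.setdefault(key, [])
        versions.append(Version(data, digest, time.time(), deleted))
        return len(versions) - 1

    def put(self, key, data):
        """Append a new version and return its index."""
        return self._append(key, data)

    def delete(self, key):
        """Record a tombstone instead of removing any stored bytes."""
        if key not in self._history:
            raise KeyError(key)
        return self._append(key, b"", deleted=True)

    def get(self, key, version=-1):
        """Return the bytes of a version (the latest by default)."""
        entry = self._history[key][version]
        if entry.deleted:
            raise KeyError(f"{key} is marked deleted at this version")
        return entry.data

    def restore(self, key, version):
        """Restore by appending a copy of an earlier version."""
        entry = self._history[key][version]
        if entry.deleted:
            raise ValueError("cannot restore a tombstone")
        return self._append(key, entry.data)

    def version_count(self, key):
        return len(self._history[key])


store = AppendOnlyStore()
store.put("thesis.pdf", b"good bytes")
store.put("thesis.pdf", b"messed-up bytes")  # the bad change is kept too
store.restore("thesis.pdf", 0)               # restoring adds a third version
```

Because a restore is just another append, a messed-up file can always be rolled back, at the price of storing every version ever written.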

    I think a byte-mirroring service such as LOCKSS, using the backend of a service such as the S3 REST API, could be used here?

    > Amazon is also cheaper and greener!
    > Is cost really an issue? Of a typical repository rollout project, what is
    > the percentage cost of hardware? 5%, 10% ?

    At the moment yes, but if we all really think “repositories are going to be a success” then we should be basing our forecasts on systems such as our current email infrastructure (we are up to six UNIX boxes for our email system and they are replaced every four years). I’ll leave the cost to our environment aside as well.

    > Where clouds really win, is for organisations who have no hardware
    > infrastructure, and want a scalable, reliable and cost-effective system.
    > Most universities do not fall in to this category.

    Eh, well, that is not the way the business world views it: more and more companies with significant “infrastructure” are giving up their “boxes” (notice I didn’t say IT staff) to be dealt with in the cloud. If our boxes can’t be outsourced, what can?

    > > Is it that people don’t understand what “the cloud” is and how it offers
    > > true scalability and robustness?
    > Is scalability an issue at present? As an example, I think Southampton’s
    > repository (one of the biggest in the UK) uses the grand total of 18GB disk
    > space. You’d be hard pressed to buy a machine with less than 10 times this
    > amount of storage. I don’t think size is a problem to many repositories yet.

    I remember a similar argument being made about email systems and their need to scale. Many IT departments are outsourcing their once open-source email servers to externally hosted proprietary companies (built on OS platforms). Shouldn’t we be following precedents here, from both business and other institutionally successful systems, e.g. email?

    > I’d love it if we all had these size and scalability issues, it would mean
    > our repositories were bursting at the brim with content, but alas that is
    > not the situation at present 😦

    Then how do we remedy the situation so we are playing at this level (another email some other time)?! Though, guess who is playing with EC2 and S3 on the scales we should be: the Internet Archive. Some notes by Rodger Ackerman on the presentation here:

  2. Sulfide says: I absolutely agree with this!

  3. Wow, this really is deep. My brain is fried now, I will need to relax with a nice cup of tea to help me digest this.
