By Leigh Huntridge.
Over recent years we have seen a rapid rise in collaborative activity in scientific research, most acutely in biomedicine’s global response to COVID-19. New transformative technologies, together with the increasing complexity and interdisciplinary nature of today’s research problems, are driving the need for tools that enable researchers to collaborate effectively, efficiently and at speed.
Enter iRODS to Address the Challenge
iRODS is the leading open source data management software that supports collaborative research, and more broadly the management, sharing, publication, and long-term preservation (or curation) of data that can be distributed geographically and institutionally. The massive size of these data collections within the research community has encouraged the development of unique capabilities in iRODS that allow scaling to collections containing petabytes of data and hundreds of millions of files.
Big data needs big storage
SoftIron’s HyperDrive appliances are built on Ceph, the leading open-source storage software platform with scalable, unified data access (block, file or object), which makes them a natural fit for iRODS and the customers that use it. Because any of these access protocols can be presented as a storage resource within the iRODS data grid, institutions can let users keep working with the protocols they have historically relied on (filesystems) while also adopting newer protocols (such as object stores) that are becoming more prevalent. All of this comes from one simple-to-manage, easy-to-scale storage platform.
This single SoftIron HyperDrive storage platform is a scale-out cluster that can be expanded incrementally and is limitless in scale, growing with the number of nodes added to the cluster.
Individual nodes can be added non-disruptively at any stage of the cluster’s lifecycle, and online migration helps introduce the latest technology into the cluster. The cluster therefore grows organically, and its limitless scalability makes it possible to store petabyte-scale filesystems and object stores.
The problem institutions run into with such large datasets is the environmental cost of running the infrastructure, and they might ordinarily fall back on tape to reduce power consumption. Whilst iRODS can use tape as a storage resource and tier data to it, retrieving data from such a medium is inherently slow. This is where SoftIron’s approach of designing our HyperDrive appliances to be “task specific” has a huge benefit: we are able to operate as a “live” storage infrastructure at a fraction of the power consumption of rival solutions.
For comparison, here are some typical power consumption examples from Ceph reference architectures:
*Assumed 40U of rack space available
The key benefit here is a large scale-out cluster that provides relatively fast access compared with tape, that does not melt a small iceberg every year, and that saves polar bears’ lives! For those reliant on co-location facilities, it can also dramatically reduce the hosting fees incurred.
A Proud Member of the iRODS Consortium
With the combination of protocol freedom, the limitless scalability of SoftIron’s HyperDrive storage capacity, and strong environmental credentials, you can quickly see why HyperDrive is a natural fit for deployment with the iRODS data management software.
And thanks to the community-based open source approach that iRODS and Ceph share (SoftIron is a founding member of the Ceph Foundation), both projects are developing at an astonishing pace, with new capabilities constantly being deployed in step with each other. It is fantastic for us here at SoftIron to be a part of this by joining the iRODS Consortium, and to further enhance the curation of such large datasets under iRODS.
For more information on how SoftIron can work in your environment, email us at email@example.com
To learn more about the partnership, you can read our press release here.
Or, to better understand iRODS in action, Dave Fellinger, Data Management Technologist and Storage Scientist with the iRODS Consortium, shares some background and relevant use cases in his latest article with Inside HPC, called Curating, Discovering and Disseminating HPC Research Elements Using iRODS.