As the Product Engineering Lead at SoftIron, I spend a lot of time elbow-deep in Ceph’s architecture.

SoftIron’s appliances are all engineered for full interoperability with Ceph, and designed specifically to support customers in making use of its powerful (but sometimes complex) features. As part of this, we keep a close eye on technologies our customers will want to use with Ceph, so that we can give informed advice as they design and build their new clusters. Ceph is extremely customisable, allowing greater flexibility and efficiency than other software-defined storage (SDS) options, but thanks to that capacity for tweaking, fine-tuning, and otherwise poking at your cluster, it can be a little overwhelming to get started. By designing our appliances to make getting started just a little easier, and investigating ways they can be integrated with a variety of popular cloud computing technologies, we aim to make Ceph accessible for anyone, running a cluster at any scale, for any purpose.

The world today runs on data – and Ceph’s open source approach puts the power of that into the hands of anyone, even tiny teams running on the tightest of budgets. Naturally, we sit up and pay attention to other compelling open source projects that put the power of data-driven insights into anyone’s hands; enter iRODS.

iRODS: the open source data collection, curation, and collaboration platform

iRODS has been accelerating research across a diverse range of industries, from life sciences and agriculture to media and entertainment. Its plugin architecture supports microservices, storage systems, authentication, networking, databases, rule engines, and an extensible API.

Many of our customers have been interested in working with iRODS, and as we explored the possibilities ourselves, we realised that this is a rabbit hole that just never seems to end. This is a different kind of Wonderland though – one that empowers researchers, data analysts, and archivists alike. That’s why SoftIron chose to sponsor the recent 13th Annual iRODS User Group Meeting – and also why I attended to give a presentation on Ceph’s architecture, and how it relates to iRODS.

Research at its roots

Ceph’s open source origins are deeply rooted in research – it was first developed at the University of California, Santa Cruz by Sage Weil in 2003, as part of his PhD project. While it started as a file system (CephFS), Ceph’s unique architecture has evolved from there to support not only file storage, but block and object storage too.

This is all thanks to RADOS, the Reliable Autonomic Distributed Object Store and foundation layer of Ceph. As a distributed object store, RADOS uses a collection of software daemons (OSDs, monitors, and managers) to intelligently and efficiently store and retrieve data across the cluster. In the layer atop RADOS sit LIBRADOS, RBD, RGW, and CephFS, giving Ceph its application libraries, block storage, S3-compatible object storage, and file storage. Because RADOS ultimately stores and manages data as objects, Ceph is able to replicate data intelligently throughout a cluster using its unique CRUSH placement algorithm, offering increased data durability and reliability in the event of failures.
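
To make that layering concrete, here’s a minimal sketch of talking to RADOS directly through its Python bindings (python-rados), assuming the bindings are installed and a client keyring is in place; the conffile path, pool name, and object name are placeholders for illustration. Everything in the layers above – RBD, RGW, CephFS – ultimately boils down to object reads and writes like these.

```python
import rados

# Connect as a standard Ceph client. The conffile path and pool name are
# illustrative placeholders; your cluster will have its own.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

ioctx = cluster.open_ioctx('example-pool')  # I/O context bound to one pool
try:
    # Store and retrieve a single named RADOS object.
    ioctx.write_full('hello-object', b'stored directly in RADOS')
    print(ioctx.read('hello-object'))
finally:
    ioctx.close()
    cluster.shutdown()
```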

Ceph is software-defined storage, which means you can run it on pretty much anything. This lets users tailor their Ceph cluster to their exact compute and storage needs, ideal for optimising performance and minimising total cost of operations. Mix and match storage hardware as you choose, upgrade what you need to, when you need to – Ceph will handle it all. Whether you need a stretch cluster (where nodes in the same logical cluster sit in completely separate geographical locations) or you are building neural networks to solve complex analytical problems, Ceph can be customised to suit your needs.

During my talk, I looked at how Ceph stacks up to other SDS on the market. You can pretty much summarise these comparisons this way: compared to other SDS, Ceph is more flexible, more adaptable, and infinitely scalable. And with no software licensing to deal with, and the scrutiny of a highly active open source community, Ceph outshines other SDS solutions for both cost-effectiveness and currency.

Working with Ceph and iRODS

Because Ceph and iRODS are both flexible systems, there are inevitably a number of ways that we can integrate Ceph with iRODS.

To start with, we can use the S3 Resource Plugin, which appears to be fairly widely used across the community. We can address Ceph’s RADOS Gateway (RGW) just like any other S3 service, and it’s worth noting that RGW actually has fairly good coverage of the S3 API across the board.
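
Part of what makes that plugin a straightforward choice is that RGW behaves like any other S3 endpoint. As a rough illustration, here’s a sketch using boto3 against a hypothetical RGW endpoint with placeholder credentials – the same kind of endpoint and credentials you would configure in the S3 Resource Plugin.

```python
import boto3

# Hypothetical RGW endpoint and placeholder credentials; substitute your own.
s3 = boto3.client(
    's3',
    endpoint_url='https://rgw.example.internal:7480',
    aws_access_key_id='RGW_ACCESS_KEY_PLACEHOLDER',
    aws_secret_access_key='RGW_SECRET_KEY_PLACEHOLDER',
)

# Ordinary S3 calls work against RGW just as they would against any S3 service.
s3.create_bucket(Bucket='irods-demo')
s3.put_object(Bucket='irods-demo', Key='hello.txt', Body=b'stored via RGW')
print(s3.get_object(Bucket='irods-demo', Key='hello.txt')['Body'].read())
```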

We can also connect Ceph to iRODS via the UnixFilesystem Resource Plugin, using either CephFS or RBD depending on our use case, and mount these directly into iRODS. From talking with others in the iRODS community, this approach is particularly performant, and it is currently the most popular approach.
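
The reason this works so cleanly is that once CephFS (or a filesystem on top of an RBD image) is mounted on the resource server, nothing Ceph-specific is visible to iRODS at all. The sketch below uses a hypothetical mount point and vault path to show that plain POSIX file operations are all that layer ever performs.

```python
from pathlib import Path

# Hypothetical CephFS (or RBD-backed) mount point used as an iRODS vault path.
vault = Path('/mnt/cephfs/irods/vault')
vault.mkdir(parents=True, exist_ok=True)

# To the UnixFilesystem Resource Plugin this is just an ordinary directory:
# regular file reads and writes are all it ever issues against it.
(vault / 'smoke-test.txt').write_text('plain file I/O, backed by Ceph\n')
print((vault / 'smoke-test.txt').read_text())
```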

iRODS also has a dedicated plugin for accessing the underlying RADOS libraries, which is pretty cool – and would usually be the best way to interact with Ceph. The catch is that this plugin hasn’t been actively maintained for a couple of years; it has had some issues recently, which I believe have now been fixed. We’re going to be looking into this resource further to establish its viability and performance versus the other plugins. Watch this space.

SoftIron was proud to sponsor the 13th Annual iRODS User Group Meeting, and we’re keen to see iRODS’ continued evolution.

If you’re curious about iRODS and ready to take the next step, Ceph is a cost-effective, powerful foundation for building a cluster to support it perfectly.

And while you can run Ceph on anything, that doesn’t mean you should (my toaster has never been the same since!). Because SoftIron builds appliances tailored specifically to Ceph, not only is it easier to get started, but you can access performance breakthroughs unachievable with generic hardware. If you’re interested in experimenting with Ceph and iRODS, you can give a HyperDrive cluster a test drive yourself.