By Harry Richardson, Chief Scientist
Did you know that the CERN Data Centre processes an average of one petabyte of data per day as a result of the particle collision experiments in its Large Hadron Collider (LHC)? Experiments in the 27km circular tunnel alone churn out approximately 90PB of data per year, and a further 25PB per year come from other, non-LHC experiments at CERN.1 When scientists working on the LHC sifted through the data in 2012 and discovered the Higgs boson, they were able to confirm its position in the Standard Model of physics and confirm the existence of the mechanism that gives other elementary particles their mass. This was a pivotal moment for science, as it opened the door to furthering our understanding of the very origins of the universe.
So how is all this data managed? What technology acts as the backbone for securing, accessing and sharing this vast storage estate, and how is it working to support CERN’s ground-breaking research?
Over the years, the rise of super-computing and the resulting growth in storage requirements (such as the need for block storage for both OpenStack VMs and file services like AFS and NFS) have driven the development of a generic backend storage service for CERN IT.
Ceph is a natural solution thanks to its native block device layer, RBD, which is built on its scalable, reliable, and performant object storage system, RADOS. Ceph is the leading open-source software-defined storage solution on the market, with supporters in research data centers of every size and vertical. It is flexible, inexpensive, fault-tolerant, hardware neutral, and highly scalable, which makes it an excellent choice for research institutions in general, not just CERN. In addition, because Ceph is open source and hardware neutral, organizations with unique storage requirements, as most research organizations have, can avoid vendor lock-in altogether. Other benefits include:
- support for multiple storage types including object, block, and file systems. Regardless of the type of research being conducted, the resulting files, blocks and/or objects can all live in harmony in Ceph.
- native support for hybrid cloud environments, which makes it easy for remote researchers, who might be located anywhere in the world, to upload their data in different storage formats.
- resilience: there’s no need to keep dedicated standby hardware in case a component fails, because Ceph’s self-healing functionality quickly re-replicates the data held on a failed node across the remaining nodes, restoring data redundancy and maintaining availability.
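To make the self-healing idea concrete, here is a minimal toy sketch in Python. It is not Ceph code and does not implement Ceph's CRUSH placement algorithm: it substitutes simple rendezvous hashing, and the replication factor of 3, the node names, and the `place` and `recover` helpers are all assumptions made for illustration. It shows how objects that lose a copy when a node fails can be brought back to full redundancy using only the surviving nodes.

```python
import hashlib

REPLICAS = 3  # assumed replication factor for this toy model


def place(obj_name, nodes, replicas=REPLICAS):
    """Pick `replicas` distinct nodes for an object using rendezvous hashing.

    Rendezvous hashing is a stand-in here; Ceph itself uses CRUSH.
    """
    def score(node):
        digest = hashlib.sha256(f"{obj_name}:{node}".encode()).hexdigest()
        return int(digest, 16)

    # The highest-scoring nodes for this object hold its replicas.
    return sorted(nodes, key=score, reverse=True)[:replicas]


def recover(placement, failed, surviving_nodes):
    """Re-place every object that had a copy on `failed`.

    Returns the number of objects whose placement was repaired.
    """
    repaired = 0
    for obj, holders in placement.items():
        if failed in holders:
            placement[obj] = place(obj, surviving_nodes)
            repaired += 1
    return repaired


# Hypothetical six-node cluster holding 1,000 objects.
nodes = [f"node{i}" for i in range(6)]
placement = {f"obj{i}": place(f"obj{i}", nodes) for i in range(1000)}
before = {obj: list(holders) for obj, holders in placement.items()}

# Simulate the loss of one node and let the "cluster" heal itself.
failed = "node2"
surviving = [n for n in nodes if n != failed]
repaired = recover(placement, failed, surviving)

print(f"{repaired} objects re-replicated after losing {failed}")
```

In this toy model, objects that never had a copy on the failed node are left untouched, and each affected object keeps its two surviving copies and gains one new copy on another node. A real Ceph cluster makes far more sophisticated placement decisions, but the principle of restoring redundancy from the remaining hardware is the same.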
SoftIron makes the world’s finest unified storage solutions for today’s enterprise. Our HyperDrive® storage appliance is custom-designed and built to optimize Ceph, unleashing the full potential of the technology for the research data center. HyperDrive is a high-performance, scale-out solution that runs at wire speed while drawing less than 100W per 1U appliance.
The University of Minnesota’s Supercomputing Institute (MSI) is a world-class facility that provides computing infrastructure and expertise to foster innovation through advanced computing technologies, scientific computing services, and more. When MSI’s legacy infrastructure was hampering the Institute’s goal of serving its internal customers with scalable, easy-to-deploy, high-performance data solutions, it turned to SoftIron for a more cost-effective, scalable storage solution. Read more about how the Minnesota Supercomputing Institute unlocked cost savings and scalability with SoftIron and Ceph in the Minnesota Supercomputing Institute case study.
2 Daniel van der Ster and Arne Wiebalck 2014 J. Phys.: Conf. Ser. 513 042047