
Next-Generation Petabyte-Scale Backup and Archive Platform
About the Client
Company’s Request
Technology Set
CentOS, Ubuntu 14, RHEL 7.0 | Chosen for stability and enterprise support, these Linux distributions provide a reliable base for high-performance applications and are well suited to the data-intensive operations of backup and archive systems. |
C and C++ | Offer the performance and efficiency essential for low-level system operations and data processing. |
Node.js | Selected for its efficient handling of asynchronous tasks and network requests, which strengthens the platform's backend services. |
.deb and .rpm | These package formats allow straightforward software distribution and installation, enabling simple deployment and consistent operation across Debian- and RPM-based Linux environments. |
Cloud Copy On Write (CCOW) | Integrated for its innovative approach to data management, improving data integrity and efficiency. CCOW enables immediate data duplication and versioning for faster recovery and high availability. |
Jenkins CI | Automates builds and test runs, confirming consistency and reliability while shortening the deployment cycle through continuous integration of feedback and updates. |
We engineered the core functionality for Cloud Copy On Write, a forward-thinking technology currently pending patent approval, designed to ensure safe and efficient data management.
CCOW employs an innovative copy-on-write approach: new data is written to new storage locations rather than overwriting existing data, and the metadata pointers are updated only once the new data is safely in place. This minimizes the risk of data corruption and enables quicker, more reliable data recovery.
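To make the mechanism concrete, here is a minimal C++ sketch of the copy-on-write idea; the type names are hypothetical, and the production CCOW implementation is proprietary and far more involved. A writer never touches the live chunk: it builds a new version and atomically swaps the metadata pointer.

```cpp
// Minimal copy-on-write sketch (illustrative names, not the CCOW codebase).
// An update writes a new chunk, then atomically repoints the metadata.
#include <atomic>
#include <cstdint>
#include <memory>
#include <vector>

struct Chunk {
    uint64_t version;
    std::vector<char> payload;  // immutable once written
};

class CowObject {
public:
    // Readers always see a complete, consistent version.
    std::shared_ptr<const Chunk> read() const {
        return std::atomic_load(&current_);
    }

    // Writers allocate new storage; the old chunk stays intact until the
    // pointer swap succeeds, so a crash mid-write cannot corrupt live data.
    void write(std::vector<char> data) {
        auto old = std::atomic_load(&current_);
        auto next = std::make_shared<const Chunk>(
            Chunk{old ? old->version + 1 : 1, std::move(data)});
        std::atomic_store(&current_, next);
        // Superseded versions can be retained for point-in-time recovery
        // and garbage-collected later.
    }

private:
    std::shared_ptr<const Chunk> current_;
};
```

Because superseded versions remain intact until they are garbage-collected, point-in-time recovery reduces to re-pointing metadata at an older version rather than restoring overwritten bytes.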
Implementing CCOW was essential to building a solution that scales efficiently to petabyte-sized data volumes for clients managing large-scale data operations.
A significant challenge was managing petabyte-scale data volumes without performance degradation. To overcome this, we optimized our data-handling algorithms for high efficiency and implemented a distributed architecture that spreads data across multiple server nodes, balancing the load and accelerating retrieval and backup through parallel processing.
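The sketch below illustrates the general idea rather than the platform's actual code: chunks are assigned to nodes by hashing their keys, and transfers run in parallel. `nodeFor`, `backupAll`, and the chunk keys are invented for this example.

```cpp
// Illustrative sketch of hash-based chunk placement with parallel transfers.
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Assign a chunk to a node by hashing its key across the node count, so
// data (and therefore load) spreads evenly across the cluster.
std::size_t nodeFor(const std::string& chunkKey, std::size_t nodeCount) {
    return std::hash<std::string>{}(chunkKey) % nodeCount;
}

// Launch one asynchronous task per chunk; each would stream its chunk to
// the node chosen above, so backup throughput scales with the node count.
void backupAll(const std::vector<std::string>& chunkKeys,
               std::size_t nodeCount) {
    std::vector<std::future<void>> tasks;
    for (const auto& key : chunkKeys) {
        tasks.push_back(std::async(std::launch::async, [key, nodeCount] {
            std::size_t node = nodeFor(key, nodeCount);
            (void)node;  // placeholder: the real transfer to `node` goes here
        }));
    }
    for (auto& t : tasks) t.get();  // wait until every transfer completes
}
```

Plain modulo hashing reshuffles most chunks whenever a node is added or removed; at this scale a consistent-hashing scheme is the usual refinement, trading a little lookup complexity for minimal data movement.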
To ensure reliability, we developed an extensive suite of automated tests covering everything from basic data operations to complex disaster-recovery scenarios. Automating these tests enabled consistent evaluation of the system under diverse conditions, and integrating them with Jenkins Continuous Integration (CI) automated their execution at each stage of the development lifecycle. This improved development productivity and code quality by allowing early detection and resolution of issues; the Jenkins pipeline was configured to give real-time feedback on the health of the codebase, enabling quick adjustments.
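As a flavor of what such a test might look like, the standalone example below exercises a basic data operation: it writes a payload to storage and verifies it reads back intact. The file name is a placeholder and the snippet is simplified for illustration, not taken from the actual suite; a CI job typically runs binaries like this and fails the build on a non-zero exit status.

```cpp
// Simplified round-trip test in the spirit of the suite's basic data checks.
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

void testRoundTrip() {
    const std::string path = "roundtrip.bin";  // placeholder test file
    const std::vector<char> payload{'b', 'a', 'c', 'k', 'u', 'p'};

    // Basic data operation: persist the payload ...
    std::ofstream(path, std::ios::binary)
        .write(payload.data(), static_cast<std::streamsize>(payload.size()));

    // ... then restore it and check that nothing was lost or altered.
    std::ifstream in(path, std::ios::binary);
    std::vector<char> restored((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());
    assert(restored == payload);

    std::remove(path.c_str());  // clean up the test artifact
}

int main() {
    testRoundTrip();
    std::puts("ok");  // CI keys off the exit status: non-zero fails the build
    return 0;
}
```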
During the project, our team integrated the solution across various platforms, including OpenStack, Linux, VMware, and Windows, to guarantee functionality in diverse IT environments. This involved developing custom adapters and APIs for compatibility and optimized performance across these operating systems and platforms.
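A common way to structure such adapters, shown here as a hypothetical sketch rather than the project's real API, is a shared interface with one concrete implementation per platform, so the backup engine never depends on platform-specific calls directly.

```cpp
// Hypothetical adapter-pattern sketch; interface and class names are
// illustrative, not the project's actual API.
#include <memory>
#include <stdexcept>
#include <string>

// The common contract the backup engine codes against on every platform.
class PlatformAdapter {
public:
    virtual ~PlatformAdapter() = default;
    virtual void attachVolume(const std::string& volumeId) = 0;
    virtual void snapshotVolume(const std::string& volumeId) = 0;
};

// One concrete adapter per environment (OpenStack, VMware, Windows, ...),
// each mapping the shared contract onto platform-specific calls.
class OpenStackAdapter : public PlatformAdapter {
public:
    void attachVolume(const std::string& volumeId) override {
        (void)volumeId;  // would call the OpenStack block-storage API here
    }
    void snapshotVolume(const std::string& volumeId) override {
        (void)volumeId;  // would trigger a platform-side snapshot here
    }
};

// The engine asks a factory for the right adapter at runtime.
std::unique_ptr<PlatformAdapter> makeAdapter(const std::string& platform) {
    if (platform == "openstack") return std::make_unique<OpenStackAdapter>();
    throw std::runtime_error("unsupported platform: " + platform);
}
```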
We also fortified the solution with strong encryption and comprehensive access controls in response to growing data-security concerns and stringent compliance mandates: AES-256 for all data at rest, and TLS 1.2 with mutual authentication for data in transit, securing every data exchange.
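For illustration, the snippet below shows how a TLS 1.2 client context with mutual authentication can be configured using OpenSSL (1.1.0 or later). The certificate paths are placeholders and error handling is trimmed for brevity; the at-rest AES-256 layer, which would typically go through a symmetric-cipher API such as OpenSSL's EVP interface, is not shown.

```cpp
// Sketch: TLS 1.2 client context with mutual authentication via OpenSSL.
#include <openssl/ssl.h>
#include <stdexcept>

SSL_CTX* makeMutualTlsContext() {
    SSL_CTX* ctx = SSL_CTX_new(TLS_client_method());
    if (!ctx) throw std::runtime_error("SSL_CTX_new failed");

    // Pin the protocol to TLS 1.2, as used for data in transit.
    SSL_CTX_set_min_proto_version(ctx, TLS1_2_VERSION);
    SSL_CTX_set_max_proto_version(ctx, TLS1_2_VERSION);

    // Present our own certificate so the peer can authenticate us ...
    SSL_CTX_use_certificate_file(ctx, "client.pem", SSL_FILETYPE_PEM);
    SSL_CTX_use_PrivateKey_file(ctx, "client.key", SSL_FILETYPE_PEM);

    // ... and require + verify the peer's certificate against our CA,
    // which is what makes the authentication mutual.
    SSL_CTX_load_verify_locations(ctx, "ca.pem", nullptr);
    SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, nullptr);
    return ctx;
}
```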
Value Delivered
