Safe Replication Clause Samples
Safe Replication. The Safe Replication services are based on the iRODS policy engine and various micro-services. In the different community islands, a total of seven partner sites (e.g. community and service provider data centres) have set up pilot iRODS instances to test cross-site replication. After some network issues were solved (e.g. opening ports within a firewall), the basic tests that were conducted were successful. Performance issues were found and traced to certain configuration settings and to the calculation of checksums; after some optimizations, performance improved. The next steps are to move the pilot iRODS instances to production systems, which are integrated both with the mass storage systems of the partners and with local and EUDAT operations. After setting up the basic core services, the task force has focused on building the PID micro-service to automatically register PIDs for DOs on ingest and to keep track of replicas. Part of the PID building process has included discussions on the replication policies, because these influenced both the registration of the replicas and the linking of the replicas to the original DO. This issue led to the use of the RoR field, which always points back to the original DO. The PID micro-service has been implemented and successfully demonstrated in the CLARIN island. The next step is to extend the tests to the other islands. For the registration of PIDs, EUDAT is going to use the EPIC handle service, and hence EUDAT is negotiating with the EPIC consortium to develop an MoU. In its initial phase, the Safe Replication service is considered to be a data management tool that enables community data managers to define and apply the community replication policies. Because the number of users requiring direct access is limited and user-level access is still provided via community-specific services and portals, this phase of Safe Replication does not depend on a federated AAI solution.
It is expected that the safe replication candidate service will be handed over to the operations team in M12 of the EUDAT project (that is, September 2012).
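The relationship between PIDs, replicas and the RoR field described above can be illustrated with a small Python sketch. It is not the EUDAT PID micro-service or the EPIC API: the class names, the handle prefix "11100" and the iRODS-style URLs are all hypothetical, and the registry is a plain in-memory dictionary. The only behaviour taken from the text is that every replica's RoR field points back to the original DO, even when a replica is itself created from another replica.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PidRecord:
    """One PID entry. All field names are illustrative assumptions."""
    pid: str
    url: str
    ror: Optional[str] = None          # None marks the original DO
    replicas: List[str] = field(default_factory=list)


class PidRegistry:
    """In-memory stand-in for a PID service (hypothetical, not EPIC)."""

    def __init__(self, prefix: str = "11100"):  # prefix is made up
        self.prefix = prefix
        self.records: Dict[str, PidRecord] = {}

    def _mint(self) -> str:
        # Build a handle-like identifier: <prefix>/<random suffix>.
        return f"{self.prefix}/{uuid.uuid4()}"

    def register_original(self, url: str) -> str:
        """Register a PID for a newly ingested DO."""
        pid = self._mint()
        self.records[pid] = PidRecord(pid=pid, url=url)
        return pid

    def register_replica(self, source_pid: str, url: str) -> str:
        """Register a replica made from source_pid.

        The RoR field always resolves to the original DO, so a replica
        of a replica still points at the first object in the chain.
        """
        source = self.records[source_pid]
        origin_pid = source.ror if source.ror is not None else source.pid
        pid = self._mint()
        self.records[pid] = PidRecord(pid=pid, url=url, ror=origin_pid)
        # The original keeps track of all of its replicas.
        self.records[origin_pid].replicas.append(pid)
        return pid
```

A data manager could call `register_original` on ingest and `register_replica` after each successful cross-site copy; looking up any replica's `ror` then yields the PID of the original DO.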
Safe Replication. Data replication imposes the following requirements on the AAI. Consistent (federated) identities are needed. Delegation is necessary. Support for automated services (services which can authenticate themselves to other services without human intervention) is required. Consistent access control management across all replicas (as well as potentially other security attributes) is necessary. Community managers need to manage access control permissions. Data centres must be authenticated (host/service certificates). Access is logged – so persistent, unique, and non-reusable user ids are required. Traceability may be required. The technology must work with iRODS (also for automated services).
Safe Replication. The “Safe Replication” service enables EUDAT communities to easily create replicas of their scientific datasets in multiple data centres. This service can be considered the fundamental service in EUDAT for storing data reliably, accessibly and persistently in an environment of distributed repositories in different administrative domains. The degree of reliability, accessibility and persistency depends on the variety of storage technologies used, the number of replicas and the level of quality assurance that the centres offer. While deliverable D4.1.1 provides more information about the requirements associated with such a service, deliverable D5.1.1 outlines several technology areas that are required, along with potential candidates. This section is based on these two documents and describes the implementation of the early candidate services in order to realize the “Safe Replication” service.
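As a rough illustration of how the reliability of a set of replicas might be checked, the following Python sketch compares a checksum of each replica against that of the original. This is an assumption-laden example, not the EUDAT or iRODS implementation: the function names and site labels are hypothetical, SHA-256 is chosen only for illustration, and the data is held in memory rather than read from storage systems.

```python
import hashlib
from typing import Dict


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_replicas(original: bytes, replicas: Dict[str, bytes]) -> Dict[str, bool]:
    """Map each site label to True if its replica matches the original.

    `replicas` maps a (hypothetical) site label to the replica content.
    """
    expected = checksum(original)
    return {site: checksum(content) == expected
            for site, content in replicas.items()}
```

In practice the checksum would be computed where the data resides and only the digests compared across sites, which is one reason checksum calculation can dominate replication time for large datasets.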
3.1.1 Concrete reference architecture
Figure 5 illustrates the derived concrete architecture for the service using technologies that are introduced further in this document. Figure 6 depicts the deployment locations in the context of the employed technologies (i.e. EPIC PID service, iRODS, etc.). Each user community and service provider infrastructure will be heterogeneous (for example, the firewalls and accounting systems will differ). The architecture in Figure 6 helps us to understand the problems faced in a distributed system such as the federation domain established by EUDAT, which includes different organizations and administrative domains. As the work moves from ‘test’ phases to ‘production’ phases, it must be kept in mind that many aspects differ between the data replication islands (such as user accounts, firewall setups, and DMZ setups). These differences need to be observed, and solutions need to be explored, in order to reach production status towards the end of the TF lifetime. A more comprehensive reference model and architecture is part of the EUDAT work plan. The concrete architecture in Figure 6 provides a ‘frame of reference’ for stakeholders, while each island will have its own derived architecture with specific technologies deployed on existing servers. A key functionality is the inter-working of iRODS and EPIC, which is conceptually illustrated in Figure 6. Subsequent sections provide more implementation details (for example, about micro-services and rules).
