Safe replication requirements Sample Clauses
Safe replication requirements. Safe replication of data denotes a service that allows EUDAT communities to easily create replicas of their scientific datasets in multiple data centers. Deliverable D4.
Safe replication requirements. At first sight the safe replication service case does not present many problems in terms of the scalability of any federation technology, and no specific performance requirements are listed. However, there is a requirement for data to be replicated M times (presumably across different sites) and held for N years. This has a potential impact if all data being generated is replicated: EMBL estimate that current high-throughput genome sequencing machines can already produce several petabytes of data each day, and DLS in the UK can generate 500,000 files each day with a total volume of 1.5 TB. These figures lead to two scalability issues:

Replicating large numbers of files. To achieve 'quick' replication it would be necessary to parallelise the writing; replicating files one at a time is likely to lead to a backlog at the source site, during which time the data may remain 'vulnerable' and the site itself would need to ensure it had additional copies until the 'EUDAT' replication had taken place. A large number of files being replicated concurrently implies heavy concurrent reads from the primary storage and concurrent writes of the replicas on the 'satellite' storage system(s). The latter can be alleviated by distributing the data among several sites; in the worst case there would be only a single site hosting the remote storage. A further problem is the large growth in the 'namespace' at both the source and remote sites. If files are small (where 'small' is really storage-system specific), this creates additional problems when the site hosts a tape-backed HSM, since recall of small files is generally not handled well by tape systems, which are typically optimized for files > 1 GB; the DLS figures above, for example, average only about 3 MB per file.

Replicating large volumes. In many respects, large volumes present less of a scalability problem. The only real issues are the speed at which data can be transferred and the reliability of the transfer protocol.
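The parallelised writing described above can be illustrated with a minimal sketch. This is not the actual EUDAT replication service; the function names, the use of local directories to stand in for 'satellite' sites, and the thread-pool approach are all illustrative assumptions. The idea it shows is simply that fanning files out to M target sites concurrently keeps the backlog at the source site short.

```python
# Minimal, hypothetical sketch of M-way parallel replication.
# Directories stand in for remote 'satellite' sites; a real service
# would use a transfer protocol rather than local copies.
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def replicate_file(src: Path, sites: list[Path]) -> int:
    """Copy one source file to every target site; return replicas made."""
    count = 0
    for site in sites:
        site.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, site / src.name)  # copy2 preserves timestamps
        count += 1
    return count


def replicate_all(source_dir: Path, sites: list[Path], workers: int = 8) -> int:
    """Replicate every file under source_dir to all sites in parallel.

    Files are handled concurrently by a thread pool so that a slow
    transfer of one file does not stall the whole queue (the 'backlog'
    problem noted in the text). Returns the total number of replicas.
    """
    files = [p for p in source_dir.iterdir() if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda f: replicate_file(f, sites), files))
```

Note that this sketch parallelises across files, not within a file; for the large-volume case, splitting individual transfers or tuning the protocol (as the text suggests) matters more than concurrency across files.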
