Distributed Metadata Service Clause Samples
Distributed Metadata Service. Rather than a single central repository, community metadata could be stored in a distributed system. This addresses a number of the scalability issues present in the central model. Users querying the service can be redirected to different nodes, distributing load across a number of systems. In addition, individual nodes can be logically linked to a set number of communities, with consistency across nodes achieved using standard database replication procedures such as Oracle streaming, MySQL clustering or HadoopDB. Even with this approach, holding full object-level metadata would present scalability problems, and it may be necessary to hold only a subset of the metadata corresponding to a standard. Nor does the distributed approach address cases where communities do not provide an OAI-PMH interface (or similar) that returns only the records that have changed.
While this approach addresses scalability in terms of concurrent access by external users, it introduces a consistency issue. Until data has been replicated to all nodes in the distributed service, different nodes may return different results for the same query. This can be somewhat alleviated by ensuring that new data is not made available until all nodes have a copy of it, or by making new information visible only some time after uploading, with the timing determined by the ‘master’ node that accepts the updated information from the community service.
Again, the level of metadata stored in the EUDAT layer will have an impact on the community metadata services. If sufficient object-level metadata is available for objects within the EUDAT Metadata Service, external users will not need direct access to community-level metadata services. If only collection-level information or insufficient ‘core’ metadata is available within this service, external users will need to use the community metadata services.
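The rule that new data is not made available until all nodes have a copy can be illustrated with a minimal Python sketch. It is not the EUDAT design: the `Node` and `Master` classes, their method names, and the use of full versioned snapshots are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One metadata node holding versioned snapshots (hypothetical model)."""
    name: str
    snapshots: dict = field(default_factory=dict)  # version -> {record_id: metadata}

    def receive(self, version, records):
        # Store a replicated snapshot; it is not yet visible to queries.
        self.snapshots[version] = dict(records)

    def has(self, version):
        return version in self.snapshots


class Master:
    """Hypothetical 'master' that accepts updates and controls visibility."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.pending = {}          # version -> records awaiting replication
        self.visible_version = 0   # version every node currently serves
        self.next_version = 1

    def accept_update(self, records):
        # Accept a full snapshot from a community service and assign a version.
        version = self.next_version
        self.next_version += 1
        self.pending[version] = dict(records)
        return version

    def replicate(self, version, node):
        # Copy one pending version to one node.
        node.receive(version, self.pending[version])

    def try_publish(self, version):
        # Make the version visible only once every node holds a copy.
        if version > self.visible_version and all(n.has(version) for n in self.nodes):
            self.visible_version = version
        return self.visible_version

    def query(self, node, record_id):
        # Every node answers from the same published version.
        return node.snapshots.get(self.visible_version, {}).get(record_id)
```

In this sketch a query sent to any node is answered from the last version that reached every node, so two nodes never disagree. The cost is that an update stays invisible for as long as the slowest node takes to receive it.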
This distributed approach also benefits from having no single point of failure: if one of the nodes goes down, others remain available. However, if EUDAT metadata nodes are linked to distinct archives, then new information for the archives attached to a failed node will not be available until that node is returned to service (possibly meaning that deleted information is still registered within EUDAT as available). Also, before the node is brought back into service, it must be brought up to date not only with its own information but also with information from the other nodes in the cluster. For users wishing to use EUDAT ...
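The failover and catch-up behaviour described in this paragraph can be sketched in Python as follows. The `MetadataNode` and `Cluster` classes and the single shared change log are hypothetical simplifications; the sketch does not model nodes that ingest from distinct archives, nor any particular replication product.

```python
class MetadataNode:
    """Hypothetical metadata node that replays an ordered change log."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.applied = 0    # number of log entries already applied
        self.records = {}   # record_id -> metadata

    def apply(self, log):
        # Apply every change this node has not yet seen.
        for record_id, metadata in log[self.applied:]:
            if metadata is None:
                self.records.pop(record_id, None)  # None marks a deletion
            else:
                self.records[record_id] = metadata
        self.applied = len(log)


class Cluster:
    """Hypothetical cluster that routes queries to any online node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.log = []  # ordered (record_id, metadata) changes

    def write(self, record_id, metadata):
        self.log.append((record_id, metadata))
        for node in self.nodes:
            if node.online:
                node.apply(self.log)

    def fail(self, node):
        node.online = False

    def restore(self, node):
        # Bring the node up to date before it serves queries again.
        node.apply(self.log)
        node.online = True

    def query(self, record_id):
        # No single point of failure: the first online node answers.
        for node in self.nodes:
            if node.online:
                return node.name, node.records.get(record_id)
        raise RuntimeError("no metadata node available")
```

While a node is offline it keeps its stale records, including any that have since been deleted elsewhere. `restore` replays the missed changes before the node is marked online, so it never serves that stale state.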
