Information User Landscape. Given the wider goals of EUDAT beyond the currently engaged communities, information providers can be split into at least four separate categories:

1. Information providers holding legacy data sets. These are holdings that, while still of use in current research, are no longer being extended or actively developed. Examples include remote sensing data from instruments that are no longer operational, or data from now-defunct projects that remain useful for ongoing long-term monitoring. These providers may have a mature metadata model, but this is not guaranteed, and, depending on the age of the holdings, the model may not fit any current metadata standard.

2. Information providers holding active data sets, where new data are being added to the community repository and the holdings are under active development. Such providers are likely to hold a metadata catalogue, but again it may not adhere to any current standard. For example, the WLCG metadata is essentially only a collection of GUIDs and their resolution to data objects, while many other groups have models based on well-defined standards (Dublin Core, CSDGM, EML, etc.).

3. Immature communities that are either developing, or considering the development of, a metadata model to allow simple search and retrieval of metadata associated with forthcoming or ongoing holdings.

4. ‘Ad-hoc users’ who may want to use EUDAT as a ‘dropbox’ for scientific data, have no interest in developing their own metadata catalogue, and simply want EUDAT to manage their metadata.

Each of these categories will affect scalability in ways that are, in some respects, difficult to quantify. Individually, ad-hoc users are likely to contribute the least information, but there are potentially a large number of them. This effect is likely to be particularly acute in the initial stages of the EUDAT rollout to wider communities if a ‘try before you buy’ system is offered to potential providers.
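The point about ad-hoc users is essentially arithmetic: a small per-provider contribution multiplied by a large number of providers can rival a few large holdings. The sketch below makes that concrete, assuming a simple model in which each category's load is its number of providers times the metadata records each registers. The category names come from the list above; every count is a hypothetical placeholder, not an EUDAT figure, and `ProviderCategory` is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class ProviderCategory:
    name: str
    providers: int             # hypothetical number of providers in the category
    records_per_provider: int  # hypothetical metadata records per provider

    def total_records(self) -> int:
        # Aggregate load = provider count x per-provider contribution.
        return self.providers * self.records_per_provider


# Placeholder figures for illustration only; none come from EUDAT.
categories = [
    ProviderCategory("legacy", providers=5, records_per_provider=2_000_000),
    ProviderCategory("active", providers=20, records_per_provider=500_000),
    ProviderCategory("immature", providers=50, records_per_provider=10_000),
    ProviderCategory("ad-hoc", providers=100_000, records_per_provider=100),
]

for c in categories:
    print(f"{c.name:>8}: {c.total_records():>12,} records")
```

With these made-up inputs, the ad-hoc category contributes as many records in total as the legacy category, despite each ad-hoc user registering twenty thousand times fewer records than each legacy provider.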
Legacy data providers are probably the easiest to quantify (for instance, the Sloan Digital Sky Survey's Legacy Survey contains 230 million unique imaged objects and 1,270,000 unique spectra). However, each community would need to provide comparable figures for its own holdings. Active and immature information providers are the largest unknown. Their potential data holdings are difficult to quantify, since the impending ‘data tsunami’ is driven by increases in the volume of data stored, the number of distinct objects stored, and the number of communities...
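The SDSS figures show how a legacy holding can be turned into a catalogue-size estimate, and why the other categories are harder. The sketch below assumes, hypothetically, one metadata record per imaged object and one per spectrum, and treats record count as the product of objects per community and number of communities, so growth along both axes compounds. Data volume in bytes is left out because it drives storage rather than record count. The function names are illustrative only.

```python
# Figures quoted in the text for the SDSS Legacy Survey.
SDSS_IMAGED_OBJECTS = 230_000_000
SDSS_SPECTRA = 1_270_000


def legacy_record_estimate(objects: int, spectra: int) -> int:
    """Record count, assuming (hypothetically) one metadata record per
    imaged object and one per spectrum."""
    return objects + spectra


def catalogue_growth(object_factor: float, community_factor: float) -> float:
    """Growth factor of the total record count when each community's object
    count grows by object_factor and the number of communities grows by
    community_factor. Assumes uniform growth across communities."""
    return object_factor * community_factor


records = legacy_record_estimate(SDSS_IMAGED_OBJECTS, SDSS_SPECTRA)
print(f"SDSS legacy records: {records:,}")  # SDSS legacy records: 231,270,000
print(f"Growth if objects and communities both double: {catalogue_growth(2.0, 2.0)}x")
```

Because the two unknown factors multiply, a precise figure for a single legacy holding says little about the combined load once active and immature communities begin contributing.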