Conversion Process. Free digital tools already exist that facilitate the transformation of data from a tabular format into a linked data form. In selecting appropriate tools for converting TOE from its legacy form to the desired linguistic linked data form, a number of requirements on the process need to be taken into account. These requirements, based on the premise that conversions ought to be reproducible by scholars with minimal effort, are listed in Table 6 and have been categorized according to priority.⁹

Two requirements are mandatory, since they ensure an accurate conversion. The first is that the conversion process must accept tabular input in either Excel spreadsheet or CSV format and provide transformed output in RDF (M1). The second is that the process must be able to apply logic that relates the structure of the source to terminology from the desired linked data vocabularies (M2). The conversion logic for the TOE data has been described in Table 4. This logic also demands combining information from multiple tables, available in separate files. To illustrate, most of the information for lexical entries according to OntoLex-Lemon is found in the lexeme table of TOE. The part of speech of such an entry, however, is registered in another table of TOE: the category table.

In addition to the mandatory requirements, three others have been formulated to which the process should adhere. Although not mandatory for an accurate outcome, these three requirements are geared towards increasing the maintainability and user-friendliness of the process. Firstly, the process should accept conversion logic in a form that has been standardized and is application-independent (S1). The alternative, relying on a format specific to a single tool, would limit the applicability, understandability, and reusability of the captured logic.
Considering that the availability of specific tooling and continued support from its creators are by no means guaranteed (as indeed seen for a number of conversion tools),¹⁰ heavy reliance on a single tool should be avoided. Secondly, the process should be executable by scholars without a background in software development (S2). More specifically, it should be possible to obtain and install the necessary tools without first having to compile source code. Moreover, the tools should provide a visual user interface rather than only a command-line execution mechanism. Lastly, the conversion process should be automatable so that it can be performed again with minimal effort after an update of the thesaurus data (S3).

The final requirement for the process, assigned a lower priority than the foregoing ones, is meant to facilitate deploying and utilizing the resulting linguistic linked data. Web-based platforms will be able to retrieve and query information from a thesaurus if its conversion output has been stored in a database that facilitates access for linked data technology (C1). A database for linked data content is called a triplestore. Triplestores typically allow access to their stored content via queries in the standard query language SPARQL, which web applications can use to interact with the data.

⁹ The requirement prioritization follows the MoSCoW principles, developed by ▇▇▇ ▇▇▇▇▇ et al. (1994).
¹⁰ Availability and support for the tools AnnoCultor, Aperture, and NOR2O have been discontinued.

ID   Requirement
M1   Accept required input and output formats
M2   Apply required logic for conversion
S1   Employ standardized form for logic
S2   Allow for scholars to perform each step
S3   Allow for automation of all steps involved
C1   Store output in a triplestore with a query endpoint
Table 6: Requirements on the conversion process, categorized according to priority

The W3C provides a convenient overview of a number of tools that convert data into RDF (ConverterToRdf).
Eighteen free tools listed there comply with requirement M1. These tools are listed in Table 7. Five of them appear to be discontinued; that is, they are no longer maintained or offered for download. Nine others do not comply with M2, either because they do not allow applying logic other than their default (Apache Any23) or because they cannot combine information from tables found in separate input files (RDF123; RDF Refine; csv2rdf4lod; Anzo for Excel; TabLinker; Excel2rdf; Sheet2RDF; Spread2RDF). The remaining four tools, then, conform to both mandatory requirements and should be able to convert the TOE legacy form into a linguistic linked data form. These tools are Datalift, Tarql, Virtuoso Sponger, and XLWrap.

One of the four remaining candidate tools fails to meet requirement S1. This tool, XLWrap, defines its own form for capturing conversion logic rather than using a standardized one (Langegger, 2017). A number of standardized forms for capturing conversion logic have been recommended by W3C. Two of these are specifically intended for logic converting tabular data into RDF: CSVW and R2RML. Unfortunately, these two forms are unsuitable for the conversion of TOE. The former cannot be used to combine information from multiple input files. The latter supports only relational databases as input and cannot be applied to Excel or CSV files. In fact, the three remaining tools (Datalift, Tarql, and Virtuoso Sponger) support transformations using another form of logic: SPARQL. This query language, standardized by W3C, allows selecting patterns from an RDF source and constructing new RDF data that adheres to desired patterns.
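
To make the cross-table requirement (M2) and the select-then-construct idea more concrete, the following is a minimal sketch in plain Python. It is not SPARQL, and it does not reproduce the conversion logic of Table 4 or the behaviour of Datalift, Tarql, or Virtuoso Sponger. The column names, sample rows, base IRI, and the choice of vocabulary terms are all assumptions made for illustration only.

```python
import csv
import io

# Hypothetical stand-ins for two separate TOE input files. The real column
# names and identifiers are not given in the text.
LEXEME_CSV = """lexeme_id,lemma,category_id
1,cyning,10
2,singan,20
"""
CATEGORY_CSV = """category_id,part_of_speech
10,noun
20,verb
"""

BASE = "http://example.org/toe/"  # placeholder base IRI (assumption)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def convert(lexeme_text, category_text):
    """Join both tables on category_id and return N-Triples lines."""
    # "Select" step: index the category table so each lexeme can look up
    # the part of speech that is registered in the other file.
    pos_by_category = {
        row["category_id"]: row["part_of_speech"]
        for row in read_rows(category_text)
    }
    triples = []
    # "Construct" step: emit new triples that follow the target pattern.
    for row in read_rows(lexeme_text):
        entry = "<" + BASE + "lexeme/" + row["lexeme_id"] + ">"
        lemma = row["lemma"]  # literal escaping is omitted for brevity
        triples.append(f"{entry} <{RDF_TYPE}> <{ONTOLEX}LexicalEntry> .")
        triples.append(f'{entry} <{RDFS_LABEL}> "{lemma}"@ang .')
        pos = pos_by_category.get(row["category_id"])
        if pos is not None:
            triples.append(
                f"{entry} <{LEXINFO}partOfSpeech> <{LEXINFO}{pos}> ."
            )
    return triples


if __name__ == "__main__":
    print("\n".join(convert(LEXEME_CSV, CATEGORY_CSV)))
```

In a SPARQL-based tool, the lookup and the triple templates would instead be expressed declaratively in a single query, which is what makes the captured logic portable across tools in the sense of S1.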
Source: Digital Thesauri as Semantic Treasure Troves: A Linguistic Linked Data Approach to 'A Thesaurus of Old English'