Data Structures. A key difference between AML and AgreementMaker is that in the former ontologies are represented exclusively by internal data structures, whereas in the latter internal data structures are used in addition to the Jena OntModel. Building the internal data structures from the OntModel takes time, but if those structures are designed with the efficiency of the matching process in mind, they reduce the total processing time considerably. Furthermore, the internal data structures take up less memory than the OntModel, so by not keeping the latter in memory, we effectively increase the memory available for the matching process. Last but not least, this setup means that AML's ontology matching module is not tied to Jena or to any specific ontology-reading API. Thus AML can work with any ontology-reading API simply by changing the ontology loading module.

Lexicon is a data structure that links each class in an ontology with its "names" (i.e., local names, labels, and synonyms) and the provenance of those names (i.e., whether they come from a local name or label, or from which type of synonym property statement). While an equivalent data structure already existed in AgreementMaker, it was built after the ontology loading process and was only used by some matching algorithms. In AML, the Lexicon is a primary data structure used by all matching algorithms that require lexical information. A novel aspect incorporated into the AML Lexicon was a system of weights reflecting the reliability of each provenance. For instance, synonyms obtained from hasExactSynonym statements are in principle more reliable than synonyms obtained from hasRelatedSynonym statements, as they should be closer in meaning to the concept described by a class. Thus, local names were given a weight of 1.0, labels a weight of 0.95, exact synonyms a weight of 0.9, and other synonyms a weight of 0.85. These weights may be used by any matching algorithm that uses the Lexicon.
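As an illustration only, the weighted provenance scheme can be sketched in Python, with two nested dictionaries standing in for MultiMaps: one keyed by class and one keyed by name, so that a lookup in either direction is a single hash access. The weights are the ones given above; the provenance labels (`localName`, `label`, `exactSynonym`, `otherSynonym`), the example class identifiers, and the rule of keeping the most reliable provenance when the same name is added twice are assumptions of this sketch, not AML's actual API.

```python
from collections import defaultdict

# Weights from the text; the provenance keys are hypothetical labels.
PROVENANCE_WEIGHTS = {
    "localName": 1.0,
    "label": 0.95,
    "exactSynonym": 0.9,
    "otherSynonym": 0.85,
}


class Lexicon:
    """Illustrative class <-> name index with weighted provenance."""

    def __init__(self):
        self.by_class = defaultdict(dict)  # class -> {name: provenance}
        self.by_name = defaultdict(dict)   # name -> {class: provenance}

    def add(self, cls, name, provenance):
        if provenance not in PROVENANCE_WEIGHTS:
            raise ValueError(f"unknown provenance: {provenance}")
        current = self.by_class[cls].get(name)
        # Assumption: keep the most reliable provenance for a repeated name.
        if current is None or PROVENANCE_WEIGHTS[provenance] > PROVENANCE_WEIGHTS[current]:
            self.by_class[cls][name] = provenance
            self.by_name[name][cls] = provenance

    def weight(self, cls, name):
        """Weight of `name` for `cls`, or 0.0 if the pair is unknown."""
        provenance = self.by_class.get(cls, {}).get(name)
        return PROVENANCE_WEIGHTS[provenance] if provenance else 0.0

    def names_of(self, cls):
        return set(self.by_class.get(cls, {}))

    def classes_with(self, name):
        return set(self.by_name.get(name, {}))


lex = Lexicon()
lex.add("EX:0001", "heart", "label")
lex.add("EX:0001", "cardiac organ", "exactSynonym")
lex.add("EX:0002", "heart", "otherSynonym")
print(sorted(lex.classes_with("heart")))  # ['EX:0001', 'EX:0002']
print(lex.weight("EX:0001", "heart"))     # 0.95
```

A matcher can then score a candidate pair by combining the weights of the shared name in both Lexicons; how AML combines them is not specified here.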
The internal structure of the Lexicon consists of two MultiMaps (which are HashMaps of HashMaps) containing classes, names, and provenances, one keyed by class and the other keyed by name. Thus, the Lexicon can be queried by both class and name with a single hash lookup, i.e., at virtually no computational cost.

RelationshipMap is a data structure that links each class to the classes related to it through is_a or part_of relationships or through disjointness clauses. It complements the Lexicon and is a very efficient alternative to the node-based tree structure that AgreementMaker uses to represent each ontology. The RelationshipMap stores all is_a and part_of paths in an ontology with transitive closure, together with the length of each path in number of edges. It also stores all direct disjointness clauses in an ontology (without transitive closure). Like the Lexicon, the RelationshipMap is based on MultiMaps. It includes two MultiMaps for relationships, containing ancestors, descendants, and relationship (i.e., type and distance), one keyed by ancestor and the other keyed by descendant. It also includes a HashMap of Sets for disjointness clauses, linking each class to all classes that are disjoint with it. Thus, the RelationshipMap can be queried for all descendants of a class, all ancestors of a class, and all classes disjoint with a class at virtually no computational cost.

Alignment is a data structure used by the ontology matching module to store mappings between the input ontologies. This structure already existed in AgreementMaker and was ported directly from it. However, in AgreementMaker, Alignment was used only to store the final output of a matching algorithm or combination of algorithms.
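For the RelationshipMap described above, a minimal Python sketch follows. It computes the transitive closure of is_a/part_of edges by breadth-first search from each class, recording the shortest distance in edges, and stores disjointness clauses symmetrically but without propagating them. Two choices are assumptions made for illustration, since the text does not specify them: a path counts as is_a only if every edge on it is is_a (otherwise it is labeled part_of), and only the shortest path to each ancestor is kept. The method names are hypothetical.

```python
from collections import defaultdict, deque


class RelationshipMap:
    """Illustrative closed is_a/part_of hierarchy plus direct disjointness."""

    def __init__(self):
        self._direct = defaultdict(list)      # child -> [(parent, relation)]
        self.ancestors = defaultdict(dict)    # descendant -> {ancestor: (relation, distance)}
        self.descendants = defaultdict(dict)  # ancestor -> {descendant: (relation, distance)}
        self.disjoint = defaultdict(set)      # class -> classes disjoint with it

    def add_edge(self, child, parent, relation="is_a"):
        self._direct[child].append((parent, relation))

    def add_disjoint(self, a, b):
        # Stored symmetrically, but never propagated to subclasses.
        self.disjoint[a].add(b)
        self.disjoint[b].add(a)

    def close(self):
        """Transitive closure by BFS from every class; call after all edges are added."""
        for start in list(self._direct):
            seen = {start}
            queue = deque([(start, 0, "is_a")])
            while queue:
                node, dist, path_rel = queue.popleft()
                for parent, edge_rel in self._direct.get(node, ()):
                    if parent in seen:
                        continue  # BFS reaches each ancestor first by a shortest path
                    seen.add(parent)
                    # Assumed rule: is_a only if every edge on the path is is_a.
                    rel = "is_a" if path_rel == edge_rel == "is_a" else "part_of"
                    self.ancestors[start][parent] = (rel, dist + 1)
                    self.descendants[parent][start] = (rel, dist + 1)
                    queue.append((parent, dist + 1, rel))

    def ancestors_of(self, cls):
        return dict(self.ancestors.get(cls, {}))

    def descendants_of(self, cls):
        return dict(self.descendants.get(cls, {}))

    def disjoint_with(self, cls):
        return set(self.disjoint.get(cls, ()))
```

Because the closure is precomputed once, each query afterwards is a dictionary lookup, which is the trade-off the text describes: extra memory and loading time in exchange for cheap queries during matching.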
During the matching procedure, the primary data structure used by AgreementMaker is a matrix that stores the similarities between all concepts of the source ontology and all concepts of the target ontology, which we refer to in the rest of the paper as an all-against-all strategy. The problem with this structure is that it takes O(m·n) memory (where m and n are the number of concepts of the source and target ontologies, respectively) and therefore does not scale to large ontologies. For instance, the matrix that results from matching two ontologies with 50,000 classes each would occupy 18.6 GB of memory, which is beyond the capacity of our server. Since the number of mappings in a matching problem with one-to-one cardinality is O(min(m, n)), the vast majority of the values in the similarity matrix are very small or zero, which makes storing them unnecessary. In AML, we opted to store similarities directly in the Alignment and to discard similarities that fall below a given threshold.

The internal structure of Alignment in AML is identical to that of AgreementMaker. It includes two MultiMaps that contain the source class, target class, and similarity, one keyed by source class and the other keyed by target class. This enables efficient querying of mappings by class and means that Alignment corresponds to a sparse matrix. In addition, Alignment includes a list structure that enables sorting and thus facilitates selection.
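The sketch below contrasts the two strategies. Assuming 8-byte double-precision similarities, a dense 50,000 × 50,000 matrix holds 2.5 × 10⁹ values, i.e. 2 × 10¹⁰ bytes, or about 18.6 GiB, which agrees with the figure above. The hypothetical `Alignment` class instead stores only mappings at or above a threshold, indexed by both source and target class, and sorts them for selection. The greedy one-to-one selection shown is a common strategy assumed here for illustration; the text only states that the sorted list facilitates selection.

```python
from collections import defaultdict


def dense_matrix_gib(m, n, bytes_per_value=8):
    """Size in GiB of a full m x n similarity matrix (8-byte doubles assumed)."""
    return m * n * bytes_per_value / 2**30


class Alignment:
    """Illustrative sparse alignment: only mappings >= threshold are stored."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.by_source = defaultdict(dict)  # source -> {target: similarity}
        self.by_target = defaultdict(dict)  # target -> {source: similarity}

    def add(self, source, target, similarity):
        """Return False (and store nothing) if similarity is below the threshold."""
        if similarity < self.threshold:
            return False
        # Keep the higher similarity if the same pair is added twice.
        if similarity > self.by_source[source].get(target, float("-inf")):
            self.by_source[source][target] = similarity
            self.by_target[target][source] = similarity
        return True

    def get(self, source, target):
        return self.by_source.get(source, {}).get(target, 0.0)

    def sorted_mappings(self):
        """All stored mappings, highest similarity first."""
        triples = [(s, t, sim) for s, targets in self.by_source.items()
                   for t, sim in targets.items()]
        return sorted(triples, key=lambda x: x[2], reverse=True)

    def select_one_to_one(self):
        """Assumed greedy 1:1 selection over the sorted list."""
        used_src, used_tgt, chosen = set(), set(), []
        for s, t, sim in self.sorted_mappings():
            if s not in used_src and t not in used_tgt:
                chosen.append((s, t, sim))
                used_src.add(s)
                used_tgt.add(t)
        return chosen


print(round(dense_matrix_gib(50_000, 50_000), 1))  # 18.6
```

Memory here grows with the number of retained mappings rather than with m·n, so a higher threshold directly bounds the storage cost.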