UIMA Sample Clauses
UIMA. The Unstructured Information Management Architecture (UIMA) is an open, scalable and extensible platform for developing, integrating and deploying applications that analyze large volumes of unstructured information contained in text, video and audio. Although UIMA began as an IBM project, an open-source reference implementation of the UIMA specification is currently available as a project of the Apache Software Foundation. Apache UIMA is offered under the Apache License, which allows its use in both proprietary and open/free applications.
UIMA applications usually chain together one or more components that each perform a specific task, for example "sentence and token boundary detection" => "POS tagging" => "lemmatization" => "named entity detection". The components implement interfaces defined by the framework and are described in XML descriptor files, while the UIMA framework manages the flow of (annotated) data between the components. Frameworks are available for both Java and C++, and the Java framework can run both Java and non-Java components (the latter by using the C++ framework). Another framework, the UIMA Asynchronous Scaleout Framework, adds scale-out capabilities to the Java framework via JMS (Java Message Service).
A UIMA component that analyzes artifacts (e.g. documents) and generates annotations is called an Analysis Engine (AE). The analysis results that an AE produces are represented by typed Feature Structures, which refer to a span of the text under analysis. For example, an annotation over the span of text "Haiti" can have the type Location. A Dependency annotation for the same span can carry the value Subject for the attribute Label and an integer value for the attribute Head. An XML file called a Type System Descriptor defines the Feature Structure types that an AE can generate.
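The chain of components described above, each adding typed annotations over spans of a shared text, can be modelled in a few lines of plain Java. This is only a minimal sketch of the idea, not the Apache UIMA API: the names SpanAnnotation, MiniCas, MiniEngine, WhitespaceTokenizer, ToyLocationTagger and PipelineSketch are all hypothetical, and the "location detector" is a toy that recognizes a single hard-coded word.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a typed annotation over a span of text (not a UIMA class).
final class SpanAnnotation {
    final String type;
    final int begin; // inclusive character offset
    final int end;   // exclusive character offset

    SpanAnnotation(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

// Simplified stand-in for the structure shared between components.
final class MiniCas {
    final String text;
    final List<SpanAnnotation> annotations = new ArrayList<>();

    MiniCas(String text) {
        this.text = text;
    }

    String coveredText(SpanAnnotation a) {
        return text.substring(a.begin, a.end);
    }
}

// Hypothetical component interface; every step in the chain implements it.
interface MiniEngine {
    void process(MiniCas cas);
}

// First step: adds one "Token" annotation per whitespace-delimited word.
final class WhitespaceTokenizer implements MiniEngine {
    public void process(MiniCas cas) {
        int i = 0;
        int n = cas.text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(cas.text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(cas.text.charAt(i))) i++;
            if (i > start) cas.annotations.add(new SpanAnnotation("Token", start, i));
        }
    }
}

// Second step: reads the Token annotations left by the tokenizer and
// adds a "Location" annotation for the hard-coded toy entry "Haiti".
final class ToyLocationTagger implements MiniEngine {
    public void process(MiniCas cas) {
        List<SpanAnnotation> found = new ArrayList<>();
        for (SpanAnnotation a : cas.annotations) {
            if (a.type.equals("Token") && cas.coveredText(a).equals("Haiti")) {
                found.add(new SpanAnnotation("Location", a.begin, a.end));
            }
        }
        cas.annotations.addAll(found);
    }
}

final class PipelineSketch {
    // Plays the role of the framework: passes the shared data through each step in order.
    static MiniCas run(String text, List<MiniEngine> engines) {
        MiniCas cas = new MiniCas(text);
        for (MiniEngine engine : engines) engine.process(cas);
        return cas;
    }

    public static void main(String[] args) {
        List<MiniEngine> chain = new ArrayList<>();
        chain.add(new WhitespaceTokenizer());
        chain.add(new ToyLocationTagger());
        MiniCas cas = run("Aid reached Haiti today", chain);
        for (SpanAnnotation a : cas.annotations) {
            System.out.println(a.type + " [" + a.begin + "," + a.end + ") " + cas.coveredText(a));
        }
    }
}
```

The point of the design is that the second component never sees the tokenizer directly; it only reads the annotations the tokenizer left behind, which is what lets components be rearranged or replaced independently.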
UIMA utilities will automatically generate Java classes corresponding to the types that are defined in the Type System Descriptor. In the example feature structure above, one would use the getHead() method of the Dependency class to get the integer representing the token's head. The annotations are stored in the Common Analysis Structure (CAS), which is used for communication of annotations between UIMA AEs and/or applications. Special AEs called CAS Consumers can be used to serialize CASes to different formats. Although the Apache-UIMA site provides information on making...
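To make the getHead() access pattern and the role of a CAS Consumer more concrete, here is a hand-written plain-Java stand-in. It is a sketch under stated assumptions, not generated UIMA code: DependencySketch only imitates the getter/setter shape of a generated type class, ToyConsumer and its tab-separated output format are hypothetical, and the head value 2 is an arbitrary illustrative integer.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written imitation of a class generated from a type system.
// Real generated classes extend UIMA base types; this one deliberately does not.
final class DependencySketch {
    private final int begin; // inclusive character offset
    private final int end;   // exclusive character offset
    private String label;
    private int head;

    DependencySketch(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    int getBegin() { return begin; }
    int getEnd() { return end; }

    String getLabel() { return label; }
    void setLabel(String label) { this.label = label; }

    int getHead() { return head; }
    void setHead(int head) { this.head = head; }
}

// Toy counterpart of a consumer: turns the stored annotations into text.
final class ToyConsumer {
    // Writes one line per annotation: covered text, label, head (tab-separated).
    // The format is purely illustrative.
    static String serialize(String text, List<DependencySketch> deps) {
        StringBuilder sb = new StringBuilder();
        for (DependencySketch d : deps) {
            sb.append(text, d.getBegin(), d.getEnd())
              .append('\t').append(d.getLabel())
              .append('\t').append(d.getHead())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Haiti needs aid";
        DependencySketch dep = new DependencySketch(0, 5); // span covering "Haiti"
        dep.setLabel("Subject");
        dep.setHead(2); // assumed value, chosen only for illustration
        List<DependencySketch> deps = new ArrayList<>();
        deps.add(dep);
        System.out.print(serialize(text, deps));
    }
}
```

Calling dep.getHead() here returns the integer that was stored with setHead, mirroring how one would read the Head attribute of the Dependency annotation described in the text.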
