The Corpus Sample Clauses

The Corpus clause defines the body of assets, property, or funds that are held in trust or form the subject matter of a legal arrangement. In practice, this clause specifies exactly what is included in the corpus, such as cash, securities, real estate, or other tangible and intangible assets, and may outline how these assets are to be managed or invested. Its core function is to clearly identify and delineate the property subject to the trust or agreement, ensuring all parties understand what is covered and reducing the risk of disputes over the trust's contents.
The Corpus. For the remaining packages of the lexicon, an automatic contextual disambiguation is attempted. To do this, a parallel corpus is used. The goal is to find conceptual contexts in the corpus that allow the disambiguation of translation alternatives.
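The idea of choosing among translation alternatives by their contexts can be sketched in a few lines. This is a minimal, hypothetical illustration (the function names are invented, and real systems typically compare conceptual contexts via a bilingual lexicon rather than raw word overlap, as simplified here): the alternative whose corpus contexts overlap most with the source context wins.

```python
from collections import Counter

def context_profile(sentences, target):
    """Count words co-occurring with `target` across corpus sentences."""
    profile = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        if target in words:
            profile.update(w for w in words if w != target)
    return profile

def disambiguate(source_context, alternatives, target_corpus):
    """Pick the alternative whose corpus contexts overlap most
    with the words of the source-side context (simplified: assumes
    a shared vocabulary instead of a bilingual lexicon)."""
    source_words = set(source_context.lower().split())

    def overlap(alt):
        profile = context_profile(target_corpus, alt)
        return sum(profile[w] for w in source_words)

    return max(alternatives, key=overlap)
```

On a toy corpus, `disambiguate("a fluffy animal sat", ["cat", "dog"], ["the fluffy cat sat", "the loud dog barked"])` selects `"cat"`, because the contexts observed for "cat" share more words with the source context.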
The Corpus. In this chapter, the generation of the Covid-themed tweets dataset will be discussed in detail. The source of data (Section 3.1), the mechanism and word choice for tweet scraping (Section 3.1), the rationale for choosing data produced in the twelve-day span (Section 3.2), the preliminary filtering process (Section 3.3), and string removal (Section 3.4) will be elaborated to demonstrate our dataset’s integrity. To ensure the quality of the data, we additionally apply quality assurance procedures (Section 3.5), in the hope of convincing readers that the Covid-themed Tweets Dataset can serve as a valid and rich event detection research resource in the NLP community.
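The kind of keyword filtering and string removal described above can be sketched as follows. This is an illustrative assumption, not the authors' actual pipeline: the keyword list and the removal patterns (URLs and user mentions) are placeholder examples of typical tweet-cleaning choices.

```python
import re

# Placeholder keyword list; the study's actual scraping terms are not
# reproduced here.
COVID_KEYWORDS = {"covid", "coronavirus", "pandemic", "lockdown"}

URL_RE = re.compile(r"https?://\S+")   # remove links
MENTION_RE = re.compile(r"@\w+")       # remove user mentions

def is_covid_related(tweet: str) -> bool:
    """Preliminary filter: keep tweets containing a topic keyword."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    return bool(words & COVID_KEYWORDS)

def clean_tweet(tweet: str) -> str:
    """String removal: strip URLs and mentions, collapse whitespace."""
    tweet = URL_RE.sub("", tweet)
    tweet = MENTION_RE.sub("", tweet)
    return " ".join(tweet.split())
```

For example, `clean_tweet("Stay safe @user https://t.co/x everyone")` returns `"Stay safe everyone"`.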
The Corpus. The initial data used to examine the issues mentioned above are first taken from previous accounts of conjunct agreement in both English and Serbian. Thus, the data from English are provided by Lorimor (2007), among others, and the initial data from Serbian are found in ▇▇▇▇▇▇▇ (1983), ▇▇▇▇▇▇▇▇▇▇ (1979), and ▇▇▇▇▇▇▇▇ (2009). After the examination of these works and the identification of basic problems, a survey was conducted in order to look into the basic patterns of agreement employed by speakers of Serbian in their active production. The survey was completed by 60 participants, all native speakers of Serbian. The speakers were asked to do a production task, supplying the missing agreement information on the verb based on the conjoined subjects, whose features were varied. The results of this survey provide the material on which a theoretical model of conjunct agreement is developed in the thesis. The thesis is organized as follows. Section 2 gives a detailed introduction to the process of agreement, the role of features in that process, and the nature of features themselves. Section 3 focuses on agreement with conjoined subjects. It provides a brief overview of agreement patterns with conjoined subjects in English and Serbian. The purpose of Section 4 is to explain the mechanism of agreement and the structure of the coordinate phrase, so as to help the reader understand the syntactic mechanisms of conjunct agreement provided in the following sections. Section 5 presents previous syntactic accounts of conjunct agreement. The accounts presented here provide a basis for the analysis of the data obtained in the research. Section 6 identifies the basic problems tackled by the research. Subsequently, it presents the results of the research together with their analysis. Section 7 contains concluding remarks.
The Corpus. In this chapter, the generation of FriendsQA (Section 3.1) will be discussed in detail. The web interface used for crowdsourcing (Section 3.2), the different rounds of experiments (Section 3.6), and the two phases in each round (Sections 3.3 and 3.5) will be elaborated and explained to demonstrate our dataset’s integrity and diversity. To ensure the quality of the data, we additionally apply quality assurance procedures (Section 3.4), question and answer pruning (Section 3.7), inter-annotator agreement (Section 3.8), and an extensive question-answer type analysis (Section 3.9), in the hope of convincing readers that FriendsQA can serve as a valid and rich QA research resource in the NLP community.
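The inter-annotator agreement step mentioned above is typically quantified with a chance-corrected statistic. The excerpt does not name the measure used, so the sketch below shows Cohen's kappa for two annotators, a common choice, purely as an illustration of what such a check computes.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance from each
    annotator's label distribution."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Proportion of items where the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

For instance, with labels `["y", "y", "n", "n"]` and `["y", "n", "n", "n"]` the observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.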
The Corpus. A corpus of manually-written summaries of texts has been assembled from materials provided to participants in the Document Understanding Conferences (DUC), which have been held annually since 2001. It is available at the DUC Web site to readers who are qualified to access the DUC document sets on application to NIST. Most summaries in the corpus are abstracts, written by human readers of the source document to best express its content without restriction in any manner save length (words or characters). One method of performing automatic summarization is to construct the desired amount of output by concatenating representative sentences from the source document, which reduces the task to one of determining most adequately what ‘representative’ means. Such summaries are called extracts. In 2002, recognizing that many participants summarize by extraction, NIST produced versions of documents divided into individual sentences and asked its author volunteers to compose their summaries similarly. Because we use a sentence-extraction technique in our summarization system, this data is of particular interest to us. It is not included in the corpus being treated here and will be discussed in a separate paper. The ▇▇▇ ▇▇▇▇▇▇ contains 11,867 files organized in a three-level hierarchy of directories totaling 62MB. The top level identifies the source year and exists simply to avoid the name collision which occurs when different years use same-named subdirectories. The middle 291 directories identify the document clusters; DUC reuses collections of newswire stories assembled for the TREC and TDT research initiatives which report on a common topic or theme.
Table 1: Number of Documents and Summaries by Size and by Year with Document : Summary Ratios
2001: documents 28, 316, 56 (total 400); summaries 84, 949, 165 (total 1198); ratio 1 : 3
2002: documents 59, 59, 626, 59 (total 803); summaries 116, 116, 1228, 116 (total 1576); ratio 1 : 2
2003: documents 624, 90 (total 714); summaries 2496, 360 (total 2856); ratio 1 : 4
2004: documents 740, 124 (total 864); summaries 2960, 496 (total 3455); ratio 1 : 4
1 This work will also be presented at the ACL Text Summarization Workshop in Barcelona, July 25-26, 2004.
Directories on the lowest level contain tagged and untagged versions of 2,781 individual source documents, and between one and five summaries of each, 9,086 in total. In most cases the document involved is just that: a single story originally published in a newspaper. However, 552 directories, approximately 20% of the corpus, represent multi-document summaries, ones which the author has based on all the files in a cluster of related documents. For these summaries we constructed a source document against which to compare them by concatenating the individual documents in a cluster into on...
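The extraction approach described above, concatenating "representative" sentences up to a length budget, can be sketched in a few lines. This is a minimal illustration under an assumed scoring rule (raw content-word frequency as the notion of representativeness, with a placeholder stopword list); it is not the system actually used in the paper.

```python
import re
from collections import Counter

# Placeholder stopword list for the sketch.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}

def _content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]

def extract_summary(sentences, max_words):
    """Score each sentence by average document frequency of its
    content words, then concatenate top-scoring sentences until
    the word budget is exhausted."""
    freq = Counter(w for s in sentences for w in _content_words(s))

    def score(sentence):
        toks = _content_words(sentence)
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    summary, count = [], 0
    for s in sorted(sentences, key=score, reverse=True):
        n = len(s.split())
        if count + n > max_words:
            break
        summary.append(s)
        count += n
    summary.sort(key=sentences.index)  # restore document order
    return " ".join(summary)
```

For example, given `["cats purr softly", "cats and cats sleep", "dogs bark"]` and a four-word budget, the sketch selects the sentence with the highest average content-word frequency.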
The Corpus. For our corpus study we extracted data from the Corpus Gesproken Nederlands (CGN, Spoken Dutch Corpus).4 The CGN is based on roughly 1000 hours of contemporary Dutch from the Netherlands and Flanders. The speech is composed of different genres, ranging from face-to-face and telephone conversations to interviews, debates, radio
4 ▇▇▇▇://▇▇▇▇▇.▇▇▇.▇▇▇.▇▇/cgn/ehome.htm
The Corpus. A corpus analysis was employed to examine agreement patterns in Somali sentences. For this study the relevant sentences had to contain a specific focus particle that appears in two different forms, waxa and waxaa. To ensure consistency, both variants were searched for in the corpora; to simplify the text, however, all examples in this thesis were converted to the longer variant waxaa. While the chosen examples are typically short and often start with the focus particle, it is important to note that this does not necessarily mean that the sentences always begin and end with that structure. This study aimed to examine Somali clauses containing the focus particle waxaa and its impact on agreement, particularly gender agreement, in sentences. To narrow the scope, five verbs were selected. In sentences with one of these verbs and the focus particle waxaa, the subject noun can be either feminine or masculine, in both singular and plural forms. The corpus contained thousands of sentences with the focus particle and the five chosen verbs paired with various nouns. The picture below demonstrates two sentences with different verb forms but an identical subject noun. Picture 1: waxaa jir(t)a cabsi (source: HaBiT)
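The search-and-normalisation step described above can be sketched as follows. This is a hypothetical illustration: the verb forms in the list are placeholders (modelled on the jir(t)a forms in the picture caption), not the five verbs actually chosen in the study, and the corpus interface is reduced to a list of sentence strings.

```python
import re

# Matches either variant of the focus particle: waxa or waxaa.
FOCUS_RE = re.compile(r"\bwaxaa?\b")

# Placeholder verb forms (masc./fem. variants), not the study's actual five verbs.
VERBS = {"jira", "jirta"}

def find_examples(sentences):
    """Keep sentences containing the focus particle and one of the
    selected verbs; normalise the particle to the longer form waxaa."""
    hits = []
    for s in sentences:
        if FOCUS_RE.search(s) and any(v in s.split() for v in VERBS):
            hits.append(FOCUS_RE.sub("waxaa", s))
    return hits
```

For example, the short-variant sentence "waxa jirta cabsi" is retained and rewritten with the long variant, while a sentence whose verb is not in the selected set is skipped.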