The Corpus. For the remaining packages of the lexicon, automatic contextual disambiguation is attempted using a parallel corpus. The goal is to find conceptual contexts in the corpus that allow translation alternatives to be disambiguated.

The Corpus. A corpus of manually written summaries of texts has been assembled from materials provided to participants in the Document Understanding Conferences, which have been held annually since 2001. Most summaries in the corpus are abstracts, written by human readers of the source document to best express its content without restriction in any manner save length (words or characters). One method of performing automatic summarization is to construct the desired amount of output by concatenating representative sentences from the source document, which reduces the task to one of determining most adequately what ‘representative’ means. Such summaries are called extracts. In 2002, recognizing that many participants summarize by extraction, NIST produced versions of documents divided into individual sentences and asked its author volunteers to compose their summaries similarly. Because we use a sentence-extraction technique in our summarization system, this data is of particular interest to us. It is not included in the corpus being treated here and will be discussed in a separate paper. The XXX xxxxxx contains 11,867 files organized in a three-level hierarchy of directories totaling 62MB. The top level identifies the source year and exists simply to avoid the name collision which occurs when different years use same-named subdirectories. The middle 291 directories identify the document clusters; DUC reuses collections of newswire stories assembled for the TREC and TDT research initiatives which report on a common topic or theme. Directories on the lowest level contain tagged and untagged versions of 2,781 individual source documents, and between one and five summaries of each, 9,086 in total. In most cases the document involved is just that: a single story originally published in a newspaper. However, 552 directories, approximately 20% of the corpus, represent multi-document summaries, ones which the author has based on all the files in a cluster of related documents. For these summaries we constructed...

Table 1: Number of Documents and Summaries by Size and by Year with Document : Summary Ratios

                DOCUMENTS                        SUMMARIES
Year     10    50   100   200  Total      10    50   100   200  Total   D : S
2001      -    28   316    56    400       -    84   949   165   1198   1 : 3
2002     59    59   626    59    803     116   116  1228   116   1576   1 : 2
2003    624     -    90     -    714    2496     -   360     -   2856   1 : 4
2004    740     -   124     -    864    2960     -   496     -   3455   1 : 4
Total  1423    87  1156   115   2781    5572   200  3033   281   9086   1 : 3

1 This work will also be presented at the ACL Text Summarization Workshop in Barcelona, July 25-26, 2004.

The Corpus. For our corpus study we extracted data from the Corpus Gesproken Nederlands (CGN, Spoken Dutch Corpus).4 The CGN is based on roughly 1000 hours of contemporary Dutch from the Netherlands and Flanders. The speech is composed of different genres, ranging from face-to-face and telephone conversations to interviews, debates, radio shows and read-aloud books. The speech files, amounting to roughly 10M words, have been orthographically transcribed, lemmatized, and tagged for part-of-speech information. Moreover, about 10% of the corpus has been syntactically annotated (van der Wouden et al. 2002). From this syntactically annotated part of the corpus we have extracted all prepositional phrases. This amounted to 57,287 PP instances containing 139 unique adpositions and 12,947 unique heads in the adpositional complements. From this set we extracted all heads of the adpositional complements with a frequency higher than 10 occurrences. These 766 unique words were subsequently annotated by the two authors for their animacy using the coding scheme of Xxxxxx et al. (2004), which provides a 9-way classification. Where possible, disagreement was resolved by discussion. Of these 766 words, 154 were left out due to unresolved disagreement between the two annotators and 53 because they contained context-dependent elements,

4 xxxx://xxxxx.xxx.xxx.xx/cgn/ehome.htm
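
The frequency cut-off described in the excerpt above (keeping only complement heads that occur more than 10 times) could be sketched roughly as follows. This is an illustration only, not the authors' actual procedure: the function name `frequent_heads`, the `(adposition, head)` tuple format, and the toy Dutch data are all assumptions made for the example.

```python
from collections import Counter


def frequent_heads(pp_instances, min_count=10):
    """Return heads that occur MORE than min_count times, with their counts.

    pp_instances is assumed (hypothetically) to be an iterable of
    (adposition, head) pairs, one per prepositional-phrase instance.
    """
    counts = Counter(head for _adposition, head in pp_instances)
    # "a frequency higher than 10" is read here as strictly greater than 10.
    return {head: n for head, n in counts.items() if n > min_count}


# Toy data, invented for illustration; not taken from the CGN.
toy_instances = (
    [("in", "huis")] * 11
    + [("op", "tafel")] * 10   # exactly 10 occurrences, so excluded
    + [("met", "man")] * 12
)
selected = frequent_heads(toy_instances)
```

With the toy data, `selected` contains "huis" and "man" but not "tafel", since a head with exactly 10 occurrences does not exceed the threshold.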

The Corpus. The initial data used to examine the issues mentioned above are taken from previous accounts of conjunct agreement in both English and Serbian. The English data are provided by Lorimor (2007), among others, and the initial Serbian data are found in Xxxxxxx (1983), Xxxxxxxxxx (1979), and Xxxxxxxx (2009). After these works were examined and the basic problems identified, a survey was conducted to investigate the basic patterns of agreement that speakers of Serbian employ in active production. The survey was completed by 60 participants, all native speakers of Serbian. The speakers were asked to do a production task, supplying the missing agreement information on the verb based on the conjoined subjects, whose features were varied. The results of this survey provide the material on which the theoretical model of conjunct agreement developed in the thesis is based. The thesis is organized as follows. Section 2 gives a detailed introduction to the process of agreement, the role of features in that process, and the nature of features themselves. Section 3 focuses on agreement with conjoined subjects and provides a brief overview of agreement patterns with conjoined subjects in English and Serbian. Section 4 explains the mechanism of agreement and the structure of the coordinate phrase, to help the reader understand the syntactic mechanisms of conjunct agreement presented in the following sections. Section 5 presents previous syntactic accounts of conjunct agreement, which provide a basis for the analysis of the data gathered in the research. Section 6 identifies the basic problems tackled by the research and then presents the results of the research together with their analysis. Section 7 contains concluding remarks.

The Corpus. In this chapter, the generation of the Covid-themed Tweets Dataset is discussed in detail. The source of data (Section 3.1), the mechanism and word choice for tweet scraping (Section 3.1), the rationale for choosing data produced in the twelve-day span (Section 3.2), the preliminary filtering process (Section 3.3), and string removal (Section 3.4) are elaborated to demonstrate our dataset’s integrity. To ensure the quality of the data, we additionally apply quality assurance procedures (Section 3.5), in the hope of convincing readers that the Covid-themed Tweets Dataset can serve as a valid and rich resource for event detection research in the NLP community.

Related to The Corpus

Executive Committee (A) The Executive Committee shall be composed of not more than nine members who shall be selected by the Board of Directors from its own members and who shall hold office during the pleasure of the Board.
Plan Administrator Employees must elect a plan administrator during their initial enrollment in Advantage and may change their plan administrator election only during the annual open enrollment and when permitted under Section 5. Dependents must be enrolled through the same plan administrator as the employee.
The Committee For purposes of this Agreement, the term “Committee” means the Compensation Committee of the Board of Directors of the Company or any replacement committee established under, and as more fully defined in, the Plan.
GRANTEE Grantee will be in default under this Grant upon the occurrence of any of the following events:
the Grant Recipient (a) possesses or will possess a Secure Legal Interest in the Site;
INTERESTS OF DIRECTORS AND CONTROLLING SHAREHOLDERS Save for their respective shareholdings in the Company and as disclosed, none of the Directors or controlling shareholders of the Company or their respective associates has any direct or indirect interest in the Shareholder’s Loan.
TRUST FUNDS The Owner hereby gives power to the Agent to deposit all receipts collected for the Owner, less any sums properly deducted or disbursed, in a financial institution whose deposits are insured by an agency of the United States government. The funds shall be held in a trust account separate from the Agent’s personal accounts. The Agent shall not be liable in the event of a bankruptcy or failure of a financial institution. All funds managed under this section must be done so in accordance with applicable law.
Trust Property The property, or interests in property, constituting the Trust Estate from time to time. UCC: The Uniform Commercial Code, as in effect in the relevant jurisdiction.
The Plan This Plan is the Fund's written distribution and service plan for Class N shares of the Fund (the "Shares"), contemplated by Rule 12b-1 as it may be amended from time to time (the "Rule") under the Investment Company Act of 1940 (the "1940 Act"), pursuant to which the Fund will compensate the Distributor for its services in connection with the distribution of Shares, and the personal service and maintenance of shareholder accounts that hold Shares ("Accounts"). The Fund may act as distributor of securities of which it is the issuer, pursuant to the Rule, according to the terms of this Plan. The terms and provisions of this Plan shall be interpreted and defined in a manner consistent with the provisions and definitions contained in (i) the 1940 Act, (ii) the Rule, (iii) Rule 2830 of the Conduct Rules of the National Association of Securities Dealers, Inc., or any applicable amendment or successor to such rule (the "NASD Conduct Rules") and (iv) any conditions pertaining either to distribution-related expenses or to a plan of distribution to which the Fund is subject under any order on which the Fund relies, issued at any time by the U.S. Securities and Exchange Commission ("SEC").
Trust Fund The Buyer is a trust fund whose trustee is a bank or trust company and whose participants are exclusively (a) plans established and maintained by a State, its political subdivisions, or any agency or instrumentality of the State or its political subdivisions, for the benefit of its employees, or (b) employee benefit plans within the meaning of Title I of the Employee Retirement Income Security Act of 1974, but is not a trust fund that includes as participants individual retirement accounts or H.R. 10 plans.
