Data set Collection Sample Clauses

Data set Collection. From the total available corpus (70k documents), we currently have access to ~60,000 excavation reports and related documents, such as appendices, drawings and maps. These texts have been gathered by DANS (Digital Archiving and Networked Services) in the Netherlands over the past 20 years. We received the documents from DANS as PDF files and used the pdftotext tool (Glyph & Cog LLC, 1996) to convert them to plain text. The resulting data set contains 30,152,318 lines and 657,808,600 words (as counted by the command line tool "wc"). The texts are quite diverse: the dates of publication span decades, and the earlier reports were scanned and OCR'd from hardcopies created in the 1980s. A second kind of temporal variation is the age of the artefacts found, which ranges from 200,000 BC to the present. The type of research also differs greatly between reports: some describe a short desk evaluation of a small area without any fieldwork, while others detail huge excavations over multiple years with detailed analysis by a team of specialists. To get a representative sample across all these ranges, a random sampling strategy would not be ideal, so we instead opted to manually select documents, taking into account the variation described above. We selected a total of 15 documents as annotation candidates (~42,000 tokens). For the purposes of calculating the inter-annotator agreement (IAA) and evaluating the annotation guidelines, we manually selected roughly 100 sentences from these documents, containing all the entity types (Table 3.1, explained below) and specific difficult cases, as a validation set annotated by all annotators.
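The line and word totals above are reported as counted by "wc". As a rough illustration of what those two numbers measure, the following is a minimal Python sketch that mimics the counting rules of `wc -l` (newline characters) and `wc -w` (runs of non-whitespace characters). The function name `wc_counts` and the sample text are assumptions made for this illustration, not part of the original pipeline, and the result may differ slightly from `wc` on text with unusual whitespace or locale settings.

```python
import io


def wc_counts(stream):
    """Return (lines, words), approximating `wc -l` and `wc -w`."""
    lines = 0
    words = 0
    for line in stream:
        # wc -l counts newline characters, so a final line
        # without a trailing newline is not counted.
        if line.endswith("\n"):
            lines += 1
        # wc -w counts maximal runs of non-whitespace characters.
        words += len(line.split())
    return lines, words


# Hypothetical sample text standing in for one converted report.
sample = io.StringIO("Rubbish pit with flint\nNeolithic axe\n")
print(wc_counts(sample))  # (2, 6)
```

For a real corpus, the same function could be applied to each plain-text file opened with `open(path, encoding="utf-8")` and the per-file counts summed.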
| Entity | Description | Examples |
|---|---|---|
| Artefact | An archaeological object found in the ground. | Axe, pot, stake, arrow head, coin |
| Time Period | A defined (archaeological) period in time. | Middle Ages, Neolithic, 500 BC, 4000 BP |
| Location | A placename or (part of) an address. | Amsterdam, Xxxxx- xxxxxx 0, Xxxxxxxxxx |
| Context | An anthropogenic, definable part of a stratigraphy. Something that can contain Artefacts. | Rubbish pit, burial mound, stake hole |
| Material | The material an Artefact is made of. | Bronze, wood, flint, glass |
| Species | A species' name (in Latin or Dutch). | Cow, Corvus Corax, oak |

Table 3.1: Descriptions and examples for each entity type. Examples are translated from Dutch.
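The text mentions calculating the IAA on a validation set annotated by all annotators, but does not state which agreement measure was used. As one common possibility, the sketch below computes Cohen's kappa between two annotators over token-level labels. The choice of Cohen's kappa, the function name `cohens_kappa`, and the tag abbreviations (`ART`, `MAT`, `PER`, `O`) are all assumptions made for this illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of tokens with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical token labels: ART = Artefact, MAT = Material,
# PER = Time Period, O = no entity.
annotator_1 = ["ART", "O", "MAT", "O", "PER", "O"]
annotator_2 = ["ART", "O", "O", "O", "PER", "O"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.727
```

With more than two annotators, one option is to average this value over all annotator pairs; whether the original work did so is not stated.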

Related to Data set Collection

  • Data Collection Some downloaded software included in the Materials may generate and collect information about the software and usage and transmit it to Intel to help improve Intel’s products and services. This collected information may include product name, product version, time of event collection, license type, support type, installation status, hardware and software performance, and use.

  • Income Collection, Transaction Processing, Account Administration 0.25 of a basis point per annum on the average net assets of the Fund.

  • Master Servicer Collection Account (a) The Master Servicer shall establish and maintain in the name of the Trustee, for the benefit of the Certificateholders, the Master Servicer Collection Account as a segregated trust account or accounts. The Master Servicer Collection Account shall be an Eligible Account. The Master Servicer will deposit in the Master Servicer Collection Account as identified by the Master Servicer and as received by the Master Servicer, the following amounts:

  • Allocations of Finance Charge Collections The Servicer shall allocate to the Series 1997-1 Certificateholders and retain in the Collection Account for application as provided herein an amount equal to the product of (A) the Floating Allocation Percentage and (B) the Series 1997-1 Allocation Percentage and (C) the aggregate amount of Collections of Finance Charge Receivables deposited in the Collection Account on such Deposit Date.

  • Data Collection and Usage The Company and the Employer collect, process and use certain personal information about the Participant, including, but not limited to, the Participant’s name, home address and telephone number, email address, date of birth, social insurance, passport or other identification number, salary, nationality, job title, any Shares or directorships held in the Company, details of all restricted stock units or any other entitlement to Shares or equivalent benefits awarded, canceled, exercised, vested, unvested or outstanding in the Participant’s favor (“Data”), for the legitimate purpose of implementing, administering and managing the Plan. The legal basis, where required, for the processing of Data is the Participant’s consent.

  • Shared Principal Collections Subject to Section 4.04 of the Agreement, Shared Principal Collections for any Distribution Date will be allocated to Series 2018-6 in an amount equal to the product of (x) the aggregate amount of Shared Principal Collections with respect to all Principal Sharing Series for such Distribution Date and (y) a fraction, the numerator of which is the Series 2018-6 Principal Shortfall for such Distribution Date and the denominator of which is the aggregate amount of Principal Shortfalls for all the Series which are Principal Sharing Series for such Distribution Date. The “Series 2018-6 Principal Shortfall” will be equal to (a) for any Distribution Date with respect to the Revolving Period, zero, (b) for any Distribution Date with respect to the Controlled Accumulation Period, the excess, if any, of the Controlled Deposit Amount with respect to such Distribution Date over the amount of Available Principal Collections for such Distribution Date (excluding any portion thereof attributable to Shared Principal Collections), and (c) for any Distribution Date with respect to the Early Amortization Period, the excess, if any, of the Invested Amount over the amount of Available Principal Collections for such Distribution Date (excluding any portion thereof attributable to Shared Principal Collections).

  • Billing and Collection The Originating Party shall bill and collect such information service charges and shall remit the amounts collected to the Terminating Party less:

  • Excess Finance Charge Collections Series 2018-6 shall be an Excess Allocation Series. Subject to Section 4.05 of the Agreement, Excess Finance Charge Collections with respect to the Excess Allocation Series for any Distribution Date will be allocated to Series 2018-6 in an amount equal to the product of (x) the aggregate amount of Excess Finance Charge Collections with respect to all the Excess Allocation Series for such Distribution Date and (y) a fraction, the numerator of which is the Finance Charge Shortfall for Series 2018-6 for such Distribution Date and the denominator of which is the aggregate amount of Finance Charge Shortfalls for all the Excess Allocation Series for such Distribution Date. The “Finance Charge Shortfall” for Series 2018-6 for any Distribution Date will be equal to the excess, if any, of (a) the full amount required to be paid, without duplication, pursuant to subsections 4.05(a), 4.05(b) and 4.05(c) and subsections 4.07(a) through (j) on such Distribution Date and the full amount required to be paid, without duplication, pursuant to subsections 3.02(a)(iii) and 3.02(a)(iv) of the Transfer Agreement on the related Payment Date (as such term is defined in the Transfer Agreement) over (b) the sum of (i) the Reallocated Investor Finance Charge Collections, (ii) if such Monthly Period relates to a Distribution Date with respect to the Controlled Accumulation Period or Early Amortization Period, the amount of Principal Funding Account Investment Proceeds, if any, with respect to such Distribution Date and (iii) the amount of funds, if any, to be withdrawn from the Reserve Account which, pursuant to subsection 4.12(d), are required to be included in Class A Available Funds with respect to such Distribution Date. The amount of Excess Finance Charge Collections for Series 2018-6 for any Distribution Date shall be specified in subsection 3.02(a)(v) of the Transfer Agreement. 
On each Distribution Date, the Trustee shall deposit into the Collection Account for application in accordance with Section 4.05 of the Agreement the aggregate amount of Excess Finance Charge Collections received by the Trustee pursuant to the Transfer Agreement on such date.

  • Data Collection, Processing and Usage The Company collects, processes and uses the International Participant’s personal data, including the International Participant’s name, home address, email address, and telephone number, date of birth, social insurance number or other identification number, salary, citizenship, job title, any shares of Common Stock or directorships held in the Company, and details of all Equity Awards or any other equity compensation awards granted, canceled, exercised, vested, or outstanding in the International Participant’s favor, which the Company receives from the International Participant or the Employer. In granting the Equity Award under the Plan, the Company will collect the International Participant’s personal data for purposes of allocating shares of Common Stock and implementing, administering and managing the Plan. The Company’s legal basis for the collection, processing and usage of the International Participant’s personal data is the International Participant’s consent.

  • Deposit of Collections The Borrower shall promptly (but in no event later than two Business Days after receipt) deposit or cause to be deposited into the Collection Account any and all Available Collections received by the Borrower, the Servicer or any of their Affiliates.
