Parallel Corpora Sample Clauses

Parallel Corpora. Machine translation systems generally use two separate parallel corpora: at training time, a large corpus is used for extracting translation rules and collecting statistics (usually rule counts), and then at test time, the system is evaluated on a smaller held-out corpus. Systems that need to set parameters (including the one used in our experiments) also require an additional held-out tuning corpus, typically of about the same size as the test set. Our MT training corpus was a 22 million word mixed genre (though primarily newswire) corpus that had been previously assembled and processed to enforce a consistent tokenization scheme. The tuning and test sets were taken from the NIST MT04 and MT05 development sets, with some light processing to match the tokenization of the training data. All these datasets were received from BBN as part of the DARPA XXXX (now BOLT) program. Details are in Table 2.2.

Corpus       Articles  Sentences  English Words  Chinese Words
Training     1-270     2261       69k            50k
Development  301-325   223        5.0k           3.9k
Test         271-300   265        7.6k           5.6k

Table 2.3: Bilingual treebank corpus descriptions.
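
The article ranges in the bilingual treebank table can be turned into a small lookup that routes each article to its split. This is only a sketch: the ranges are copied from the table, while the names `SPLITS` and `split_for_article` are hypothetical and the excerpt does not say how the split was actually implemented.

```python
# Article-number ranges copied from the bilingual treebank table.
# SPLITS and split_for_article are illustrative names, not from the source.
SPLITS = {
    "training": range(1, 271),       # articles 1-270
    "test": range(271, 301),         # articles 271-300
    "development": range(301, 326),  # articles 301-325
}


def split_for_article(article_id: int) -> str:
    """Return the name of the split that contains the given article number."""
    for name, article_ids in SPLITS.items():
        if article_id in article_ids:
            return name
    raise ValueError(f"article {article_id} is outside every split")
```

Because the three ranges do not overlap, no article can land in both the training data and a held-out set.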
Parallel Corpora.

Req-SMT-001: Parallel corpora must be of sufficient translation quality. SMT is able to work properly with parallel corpora, whereas comparable corpora are not adequate.

Req-SMT-002: Parallel corpora must be of sufficient size. Only sufficient amounts of parallel data can result in a good SMT system.

Req-SMT-003: Parallel corpora must be accurately tokenised. The tokens in the corpora must be identified and separated by blank spaces.

Req-SMT-004: Parallel corpora must be aligned at sentence level. All sentences in the source language of the parallel corpora must be aligned to their counterpart translations in the target language (and vice versa).

Req-SMT-005: Parallel corpora must be in appropriate format. The parallel corpus must consist of two files (one for the source language and one for the target language), each containing the same number of lines. Each line must contain one (or more) tokenized sentences so that the ith line in one file is aligned to the ith line in the other one.

19 Domain codes are not too helpful. Some can be assigned on monolingual level, some on transfer level; however, even in specific domains, the general-purpose readings of a term can occur, and vice versa.
20 cf. Thurmair 2006
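
The file layout in Req-SMT-005 (two files, equal line counts, line i aligned to line i) lends itself to a mechanical check. The following is a minimal sketch, not a tool named by the source: the function names `read_lines` and `check_aligned` are hypothetical, UTF-8 encoding is assumed, and treating a line that is empty on only one side as a problem is an added assumption. It cannot judge translation quality or tokenization accuracy (Req-SMT-001 and Req-SMT-003).

```python
def read_lines(path):
    """Read a corpus file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:  # encoding is an assumption
        return [line.rstrip("\n") for line in handle]


def check_aligned(src_lines, tgt_lines):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    # Req-SMT-005: both files must contain the same number of lines.
    if len(src_lines) != len(tgt_lines):
        problems.append(
            f"line count mismatch: {len(src_lines)} source vs {len(tgt_lines)} target"
        )
    # Assumption: a sentence paired with an empty line is an alignment error.
    for number, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        if bool(src.strip()) != bool(tgt.strip()):
            problems.append(f"line {number}: empty on one side only")
    return problems
```

A caller would pass `read_lines(source_path)` and `read_lines(target_path)` and treat any returned entry as a reason to reject the pair before training.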

Related to Parallel Corpora

  • Investment Management If and to the extent requested by the Advisor, the Sub-Advisor shall, subject to the supervision of the Advisor, manage all or a portion of the investments of the Portfolio in accordance with the investment objective, policies and limitations provided in the Portfolio's Prospectus or other governing instruments, as amended from time to time, the Investment Company Act of 1940 (the "1940 Act") and rules thereunder, as amended from time to time, and such other limitations as the Trust or Advisor may impose with respect to the Portfolio by notice to the Sub-Advisor. With respect to the portion of the investments of the Portfolio under its management, the Sub-Advisor is authorized to make investment decisions on behalf of the Portfolio with regard to any stock, bond, other security or investment instrument, and to place orders for the purchase and sale of such securities through such broker-dealers as the Sub-Advisor may select. The Sub-Advisor may also be authorized, but only to the extent such duties are delegated in writing by the Advisor, to provide additional investment management services to the Portfolio, including but not limited to services such as managing foreign currency investments, purchasing and selling or writing futures and options contracts, borrowing money or lending securities on behalf of the Portfolio. All investment management and any other activities of the Sub-Advisor shall at all times be subject to the control and direction of the Advisor and the Trust's Board of Trustees.

  • Health Care The Company will reimburse the Executive for the cost of maintaining continuing health coverage under COBRA for a period of no more than 12 months following the date of termination, less the amount the Executive is expected to pay as a regular employee premium for such coverage. Such reimbursements will cease if the Executive becomes eligible for similar coverage under another benefit plan.

  • Asset Management Supplier will: i) maintain an asset inventory of all media and equipment where Accenture Data is stored. Access to such media and equipment will be restricted to authorized Personnel; ii) classify Accenture Data so that it is properly identified and access to it is appropriately restricted; iii) maintain an acceptable use policy with restrictions on printing Accenture Data and procedures for appropriately disposing of printed materials that contain Accenture Data when such data is no longer needed under the Agreement; iv) maintain an appropriate approval process whereby Supplier’s approval is required prior to its Personnel storing Accenture Data on portable devices, remotely accessing Accenture Data, or processing such data outside of Supplier facilities. If remote access is approved, Personnel will use multi-factor authentication, which may include the use of smart cards with certificates, One Time Password (OTP) tokens, and biometrics.

  • AGREEMENT MANAGEMENT A. Contractor may change Project Manager but the Energy Commission reserves the right to approve any substitution of the Project Manager.

  • Document Management The Contractor must retain sufficient documentation to substantiate claims for payment under the Contract and all other records, electronic files, papers, and documents that were made in relation to this Contract. The Contractor must retain all documents related to the Contract for five (5) years after expiration of the Contract or, if longer, the period required by the General Records Schedules maintained by the Florida Department of State available at the Department of State’s Records Management website.
