De-duplicator Sample Clauses

De-duplicator. The Web contains many duplicate pages and duplicated parts of pages. For instance, Xxxxxx et al. (2009) reported that, while building the Wacky corpora, de-duplication reduced the number of documents by more than 50%. Ignoring this phenomenon and including duplicate documents could make the resulting corpus less representative. Therefore, the De-duplicator examines the main content of the stored documents in order to detect and remove near-duplicates. This module employs the de-duplication strategy included in the Nutch framework, which involves the construction of a text profile based on quantized word frequencies, and an MD5 hash for each page (see section 3.2). An additional step has been integrated into the final version of FMC for detection and removal of (near) duplicates. Each document is represented as a list whose size equals the number of paragraphs of the document, not counting paragraphs that carry a crawlinfo attribute. The elements of the list are the MD5 hashes of the paragraphs. Then, each list is checked against all other lists. For each candidate pair, the intersection of the two lists is calculated. If the ratio of the cardinality of the intersection to the cardinality of the shorter list is over a predefined threshold, the documents are considered near-duplicates and the shorter one is discarded.
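The paragraph-level step described above can be illustrated with a minimal Python sketch. It is not the FMC implementation: the function names, the input layout (a dict of document id to paragraph strings, with crawlinfo paragraphs assumed to be filtered out already), the 0.8 threshold, the use of set intersection, and the tie-breaking rule for equally long documents are all assumptions made for illustration.

```python
import hashlib
from itertools import combinations


def paragraph_hashes(paragraphs):
    """Return the MD5 hex digest of each paragraph, in document order."""
    return [hashlib.md5(p.strip().encode("utf-8")).hexdigest() for p in paragraphs]


def overlap_ratio(hashes_a, hashes_b):
    """Shared paragraph hashes divided by the length of the shorter list."""
    shorter = min(len(hashes_a), len(hashes_b))
    if shorter == 0:
        return 0.0
    # Assumption: the intersection is taken over distinct hashes.
    shared = set(hashes_a) & set(hashes_b)
    return len(shared) / shorter


def deduplicate(docs, threshold=0.8):
    """Return the ids of the documents kept after near-duplicate removal.

    docs: dict mapping a document id to its list of paragraph strings.
    threshold: hypothetical value; the source only says "predefined".
    """
    hashes = {doc_id: paragraph_hashes(paras) for doc_id, paras in docs.items()}
    discarded = set()
    for a, b in combinations(hashes, 2):
        if a in discarded or b in discarded:
            continue
        if overlap_ratio(hashes[a], hashes[b]) > threshold:
            # Discard the shorter document; on a tie, discard the second one.
            discarded.add(a if len(hashes[a]) < len(hashes[b]) else b)
    return set(docs) - discarded
```

Comparing every pair is quadratic in the number of documents, which matches the "each list is checked against all other lists" description but may need an index over paragraph hashes for large crawls.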
De-duplicator. The De-duplicator module described in 2.1.9 is also available as a standalone web service accessible from xxxx://xxx.xxxx.xx/soaplab2-axis/#ilsp.ilsp_deduplicatormd5_row. The service has two mandatory parameters:

Related to De-duplicator

  • Non-Duplication In the event that the Executive shall perform services for the Bank or any other direct or indirect subsidiary or affiliate of the Company or the Bank, any compensation or benefits provided to the Executive by such other employer shall be applied to offset the obligations of the Company hereunder, it being intended that this Agreement set forth the aggregate compensation and benefits payable to the Executive for all services to the Company, the Bank and all of their respective direct or indirect subsidiaries and affiliates.

  • No Duplication The remedies provided in this Article 8 shall not be duplicative of any remedy available under the indemnification provisions of the Purchase Agreement.

  • No Duplicate Payment Grantee may use other funds in addition to the Grant Funds to complete the Project; provided, however, the Grantee may not credit or pay any Grant Funds for Project costs that are paid for with other funds and would result in duplicate funding.

  • REPORT OF CONTRACT USAGE All fields of information shall be accurate and complete. The report is to be submitted electronically via electronic mail utilizing the template provided in Microsoft Excel 2003, or newer (or as otherwise directed by OGS), to the attention of the individual shown on the front page of the Contract Award Notification and shall reference the Group Number, Award Number, Contract Number, Sales Period, and Contractor's (or other authorized agent) Name, and all other fields required. OGS reserves the right to amend the report template without acquiring the approval of the Office of the State Comptroller or the Attorney General.

  • REPAIRED OR REPLACED PARTS / COMPONENTS Where the Contractor is required to repair, replace or substitute Product or parts or components of the Product under the Contract, the repaired, replaced or substituted Products shall be subject to all terms and conditions for new parts and components set forth in the Contract including Warranties, as set forth in the Additional Warranties Clause herein. Replaced or repaired Product or parts and components of such Product shall be new and shall, if available, be replaced by the original manufacturer’s component or part. Remanufactured parts or components meeting new Product standards may be permitted by the Commissioner or Authorized User. Before installation, all proposed substitutes for the original manufacturer’s installed parts or components must be approved by the Authorized User. The part or component shall be equal to or of better quality than the original part or component being replaced.

  • Net Metering If you generate electricity from a renewable generating facility to offset your electricity consumption and/or use net metering at any time during the term of this Agreement, you must notify Starion.

  • Recover Copying Costs The Participating Institutions may impose a reasonable fee on the Authorized Users to cover costs of copying or printing portions of the Licensed Materials by or for the Authorized Users.

  • Meter Testing Company shall provide at least twenty-four (24) hours' notice to Seller prior to any test it may perform on the revenue meters or metering equipment. Seller shall have the right to have a representative present during each such test. Seller may request, and Company shall perform, if requested, tests in addition to the every fifth-year test and Seller shall pay the cost of such tests. Company may, in its sole discretion, perform tests in addition to the fifth year test and Company shall pay the cost of such tests. If any of the revenue meters or metering equipment is found to be inaccurate at any time, as determined by testing in accordance with this Section 10.2 (Meter Testing), Company shall promptly cause such equipment to be made accurate, and the period of inaccuracy, as well as an estimate for correct meter readings, shall be determined in accordance with Section 10.3 (Corrections).

  • CAISO Monthly Billed Fuel Cost [for Geysers Main only] The CAISO Monthly Billed Fuel Cost is given by Equation C2-1.
    Equation C2-1: CAISO Monthly Billed Fuel Cost = Billable MWh × Steam Price ($/MWh)
    Where:
    • Steam Price is $16.34/MWh.
    • For purposes of Equation C2-1, Billable MWh is all Billable MWh Delivered after cumulative Hourly Metered Total Net Generation during the Contract Year from all Units exceeds the Minimum Annual Generation given by Equation C2-2.
    Equation C2-2: Minimum Annual Generation = (Annual Average Field Capacity × 8760 hours × 0.4) - (A + B + C)
    Where:
    • Annual Average Field Capacity is the arithmetic average of the two Field Capacities in MW for each Contract Year, determined as described below. Field Capacity shall be determined for each six-month period from July 1 through December 31 of the preceding calendar year and January 1 through June 30 of the Contract Year. Field Capacity shall be the average of the five highest amounts of net generation (in MWh) simultaneously achieved by all Units during eight-hour periods within the six-month period. The capacity simultaneously achieved by all Units during each eight-hour period shall be the sum of Hourly Metered Total Net Generation for all Units during such eight-hour period, divided by eight hours. Such eight-hour periods shall not overlap or be counted more than once but may be consecutive. Within 30 days after the end of each six-month period, Owner shall provide CAISO and the Responsible Utility with its determination of Field Capacity, including all information necessary to validate that determination.
    • A is the amount of Energy that cannot be produced (as defined below) due to the curtailment of a Unit during a test of the Facility, a Unit or the steam field agreed to by CAISO and Owner.
    • B is the amount of Energy that cannot be produced (as defined below) due to the retirement of a Unit or due to a Unit’s Availability remaining at zero after a period of ten Months during which the Unit’s Availability has been zero.
    • C is the amount of Energy that cannot be produced (as defined below) because a Force Majeure Event reduces a Unit’s Availability to zero for at least thirty (30) days or because a Force Majeure Event reduces a Unit’s Availability for at least one hundred eighty (180) days to a level below the Unit Availability Limit immediately prior to the Force Majeure Event.
    • The amount of Energy that cannot be produced is the sum, for each Settlement Period during which the condition applicable to A, B or C above exists, of the difference between the Unit Availability Limit immediately prior to the condition and the Unit Availability Limit during the condition.

  • Monthly Invoices On or before the tenth (10th) day following the end of each calendar month, Seller shall deliver to PacifiCorp a proper invoice showing Seller's computation of Net Output delivered to the Point of Delivery during such month. When calculating the invoice, Seller shall provide computations showing the portion of Net Output that was delivered during On-Peak Hours and the portion of Net Output that was delivered during Off-Peak Hours. If such invoice is delivered by Seller to PacifiCorp, then PacifiCorp shall send to Seller, on or before the later of the twentieth (20th) day following receipt of such invoice or the thirtieth (30th) day following the end of each month, payment for Seller's deliveries of Net Output and associated Green Tags to PacifiCorp.
