Segmentation Approach Sample Clauses

Segmentation Approach. In the EUMSSI project, the ultimate goal is not a perfect segmentation for human consumption but a reasonable input for text analysis components. We thus first considered a simpler approach: training a monolingual statistical machine translation (SMT) system that translates automatic speech recognition (ASR) output into text that includes punctuation and capitalization. The segmentation can then be performed at the strong punctuation marks. For this task, the pause information in the ASR output is very valuable (see the example below). We thus looked for manual transcripts that include pauses in order to build a parallel corpus for training the SMT system. On one side of the parallel corpus, we removed the pause information to simulate an edited transcript; on the other side, we simulated the ASR output by removing punctuation, case information and special characters.

Example of automatic audio transcript:

another product made in germany being packed for export this crate will travel by truck and then by ship to north africa exports represent a huge part of germany 's economy many engineering companies make most of their sales to foreign customers and that 's sad to increase even further next year (pause) the german ▇▇▇▇▇▇▇▇ of commerce and industry forecasts germany well export about one point four five trillion euros worth of goods in twenty fourteen (pause) that would be over four percent more than germany 's twenty thirteen exports (pause) a big reason for the expected growth is the gradual recovery of eurozone economies (pause) that means new investments and new orders for machinery from germany (pause) while that 's good news for german companies many foreign countries grumble that germany isn 't returning the favor by spending more money overseas (pause)
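The corpus construction described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names are hypothetical, the pause marker is assumed to be the literal string "(pause)" as in the example transcript, and the set of strong punctuation marks is assumed to be ".", "!" and "?". Tokenization details visible in the example (such as "germany 's") and the spelling out of numbers are not reproduced.

```python
import re
from typing import List

PAUSE = "(pause)"  # assumed pause marker, taken from the example transcript


def make_edited_side(transcript: str) -> str:
    """Edited-transcript side: drop pause markers, keep punctuation and case."""
    text = transcript.replace(PAUSE, " ")
    return " ".join(text.split())


def make_asr_side(transcript: str) -> str:
    """Simulated ASR side: lowercase, strip punctuation and special
    characters, but keep the pause markers."""
    parts = transcript.lower().split(PAUSE)
    # Keep only word characters, whitespace and apostrophes in each part.
    cleaned = [re.sub(r"[^\w\s']", " ", part) for part in parts]
    text = f" {PAUSE} ".join(cleaned)
    return " ".join(text.split())


def segment(punctuated: str) -> List[str]:
    """Split punctuated text after strong punctuation marks (assumed . ! ?)."""
    pieces = re.split(r"(?<=[.!?])\s+", punctuated.strip())
    return [p for p in pieces if p]


manual = "Exports are a huge part of Germany's economy. (pause) That means new orders!"
asr_side = make_asr_side(manual)        # input side for SMT training
edited_side = make_edited_side(manual)  # output side for SMT training
segments = segment(edited_side)
```

At training time, each manual transcript would contribute one such pair of sides. At run time, the trained system's punctuated output would be passed to a splitter like `segment` to obtain the segments handed to the text analysis components.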