Sentence Tokenization Sample Clauses
Sentence Tokenization. The Firefox browser has no knowledge of a language’s syntax and so, in particular, cannot tokenize text into sentences. When presented with text such as “It was a pleasure to burn. It was a special pleasure to see things eaten, to see things blackened and changed.” it cannot tell that the text consists of two sentences, so it sends the text to the server as one uninterpreted sequence of characters. The NMT engine, however, performs best when it translates one sentence at a time rather than a sequence of sentences. To optimize translation quality, we therefore had to find a means of delivering single sentences to the NMT engine. Our solution was to introduce a server-side sentence tokenizer that splits the uninterpreted sequence of characters passed from the client into sentences. This keeps the client ignorant of a language’s syntax, as it should be, and concentrates all such knowledge on the server side with the NMT engine, where it belongs.
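The server-side step described above can be illustrated with a minimal sketch. It is not the project's actual tokenizer: the names split_sentences and translate_text are hypothetical, the boundary rule is a naive English-only assumption (terminal punctuation, whitespace, then a capital letter or opening quote), and the NMT engine is stood in for by any callable that translates one sentence.

```python
import re
from typing import Callable, List

# Assumed boundary rule (illustrative only): sentence-final punctuation,
# then whitespace, then an uppercase letter or an opening quote.
# Abbreviations such as "Dr. Smith" would be split incorrectly.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")


def split_sentences(text: str) -> List[str]:
    """Split the uninterpreted character sequence from the client into sentences."""
    parts = _SENTENCE_BOUNDARY.split(text.strip())
    return [part for part in parts if part]


def translate_text(text: str, translate_sentence: Callable[[str], str]) -> str:
    """Hand the engine one sentence at a time, then rejoin the results.

    translate_sentence is a hypothetical stand-in for the NMT engine.
    """
    return " ".join(translate_sentence(s) for s in split_sentences(text))


if __name__ == "__main__":
    raw = ("It was a pleasure to burn. It was a special pleasure to see "
           "things eaten, to see things blackened and changed.")
    for sentence in split_sentences(raw):
        print(sentence)
```

A production tokenizer would need language-specific rules or a trained model, which is a further reason to keep this logic on the server beside the NMT engine rather than in the browser.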
