Experiments Sample Clauses

Experiments. In this section, we evaluate our approach on two tasks: phrase alignment (Section 4.1) and machine translation (Section 4.2).
Experiments. 6.1. Experiments will not be verified or approved by us. We shall not be responsible for any Experiment, any content contained within any Experiment, and/or the results of and/or conclusions drawn from any Experiment.
Experiments. We conducted experiments on two challenging translation tasks, Japanese-to-English (JP-EN) and Chinese-to-English (CH-EN), using case-insensitive BLEU for evaluation. For the JP-EN task, we used the data from NTCIR-9 (Goto et al., 2011): the training data consisted of 2.0M sentence pairs, and the development and test sets each contained 2K sentences with a single reference. For the CH-EN task, we used the data from the NIST 2008 Open Machine Translation Campaign: the training data consisted of 1.8M sentence pairs, the development set was nist02 (878 sentences), and the test sets were nist05 (1082 sentences), nist06 (1664 sentences), and nist08 (1357 sentences).

Four baselines were used. The first two were conventional state-of-the-art translation systems, a phrase-based and a hierarchical phrase-based system, taken from the latest version of the well-known Moses toolkit (Xxxxx et al., 2007) and denoted Moses and Xxxxx-xxxx, respectively. The other two were NMT baselines (NMT-l2r and NMT-r2l); the joint system (NMT-J) was also implemented using NMT (Bahdanau et al., 2014).

Systems   Prefix   Suffix
NMT-l2r   29.4     25.4
NMT-r2l   26.2     26.7
NMT-J     29.5     28.6
Table 1: Quality of 5-word prefixes and suffixes of translations in the JP-EN test set, evaluated using partial BLEU.

We followed the standard pipeline to train and run Moses. GIZA++ (Och and Ney, 2000) with grow-diag-final-and was used to build the translation model. We trained 5-gram target language models using the training set for JP-EN and the Gigaword corpus for CH-EN, and used a lexicalized distortion model. All experiments were run with the default settings except for a distortion limit of 12 in the JP-EN experiment, as suggested by (Goto et al., 2013). To alleviate the negative effects of randomness, the final reported results are averaged over five runs of MERT.

To ensure a fair comparison, we employed the same settings for all NMT systems. Specifically, except for the maximum sequence length (seqlen, which was set to 80) and the stopping iteration, which was selected using development data, we used the default settings set out in (Bahdanau et al., 2014) for all NMT-based systems: the dimension of the word embedding was 620, the dimension of the hidden units was 1000, the batch size was 80, the source- and target-side vocabulary sizes were 30000, and the beam size for decoding was 12. Training was conducted on a single Tesla K80 GPU, and it took about 6 days to train a single NMT system on our large-scale data.
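Table 1 above reports partial BLEU over 5-word prefixes and suffixes of the translations. The excerpt does not spell out the exact procedure, so the snippet below is only a minimal sketch of one plausible implementation: truncate each hypothesis and reference to its first or last five tokens and score the truncated corpus with case-insensitive BLEU via sacrebleu. The function name partial_bleu and the toy sentences are illustrative, not taken from the source.

```python
# Hypothetical sketch of the "partial BLEU" evaluation described above:
# truncate hypotheses and references to their 5-word prefixes (or suffixes)
# and score them with case-insensitive corpus BLEU. The exact procedure used
# in the excerpt may differ; names and toy data are illustrative.
import sacrebleu

def partial_bleu(hypotheses, references, n_words=5, side="prefix"):
    """Case-insensitive BLEU over the first/last n_words tokens of each sentence."""
    def clip(sent):
        toks = sent.lower().split()
        return " ".join(toks[:n_words] if side == "prefix" else toks[-n_words:])

    clipped_hyps = [clip(h) for h in hypotheses]
    clipped_refs = [[clip(r) for r in references]]  # sacrebleu expects a list of reference streams
    return sacrebleu.corpus_bleu(clipped_hyps, clipped_refs).score

# Example usage (toy data, single reference per sentence):
hyps = ["the cat sat on the mat today", "a quick brown fox jumps over"]
refs = ["the cat sat on the mat yesterday", "the quick brown fox jumps over"]
print(partial_bleu(hyps, refs, side="prefix"), partial_bleu(hyps, refs, side="suffix"))
```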
Experiments. To perform a thorough analysis of the designed techniques, the evaluation is performed on the answer sentence selection and answer triggering tasks on both WikiQA and the newly created corpus. Since the SelQA corpus provides extensive metadata, a thorough error analysis of each system with respect to this corpus is also provided. Table 3.4 shows the distributions of the SelQA corpus. The dataset is split into training (70%), development (10%), and evaluation (20%) sets. The answer triggering dataset is significantly larger than the answer sentence selection one, due to the extra sections added by Task 5 (Section 3.1.2).

Answer Sentence Selection. First, the results on answer sentence selection are presented. Table 3.5 compares the previous approaches against the approaches designed in this thesis on the WikiQA dataset. Two metrics are used for the evaluation of this task: mean average precision (MAP) and mean reciprocal rank (MRR).

Model                Development (MAP / MRR)   Evaluation (MAP / MRR)
CNN0: baseline       69.93 / 70.66             65.62 / 66.46
CNN1: avg + word     70.75 / 71.46             67.40 / 69.30
CNN2: avg + emb      69.22 / 70.18             68.78 / 70.82
Xxxx et al. [161]    - / -                     65.20 / 66.52
Xxxxxx et al. [46]   - / -                     68.86 / 69.57
Xxxx et al. [88]     - / -                     68.86 / 70.69
Xxx et al. [169]     - / -                     69.21 / 71.08
Xxxx et al. [152]    - / -                     70.58 / 72.26
Xxxx et al. [126]    - / -                     71.07 / 73.04
Xxxx et al. [144]    - / -                     73.41 / 74.18
Table 3.5: The answer sentence selection results on the development and evaluation sets of WikiQA.

Model                Development (MAP / MRR)   Evaluation (MAP / MRR)
CNN0: baseline       84.62 / 85.65             83.20 / 84.20
CNN1: avg + word     85.04 / 86.17             84.00 / 84.94
CNN2: avg + emb      85.70 / 86.67             84.66 / 85.68
Xxxxxx et al. [47]   - / -                     87.58 / 88.12
Xxxx et al. [126]    - / -                     89.14 / 89.93
Table 3.6: The answer sentence selection results on SelQA.

CNN0 is the replication of the best model in [161]. CNN1 and CNN2 are the CNN models using the subtree matching mechanism in Section 3.2.2, where the comparator fc is either the word form or the word embedding, respectively, and fm = avg. The average function is used because, in the answer sentence selection configuration, at least one candidate sentence answers or supports the given question; the given context (a set of candidate sentences) is therefore contextually quite consistent with the question. The experiments show that the models using the subtree matching method consistently outperform the baseline model. Note that among the three choices for fm (avg, sum, and max), avg outperformed the others.
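MAP and MRR are standard ranking metrics; for concreteness, here is a minimal sketch of how they can be computed for answer sentence selection, assuming each question comes with model scores and binary answer labels for its candidate sentences. The data layout and function names are illustrative, not taken from the thesis.

```python
# Minimal sketch of MAP and MRR for answer sentence selection. Each question
# is a list of (score, is_answer) pairs over its candidate sentences.
def average_precision(candidates):
    ranked = sorted(candidates, key=lambda x: x[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_answer) in enumerate(ranked, start=1):
        if is_answer:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(candidates):
    ranked = sorted(candidates, key=lambda x: x[0], reverse=True)
    for rank, (_, is_answer) in enumerate(ranked, start=1):
        if is_answer:
            return 1.0 / rank
    return 0.0

def map_mrr(questions):
    """questions: list of candidate lists, one per question."""
    ap = [average_precision(q) for q in questions]
    rr = [reciprocal_rank(q) for q in questions]
    return sum(ap) / len(ap), sum(rr) / len(rr)

# Example: one question with three candidates, the second of which is correct.
print(map_mrr([[(0.9, 0), (0.7, 1), (0.1, 0)]]))  # MAP = MRR = 0.5
```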
Experiments. We performed a set of experiments to test different properties of GAM. First, we tested the generality of GAM by applying our approach to Multilayer Perceptrons (MLP), Convolutional Neural Networks (CNN), Graph Convolution Networks (GCN) [15], and Graph Attention Networks (GAT) [35]. Next, we tested the robustness of GAM when faced with noisy graphs, and evaluated GAM and GAM* with and without a provided graph, comparing them with state-of-the-art methods.
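The excerpt does not describe how the noisy graphs were constructed. Purely as an illustration of this kind of robustness test, the sketch below rewires a fraction of a graph's edges at random before training; the actual noise protocol used for GAM may differ, and all names here are assumptions.

```python
# Illustrative sketch only: produce a "noisy" version of a graph by replacing
# a fraction of its edges with random ones. The original GAM experiments may
# use a different noise protocol.
import random

def add_edge_noise(edges, num_nodes, noise_ratio=0.2, seed=0):
    """Return a copy of `edges` with roughly noise_ratio of them replaced by random edges."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    num_noisy = int(noise_ratio * len(edges))
    noisy = set(edges[num_noisy:])              # keep the untouched edges
    while len(noisy) < len(edges):              # rewire the rest at random
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v:
            noisy.add((u, v))
    return noisy

# Example: rewire roughly 20% of the edges of a toy 5-node cycle graph.
clean = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)}
print(add_edge_noise(clean, num_nodes=5))
```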
Experiments. In order to fulfill the above purposes, the Joint Trade Board shall have the authority to experiment with revisions to the work rules, use of tools, spray, and overtime Sections of this Agreement in order to recapture repaint work and work being done non-union and to obtain jurisdiction of new products and processes in our industry. All experiments shall be carefully documented and results made available to the ASSOCIATION and the UNION.
Experiments. We have applied our method to several original data sets (coming from factored numbers) and show that this gives good results. We have carried out two types of experiments. First, we assumed that the complete data set is given and we wanted to know whether the simulation gave the same oversquareness when simulating the same number of relations as contained in the original data set. As input for the simulation we used
Experiments. Our experimental setup is once again as described in Chapter 2. For this model, in addition to measuring parsing F1, we also measure how well the word alignments match gold-standard annotations according to AER and F1. In our syntactic MT experiments, we investigate how the two relevant components of the model (English parse trees and word alignments) affect MT performance individually and in tandem.
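AER is conventionally computed from a predicted alignment A and gold annotations split into sure links S and possible links P (with S a subset of P), as AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A minimal sketch, with illustrative variable names and toy data, is:

```python
# Minimal sketch of Alignment Error Rate (AER) against gold annotations that
# distinguish sure (S) and possible (P) links. Alignments are sets of
# (source_index, target_index) pairs; names and toy data are illustrative.
def aer(predicted, sure, possible):
    possible = possible | sure  # by convention, every sure link is also possible
    return 1.0 - (len(predicted & sure) + len(predicted & possible)) / (len(predicted) + len(sure))

# Example usage with a toy sentence pair.
predicted = {(0, 0), (1, 1), (2, 3)}
sure = {(0, 0), (1, 1)}
possible = {(2, 2), (2, 3)}
print(round(aer(predicted, sure, possible), 3))  # 0.0: every predicted link is allowed by the gold
```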
Experiments. In this section, we explore the behavior of the proposed negotiation model in different scenarios. The proposed framework has been implemented in genius (Xxx et al., 2012), a simulation framework for automated negotiation that allows researchers to test their frameworks and xxxxxx-xxxx against state-of-the-art agents designed by other researchers. Recently, genius has become a widespread tool that increases its repository of negotiating agents with the annual negotiation competition (Baarslag et al., 2012).

In order to assess the performance of the proposed negotiation approach, we have performed different experiments. All of the experiments have been carried out in the negotiation domain (or case study) introduced in Section 2.4. The first experiment (Section 6.1) studies the performance of the proposed model when facing a single opponent agent. The comparison is carried out in scenarios with different degrees of team preference dissimilarity. In the second experiment, we study the performance of our negotiation team model when facing another negotiation team in bilateral negotiations. In the third experiment (Section 6.3), we study how the Bayesian weights wA and wop, which control the importance given to the preferences of the team and the opponent in the unpredictable partial offer proposed to teammates, impact the performance of the proposed model when team members employ the Bayesian strategy. Finally, we conduct an experiment to study the effect of team members' reservation utility on the performance of the proposed negotiation model (Section 6.4).
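The excerpt states that wA and wop weight the team's preferences against the (estimated) opponent's preferences when building the partial offers proposed to teammates. Under the simplifying assumption that this amounts to a convex combination of two estimated utilities (the actual model may be more elaborate), a minimal sketch looks like the following; all function names and the toy utilities are illustrative.

```python
# Illustrative sketch only: score candidate partial offers as a weighted
# combination of the team's estimated utility and the opponent's estimated
# utility, with w_a and w_op playing the roles of wA and wop above.
def score_offer(offer, team_utility, opponent_utility, w_a=0.5, w_op=0.5):
    return w_a * team_utility(offer) + w_op * opponent_utility(offer)

def best_offer(candidate_offers, team_utility, opponent_utility, w_a=0.5, w_op=0.5):
    return max(candidate_offers,
               key=lambda o: score_offer(o, team_utility, opponent_utility, w_a, w_op))

# Toy example: offers are dicts of normalized issue values scored by linear utilities.
offers = [{"price": 0.8, "delivery": 0.2}, {"price": 0.4, "delivery": 0.9}]
team_u = lambda o: 0.7 * o["price"] + 0.3 * o["delivery"]
opp_u = lambda o: 0.2 * o["price"] + 0.8 * o["delivery"]
print(best_offer(offers, team_u, opp_u, w_a=0.6, w_op=0.4))
```

Raising w_a pushes the selected offer toward the team's own preferences, while raising w_op concedes more to the opponent, which is the trade-off the third experiment above investigates.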
Experiments. In a series of experiments, we have evaluated how well the proposed approach determines the cause of a problem. Here, knowing the components that are broken,