Common use of Evaluation Metrics Clause in Contracts

Evaluation Metrics. Evaluation is important for all NLP techniques, to assess to what extent a method is working. As this project mainly deals with the evaluation of NER, we discuss the evaluation metrics relevant to this technique and give examples within this context. Most metrics involve calculating percentages of correctly and incorrectly classified items. In the case of NER, we predict a label for each token. That predicted label is compared to the true label, and each prediction then falls into one of the following categories:

True positive (tp). The token is part of an entity, and the predicted label is the correct entity.

True negative (tn). The token is not part of an entity, and the predicted label is also not an entity.

False negative (fn). The token is part of an entity, but the predicted label is not an entity. More simply put: an entity that has not been recognised by the system.

False positive (fp). The token is not part of an entity, but the predicted label is an entity. More simply put: the system recognises an entity where there is none.

These categories are further illustrated in table 2.1.

Table 2.1: The four prediction categories.

                Prediction
Label           True      False
True            tp        fn
False           fp        tn

Once we have this information, we can calculate some metrics. The most used measures in machine learning in general are recall, precision and F1 score, and these are almost always used to evaluate NER too.

Recall indicates, out of all the entities in a text, what percentage have been correctly labelled as an entity. It can also be viewed as the percentage of entities that have been found. It is defined as follows:

Recall = tp / (tp + fn)

Precision indicates, out of all the labelled entities, what percentage has been assigned the correct label. In essence, it shows how often the algorithm is right when it predicts an entity. It is defined as follows:

Precision = tp / (tp + fp)
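To make the definitions above concrete, here is a minimal Python sketch, not taken from the source text, that counts the four categories over aligned token label sequences and computes recall, precision, and F1 (the harmonic mean of precision and recall). The function names and toy data are illustrative; it assumes "O" marks tokens outside any entity and, as a simplification, counts an entity token given the wrong entity type as a false negative.

```python
def confusion_counts(gold_labels, predicted_labels):
    """Count tp, fp, fn, tn over aligned token label sequences."""
    tp = fp = fn = tn = 0
    for gold, pred in zip(gold_labels, predicted_labels):
        if gold != "O" and pred == gold:
            tp += 1   # entity token, correctly labelled
        elif gold != "O":
            fn += 1   # entity token missed (or mislabelled) by the system
        elif pred != "O":
            fp += 1   # system recognises an entity where there is none
        else:
            tn += 1   # non-entity token, correctly left unlabelled
    return tp, fp, fn, tn

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical gold and predicted label sequences for six tokens.
gold = ["B-PER", "I-PER", "O", "O",     "B-LOC", "O"]
pred = ["B-PER", "O",     "O", "B-ORG", "B-LOC", "O"]

tp, fp, fn, tn = confusion_counts(gold, pred)
p, r = precision(tp, fp), recall(tp, fn)
print(tp, fp, fn, tn)   # 2 1 1 2
print(p, r, f1(p, r))   # 0.667 0.667 0.667
```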

Appears in 1 contract

Sources: License Agreement
