Multi-loss Joint Learning

Multi-loss Joint Learning. To ensure that all modules work harmoniously and that the proposed neural network can be trained in an end-to-end manner, we sum the three loss functions to form the final loss:

L_final = L_IC + β1 L_CCCDTL + β2 L_DC (5.5)

where β1 and β2 are hyperparameters that balance the importance of the three terms. Through cross-validation, we set β1, β2, and λ to 1, 0.1, and 1, respectively.

In addition, we adopt the stochastic gradient descent (SGD) method to update the parameters of the network, applying different learning rates to different layers. More specifically, the weights of the pre-trained primary feature extractor should not be updated as fast as those of the other modules, so as to preserve the useful information acquired by pre-training on ImageNet. Hence, we set the learning rate for the backbone network to a relatively small value of 10^−4. For the other modules, IC, CCCDTL, DC, and RTDA, the learning rates are 10^−1, 10^−1, 10^−1, and 10^−2, respectively.
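The joint loss of Eq. (5.5) and the per-module learning-rate scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation: the module names (backbone, IC, CCCDTL, DC, RTDA) and the hyperparameter values come from the text, while the parameter/gradient representation and the vanilla SGD update rule are simplifying assumptions.

```python
# Cross-validated loss weights from the text.
BETA1, BETA2 = 1.0, 0.1

def final_loss(l_ic, l_cccdtl, l_dc):
    """L_final = L_IC + beta1 * L_CCCDTL + beta2 * L_DC  (Eq. 5.5)."""
    return l_ic + BETA1 * l_cccdtl + BETA2 * l_dc

# Per-module learning rates: the pre-trained backbone is updated slowly
# (1e-4) to preserve the features learned on ImageNet.
LEARNING_RATES = {
    "backbone": 1e-4,
    "IC": 1e-1,
    "CCCDTL": 1e-1,
    "DC": 1e-1,
    "RTDA": 1e-2,
}

def sgd_step(params, grads):
    """One plain SGD step, w <- w - lr * grad, with lr chosen per module.

    `params` and `grads` map module name -> list of scalar weights/gradients;
    a real implementation would operate on tensors instead.
    """
    return {
        module: [w - LEARNING_RATES[module] * g for w, g in zip(ws, grads[module])]
        for module, ws in params.items()
    }
```

In a framework such as PyTorch, the same effect is usually achieved by passing per-parameter-group learning rates to a single SGD optimizer, so the whole network is still trained with one backward pass over L_final.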