vortikiss.blogg.se - Posterazor insufficient data for an image

We used one NVIDIA RTX A6000 48 GB GPU per experiment. For the N-step pre-training experiments, we trained 3D-ResNet-50 for 30 epochs using Adam with β_1 = 0.9, β_2 = 0.999, a batch size of 30, and a learning rate of 0.0001, which was decayed by a factor of 10 at the 13th, 18th, and 22nd epochs. For the Décalcomanie experiments, we trained 3D-ResNeXt-101 for 100 epochs. These experiments differ only in that the learning rate is decayed by a factor of 10 at the 30th, 60th, and 80th epochs; all other hyperparameters are the same as in the pre-training experiment. In the case of OLR frames, we set λ_O, λ_L, and λ_R to 0.4, 0.4, and 0.2, respectively.

We used synthetic samples generated with extended SMOTE in the N-step pre-training experiments, but not in the Décalcomanie experiments. Because synthetic samples contain mixed faces, cutting them in half and recombining them can produce noisy representations. Furthermore, because the number of frames differs across the video samples of the micro-expression datasets, it was necessary to fix the number of frames per sample in order to capture the temporal information. We used linear interpolation to set the number of frames in each dataset to that dataset's average. As a result, we set the video lengths of SMIC, SAMM, and CASME II to 34, 74, and 66 frames, respectively.

The results of the shared backbone and multiple loss on SMIC, SAMM, and CASME II using 3D-ResNeXt-101 trained from scratch are shown in Table 2. First, we tested whether Décalcomanie augmentation can be used as test-time augmentation. During training, we used only the original frame as input; at test time, we used various frames created via Décalcomanie, such as OLR, OL, and OR. The first line of the table gives the result without Décalcomanie augmentation. The top four lines of Table 2 show that Décalcomanie can be used as test-time augmentation: performance improved on every dataset when our proposed augmentation method was applied at test time.
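The frame-length normalization and the step learning-rate decay described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' code: the function names `resample_clip` and `lr_at_epoch` are hypothetical, frames are represented as flat lists of floats rather than image arrays, "decayed by 10" is read as multiplying the rate by 0.1, and epochs are counted from 0.

```python
from typing import List, Sequence

Frame = List[float]  # one flattened frame; a real pipeline would use image arrays


def resample_clip(frames: Sequence[Frame], target_len: int) -> List[Frame]:
    """Linearly interpolate a clip along the time axis to target_len frames."""
    if not frames or target_len < 1:
        raise ValueError("need at least one frame and a positive target length")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out: List[Frame] = []
    for i in range(target_len):
        # Position of output frame i on the source time axis [0, n - 1].
        pos = i * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo  # blend weight of the later neighbor
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def lr_at_epoch(epoch: int, base_lr: float = 1e-4,
                milestones: Sequence[int] = (13, 18, 22),
                gamma: float = 0.1) -> float:
    """Step decay: multiply base_lr by gamma once per milestone already reached.

    Assumption: epochs are 0-indexed and the drop applies from the milestone on.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops
```

For example, `resample_clip(clip, 34)` would bring a SMIC clip to the 34-frame length quoted above, and `lr_at_epoch(e, milestones=(30, 60, 80))` would give the schedule used in the Décalcomanie experiments under the same assumptions.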

Facial expressions are divided into micro- and macro-expressions. Micro-expressions are low-intensity emotions presented for a short moment of about 0.25 s, whereas macro-expressions last up to 4 s. To elicit micro-expressions, participants are asked to suppress their emotions as much as possible while watching emotion-inducing videos. However, this is a challenging process, and the number of samples collected tends to be smaller than for macro-expressions. Because training models with insufficient data may lead to decreased performance, this study proposes two ways to solve the problem of insufficient data for micro-expression training. The first method is N-step pre-training, which performs multiple rounds of transfer learning, from action recognition datasets to datasets in the facial domain. The second is Décalcomanie data augmentation, which is based on facial symmetry and creates a composite image by cutting and pasting the two sides of the face around its center line.

The results show that the proposed methods can successfully overcome the data shortage problem and achieve high performance.
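To make the symmetry idea behind Décalcomanie concrete, here is a rough sketch of mirroring one half of an image onto the other. It is an assumption-laden illustration rather than the authors' implementation: the function name `mirror_half` is hypothetical, the image is a plain list of pixel rows, and the center line is simply taken to be the middle column, whereas the source does not say how the facial center line is located.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; stand-in for a real image array


def mirror_half(image: Image, keep: str = "left") -> Image:
    """Build a left-right symmetric composite from one half of each row.

    Assumption: the center line is the middle column. For odd widths the
    center column is dropped, so the output is one pixel narrower.
    """
    if keep not in ("left", "right"):
        raise ValueError("keep must be 'left' or 'right'")
    out: Image = []
    for row in image:
        half = len(row) // 2
        if keep == "left":
            left = row[:half]
            out.append(left + left[::-1])    # left half plus its mirror image
        else:
            right = row[len(row) - half:]
            out.append(right[::-1] + right)  # mirrored right half plus right half
    return out
```

Applying `mirror_half` with `keep="left"` and `keep="right"` to every frame of a clip would yield two extra symmetric views of the same face, which is one plausible reading of the left and right variants mentioned in the text.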