The implemented training strategy, characterized by Adam optimization with an initial learning rate of 0.0001 and a cosine annealing scheduler that dynamically adjusts the learning ...
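To make the schedule concrete, the following is a minimal sketch of a single cosine annealing cycle starting from the stated initial learning rate of 0.0001. The source does not give the cycle length, the minimum learning rate, or the framework used, so `total_steps`, `min_lr`, and the function name `cosine_annealing_lr` are hypothetical choices made only for illustration. The closed form used here is the standard one, also documented for PyTorch's `CosineAnnealingLR`; it is not necessarily the exact configuration used in this work.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr=1e-4, min_lr=0.0):
    """Learning rate at `step` for one cosine annealing cycle.

    base_lr comes from the text (0.0001); total_steps and min_lr are
    assumed values, since the source does not state them.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so the rate stays at min_lr after the cycle ends.
    step = min(max(step, 0), total_steps)
    # Cosine factor falls smoothly from 1 (step 0) to 0 (step total_steps).
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    total = 100  # hypothetical cycle length
    for s in (0, 25, 50, 75, 100):
        print(f"step {s:3d}: lr = {cosine_annealing_lr(s, total):.2e}")
```

Under these assumptions the rate starts at the initial value, reaches half of it at the midpoint of the cycle, and decays to `min_lr` at the end. In a training loop, the value returned for the current step would be assigned to the Adam optimizer's learning rate before each update.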