So it’s going to be predicting words within a particular context window.
You could measure how similar the embeddings for a particular prediction are to the embeddings of the ground truth word. Like an autoencoder.
You would use something like MSE for the loss function.