Yes, you have an internal representation of the inputs in a continuous semantic space, but you can also have a final layer that outputs an approximation for the embeddings of the predicted word. And you can measure how similar that approximation is to the ground truth.

Alternatively you can also index all the possible outputs and represent them as integers, in which case you would be making descrete predictions. For this you could use sparse categorical cross entropy in your loss function.

Reply to this note

Please Login to reply.

Discussion

No replies yet.