After one day of writing a preprocessor and salvaging code from the previous linear model, the transformer is finally training.

ETA: 40 minutes
