After one day of writing a preprocessor and salvaging code from previous linear model, the transformer is finally training.
ETA 40minutes

After one day of writing a preprocessor and salvaging code from previous linear model, the transformer is finally training.
ETA 40minutes

No replies yet.