This network learned 32-bit numbers with these settings:
outerdim = 32
hiddendim = 3072
save_path = 'model_weights.pt'
numlayers = 2
batch_size = 100
It seems that when outerdim is doubled, hiddendim has to be multiplied by a considerably larger factor, at least when 2 layers are used. This scaling may not be good enough for 256-bit numbers, but it's possible that adding more layers improves it.
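To get a feel for what this scaling costs, here is a rough parameter-count sketch. It assumes a plain fully connected network in which numlayers counts the linear layers (outerdim -> hiddendim -> ... -> outerdim, with biases); the actual architecture is not stated in these notes, so count_parameters and the growth factors below are hypothetical and only illustrative.

```python
def count_parameters(outerdim: int, hiddendim: int, numlayers: int) -> int:
    """Count weights and biases of a fully connected network (assumed layout)."""
    if numlayers < 2:
        raise ValueError("this sketch assumes at least 2 linear layers")
    # Assumed layer widths: outerdim -> hiddendim -> ... -> hiddendim -> outerdim
    widths = [outerdim] + [hiddendim] * (numlayers - 1) + [outerdim]
    # Each linear layer has fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(widths, widths[1:]))


if __name__ == "__main__":
    base = count_parameters(outerdim=32, hiddendim=3072, numlayers=2)
    print(f"outerdim=32, hiddendim=3072: {base:,} parameters")

    # Hypothetical multipliers applied to hiddendim each time outerdim doubles.
    # These are guesses for illustration, not measured values.
    for factor in (2, 4, 8):
        outerdim, hiddendim = 32, 3072
        while outerdim < 256:
            outerdim *= 2
            hiddendim *= factor
        total = count_parameters(outerdim, hiddendim, numlayers=2)
        print(f"factor {factor}: outerdim={outerdim}, "
              f"hiddendim={hiddendim}, {total:,} parameters")
```

Under these assumptions the settings above come to 199,712 parameters. Going from 32 to 256 bits takes three doublings of outerdim, so any per-doubling multiplier on hiddendim is applied three times, which is why the 2-layer setup may become impractical.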