While ranking training seems to work now (except for timing info :( ), for 1000 retrieved events I only have 5 likes (rbr.io retrieves a lot more events than other clients because it shows all parent comments for now).

I guess the solution will be to also fetch liked events from the user's relays to get the liked / non-liked training data ratio closer to 50%, then reweight the liked events to have very small, but much more learnable, weights. For example, if the 5 like events are expanded to 100, the liked data should get a 0.05 item weight during training.
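Roughly what I mean, as a throwaway NumPy sketch (function and argument names are made up, this is not rbr.io code):

import numpy as np

def build_weighted_training_set(n_local_likes, liked_x, other_x):
    # liked_x: feature rows for liked events (the few local likes plus the
    #          ones fetched from the user's relays)
    # other_x: feature rows for the non-liked retrieved events
    x = np.vstack([liked_x, other_x])
    y = np.concatenate([np.ones(len(liked_x)), np.zeros(len(other_x))])
    # every liked row gets weight n_local_likes / len(liked_x),
    # e.g. 5 local likes spread over 100 liked rows -> 0.05 per row
    c = np.concatenate([np.full(len(liked_x), n_local_likes / len(liked_x)),
                        np.ones(len(other_x))])
    return x, y, c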

Of course a modified Hessian of the log likelihood needs to be computed for the Newton-Raphson method used to find the maximum log likelihood, so if somebody is good at math, they are welcome to help add item weights to the computation :)
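For reference, I think the weighted version is just the textbook one (please check my algebra): with item weights c_i and mu_i = sigmoid(x_i * w), the log likelihood becomes

l(w) = sum_i c_i * (y_i * log(mu_i) + (1 - y_i) * log(1 - mu_i))

so the gradient is x' * C * (y - mu) and the Hessian is -x' * C * S * x, with C = diag(c_i) and S = diag(mu_i * (1 - mu_i)).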


Discussion

Looking at the iteratively reweighted least squares computation more, I could hack weighting into the y - mu step and just see how it performs; I wouldn't be surprised if that turned out to be correct (equivalent to repeating rows in the x matrix and y vector). There is a sketch of the weighted step after the update formula below:

Logistic regression iteration step

https://en.wikipedia.org/wiki/Logistic_regression#Algorithm

mu = sigmoid(x*w)

s = diag(mu * (1 - mu))

wnew = (x' * s * x)^-1 * x' * (s * x * w + y - mu)
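If my algebra is right, repeating rows is the same as putting the item weights both into s and in front of y - mu, not only at the y - mu step. A rough NumPy sketch of that weighted step (not the actual rbr.io code; c is the item weight vector):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_newton_step(x, y, w, c):
    # x: (n, d) features, y: (n,) 0/1 labels, w: (d,) current coefficients,
    # c: (n,) item weights (all ones gives back the unweighted update above)
    mu = sigmoid(x @ w)
    s = c * mu * (1.0 - mu)                    # diagonal of C * S
    hessian = x.T @ (x * s[:, None])           # x' * C * S * x
    rhs = x.T @ (s * (x @ w) + c * (y - mu))   # x' * (C*S*x*w + C*(y - mu))
    return np.linalg.solve(hessian, rhs)       # wnew

With all item weights equal to 1 this reduces to the Wikipedia update above.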

Ranking really needs more liked / commented-on events (positive examples) to be able to learn feature weights. Right now none of the features give good results with just 8 likes in the data set :(