I have no idea. nostr:npub1m3xdppkd0njmrqe2ma8a6ys39zvgp5k8u22mev8xsnqp4nh80srqhqa5sf do you know?
Discussion
I think it's that there's less of a concept of an 'object' in an image; what the model picks up on is correlations from one pixel to the next. Because fingers and other objects can appear in so many positions and contexts, it's difficult for the model to 'segment' them out the way it can a torso or facial features, which are more stable across images.
That makes sense, thanks.