“EpipolarPose” - utilizes 2D poses from multi-view images using epipolar geometry to self-supervise a 3D pose estimator.
Datasets used: H36M, MPI-INF-3DHP.
Proposes a new performance measure: Pose Structure Score (PSS).
PCK and MPJPE treat each joint independently. PSS is sensitive to structural errors in pose. Not a loss function.
First need to model the natural distribution of gt poses with an unsupervised clustering method. If the predicted pose falls in the same cluster as the gt pose, PSS = 1, else 0.
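A minimal sketch of how PSS could be computed, assuming k-means as the clustering method and root-centered, unit-norm poses. The function names, the choice of joint 0 as root, and the tiny k-means loop are illustrative assumptions, not the authors' code.

```python
import numpy as np

def normalize_poses(poses):
    """Root-center each pose (joint 0 assumed to be the root) and scale to unit norm.

    poses: (N, J, 3) array. Returns an (N, J*3) array.
    """
    poses = poses - poses[:, :1, :]
    flat = poses.reshape(len(poses), -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / np.maximum(norms, 1e-8)

def assign_clusters(x, centers):
    """Index of the nearest cluster center for every row of x."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def fit_kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (k, D) cluster centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = assign_clusters(x, centers)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return centers

def pose_structure_score(pred, gt, centers):
    """Fraction of poses whose predicted cluster equals the gt cluster."""
    pred_c = assign_clusters(normalize_poses(pred), centers)
    gt_c = assign_clusters(normalize_poses(gt), centers)
    return float(np.mean(pred_c == gt_c))
```

Usage: fit the centers once on training gt poses with fit_kmeans(normalize_poses(train_gt), k), then score test predictions with pose_structure_score(pred, gt, centers). The per-pose score is 1 or 0, so the dataset score is just the mean.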
Weakly supervised methods depend on unpaired 3D gt, a small subset of labels, or camera parameters in multiview settings.
Doesn’t require any 3D gt or camera extrinsics.
During training, EpipolarPose estimates 2D poses from multi-view images, utilises epipolar geometry to obtain a 3D pose and camera geometry which are used to train a 3D pose estimator.
SOTA among weakly/self-supervised methods.
Single view during inference. Multi-view, self-supervised during training (set of consecutive image pairs).
Networks pre-trained on the MPII Human Pose dataset (MPII). During training only the pose estimation network in the upper branch (3D) is trained; the lower branch (2D) is frozen.
Training without the frozen 2D branch produced degenerate solutions where all keypoints collapse to a single location.
Volumetric heatmaps are obtained; from each heatmap we can obtain the 2D pose and the 3D pose by soft-argmax.
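A rough sketch of soft-argmax on one joint's volumetric heatmap, assuming a (depth, height, width) layout and a softmax over all voxels. The beta temperature and the function name are assumptions for illustration. The 2D pose is taken as the (x, y) part of the result, the 3D pose as (x, y, z).

```python
import numpy as np

def soft_argmax_3d(heatmap, beta=1.0):
    """Expected voxel coordinate of one joint under a softmax over the volume.

    heatmap: (D, H, W) array of unnormalized scores.
    Returns np.array([x, y, z]) in voxel units.
    """
    d, h, w = heatmap.shape
    logits = beta * heatmap.reshape(-1)
    logits = logits - logits.max()          # numerical stability
    probs = np.exp(logits)
    probs = (probs / probs.sum()).reshape(d, h, w)
    # Marginalize over the other two axes, then take the expectation per axis.
    z = (probs.sum(axis=(1, 2)) * np.arange(d)).sum()
    y = (probs.sum(axis=(0, 2)) * np.arange(h)).sum()
    x = (probs.sum(axis=(0, 1)) * np.arange(w)).sum()
    return np.array([x, y, z])
```

Unlike a hard argmax, this is differentiable, which is why it can sit inside the network during training.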
Use epipolar geometry and a cheirality check.
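A sketch of the cheirality check, assuming the essential matrix E is already available (e.g. estimated from the 2D joint correspondences of the two views) and that the 2D points are in normalized camera coordinates. The helper names and the convention x2 = R x1 + t are assumptions. E decomposes into four (R, t) candidates; the one that puts the triangulated joints in front of both cameras is kept.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation. x1, x2: (N, 2) normalized image points."""
    points = []
    for (u1, v1), (u2, v2) in zip(x1, x2):
        A = np.stack([
            u1 * P1[2] - P1[0],
            v1 * P1[2] - P1[1],
            u2 * P2[2] - P2[0],
            v2 * P2[2] - P2[1],
        ])
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]                      # null vector of A (homogeneous point)
        points.append(X[:3] / X[3])
    return np.array(points)

def decompose_essential(E):
    """Return the four (R, t) candidates encoded by E; t has unit norm."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def recover_pose(E, x1, x2):
    """Pick the candidate with the most points in front of both cameras."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best = None
    for R, t in decompose_essential(E):
        P2 = np.hstack([R, t[:, None]])
        X = triangulate(P1, P2, x1, x2)
        depth1 = X[:, 2]                    # depth in camera 1
        depth2 = (X @ R.T + t)[:, 2]        # depth in camera 2
        n_front = int(np.sum((depth1 > 0) & (depth2 > 0)))
        if best is None or n_front > best[0]:
            best = (n_front, R, t, X)
    return best[1], best[2], best[3]
```

The recovered translation is only defined up to scale (unit norm here), so the triangulated 3D pose is also up to scale unless an extra cue such as known bone lengths fixes it.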
Inference has a refinement unit (RU) to map noisy inputs to more reliable 3D pose predictions.
Can we rely on the labels from multi view images?
Quality of estimated keypoints is crucial to attain better results.
If we have the ground truth 2D keypoints and camera geometry, triangulation gives 4.3 mm error and 99% PSS, which is near perfect. Lack of camera geometry degrades PMPJPE and mPSS@50 by only a small amount: 13 mm and 1%, respectively.
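For reference, a sketch of the two error metrics, assuming PMPJPE means MPJPE after a similarity (Procrustes) alignment of the prediction to the gt. Function names are illustrative; poses are (J, 3) arrays in mm.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Align pred to gt with the best-fit scale, rotation and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g

def pmpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because the alignment removes global scale, rotation and translation, PMPJPE is the natural metric when camera geometry (and hence absolute scale) is unknown.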
A pose detector trained on the 2D labels of H36M improves over the MPII-pretrained one by up to 17 mm and 5%.
Adding RU further improves the performance of the self-supervised (SS) model by 20%.
Comparison with weakly/self supervised methods:
This gap (better MPJPE) indicates that 2D keypoint estimation quality is crucial for better performance.