Proceedings of The 7th International Conference on Knowledge and Innovation in Engineering, Science and Technology
Refined Granularity Extraction for Person Reidentification
Monocular 3D human pose estimation in the wild remains a challenging task due to the scarcity of unconstrained training data with accurate 3D pose annotations. In this paper, we tackle this issue by proposing a weakly-supervised approach that learns to estimate 3D poses from unlabeled multi-view data generated from a single RGB image, without relying on 3D annotations. Since the generation of multi-view data from a single image is prone to degenerate solutions, we utilize a GAN-based approach to create multi-view pose representations that are authentic. The added constraints on the latent distribution simplify the learning of a shared latent space between the depth map and the pose. These constraints also improve the approach's generalization and its exploitation of unlabeled depth maps. We evaluate our approach on three challenging datasets (Human3.6M, MPI-INF-3DHP and Leeds Sports Pose), where it achieves state-of-the-art performance among semi- and weakly-supervised methods.
Keywords: Panoptic reconstruction, View generation, 3D reconstruction, Inpainting, VAE.