HUMAN 3D POSE ESTIMATION BASED ON 2D KEYPOINTS
Abstract
In this work, a lightweight low-sampling architecture is proposed for 3D human pose estimation from 2D keypoints. The approach introduces specialized trainable pose encodings designed for 3D pose estimation, which are used together with conventional pose encodings to represent the input data. The architecture processes features at multiple levels and fuses them adaptively with a spatial attention mechanism that amplifies the most relevant features. Experiments on standard benchmark datasets confirmed the effectiveness of the proposed method: it achieves a mean per-joint position error (MPJPE) of 42.1, outperforming existing approaches.
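The result above is reported as MPJPE, the mean Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and all poses. The following is a minimal plain-Python sketch of that metric, not the authors' evaluation code. The function name `mpjpe` and the optional `root_index` argument are illustrative assumptions: `root_index` mimics the common practice of comparing root-relative poses, which the abstract does not specify. Units follow the input coordinates (typically millimetres on 3D pose benchmarks).

```python
import math
from typing import List, Optional, Sequence, Tuple

Joint = Tuple[float, ...]  # one 3D joint, e.g. (x, y, z)


def _root_relative(pose: Sequence[Joint], root_index: int) -> List[Joint]:
    """Express every joint relative to the chosen root joint (e.g. the pelvis)."""
    root = pose[root_index]
    return [tuple(c - r for c, r in zip(joint, root)) for joint in pose]


def mpjpe(pred: Sequence[Sequence[Joint]],
          gt: Sequence[Sequence[Joint]],
          root_index: Optional[int] = None) -> float:
    """Mean per-joint position error over all poses and joints.

    pred, gt: lists of poses; each pose is a list of 3D joints in matching order.
    root_index: hypothetical option; if given, both poses are made root-relative
    before the distances are computed.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must contain the same non-zero number of poses")
    total, count = 0.0, 0
    for p_pose, g_pose in zip(pred, gt):
        if len(p_pose) != len(g_pose):
            raise ValueError("each predicted pose must have as many joints as its ground truth")
        if root_index is not None:
            p_pose = _root_relative(p_pose, root_index)
            g_pose = _root_relative(g_pose, root_index)
        for p, g in zip(p_pose, g_pose):
            total += math.dist(p, g)  # Euclidean distance for one joint
            count += 1
    return total / count


if __name__ == "__main__":
    predicted = [[(1.0, 1.0, 1.0), (4.0, 5.0, 1.0)]]
    ground_truth = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
    print(mpjpe(predicted, ground_truth))                # global offset counts as error
    print(mpjpe(predicted, ground_truth, root_index=0))  # offset removed by root alignment
```

In the usage example, the prediction differs from the ground truth only by a constant offset, so the error is nonzero in absolute coordinates and vanishes after root alignment. This illustrates why the evaluation protocol must be fixed before MPJPE values from different methods are compared.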

This work is licensed under a Creative Commons Attribution 4.0 International License.
O. V. NEDZVED, Belarusian State University, Minsk
Candidate of Technical Sciences, Associate Professor