Unsupervised Geometry-Aware Representation Learning for 3D Human Pose Estimation
===================

ECCV paper by Helge Rhodin, Mathieu Salzmann, and Pascal Fua
https://arxiv.org/abs/1804.01110

Please cite the paper in your publications if it helps your research:

    @inproceedings{rhodin2018unsupervised,
      author = {Rhodin, Helge and Salzmann, Mathieu and Fua, Pascal},
      booktitle = {ECCV},
      title = {Unsupervised Geometry-Aware Representation Learning for 3D Human Pose Estimation},
      year = {2018}
    }

**Version 2.0 available** on a separate github repo: [NSD: Neural Scene Decomposition](https://github.com/hrhodin/NeuralSceneDecomposition). This newer CVPR19 paper extends the ECCV18 method to work with full-frame input and multiple persons. It decomposes the image into foreground instances and background. Furthermore, it infers occlusion and depth through differentiable rendering.

Features
===================

Modern 3D human pose estimation techniques rely on deep networks, which require large amounts of training data. In this work, we propose to overcome this problem by learning a geometry-aware body representation from multi-view images without 3D annotations. To this end, we use an encoder-decoder that predicts an image from one viewpoint given an image from another viewpoint. Because this representation encodes 3D geometry, using it in a semi-supervised setting makes it easier to learn a mapping from it to 3D human pose. As evidenced by our experiments, our approach significantly outperforms fully-supervised methods given the same amount of labeled data, and improves over other semi-supervised methods while using as little as 1% of the labeled data.

The provided PyTorch implementation includes
-------------------

* Network definition and weights (image encoder, image decoder, and pose decoder)
* Interactive test code
* Training code (requires the H3.6M dataset)

Minimal Dependencies
===================

For testing a pre-trained model, only the following packages are required:

* PyTorch 0.4 (lower versions might work as well) and torchvision
* numpy
* matplotlib
* pickle
* imageio

Moreover, you will need an X Window System (e.g., XQuartz on Mac) to run the interactive demo.

Test the pretrained model
=======================

A pre-trained model can be tested with

```
python configs/test_encodeDecode.py
```

It outputs synthesized views and 3D pose estimates with matplotlib. Note that this requires an X Window System when executed on a remote server, e.g., connect with `ssh -Y name@server.com`. Different view angles can be explored interactively through slider input. It should look like this:

![NVS and pose viewer image](./examples/example.png "NVS and pose viewer")

Training Dependencies
======================

Training your own model requires additional dependencies:

* Ignite (provided in a subdirectory)
* Visdom (optional, for graphical display of training progress, https://github.com/facebookresearch/visdom)
* **H3.6M dataset** and dataloader (I provide my own dataloader for reference, but it is based on a preprocessed version of Human3.6M which I can't share due to the original license.)
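If you opt into Visdom monitoring, its server has to be running before you launch training; by default it serves the dashboard at http://localhost:8097:

```
python -m visdom.server
```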
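The Features section above describes an encoder-decoder that predicts an image from one viewpoint given an image from another. As a preview of what the training in the next section optimizes, here is a minimal sketch of that idea: the latent code is treated as a set of 3D points that get rotated by the relative camera rotation before decoding. All names, sizes, and the toy linear networks are illustrative assumptions; the repository's actual models are ResNet-based and considerably more involved.

```python
import torch
import torch.nn as nn

class GeometryAwareAE(nn.Module):
    """Toy stand-in for the repository's encoder/decoder (illustrative only)."""
    def __init__(self, latent_points=200, size=128):
        super().__init__()
        self.n, self.size = latent_points, size
        self.encoder = nn.Linear(3 * size * size, 3 * latent_points)
        self.decoder = nn.Linear(3 * latent_points, 3 * size * size)

    def forward(self, img_a, R_a_to_b):
        b = img_a.size(0)
        # Encode view a into a latent "point cloud" of shape (B, n, 3).
        z = self.encoder(img_a.view(b, -1)).view(b, self.n, 3)
        # Rotate the latent by the relative camera rotation (row vectors,
        # hence the transpose): this is what makes it geometry-aware.
        z = torch.bmm(z, R_a_to_b.transpose(1, 2))
        # Decode the rotated latent into a prediction of view b.
        return self.decoder(z.view(b, -1)).view(b, 3, self.size, self.size)

model = GeometryAwareAE()
img_a = torch.randn(4, 3, 128, 128)   # crops from camera a
img_b = torch.randn(4, 3, 128, 128)   # same poses seen from camera b
R = torch.eye(3).repeat(4, 1, 1)      # relative rotation between the cameras
loss = nn.functional.mse_loss(model(img_a, R), img_b)
loss.backward()                       # no 3D pose labels involved anywhere
```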
Self-supervised Representation Learning
=======================================

After downloading and extracting the files, you should be able to start training by executing the following script from within the code root folder:

```
python configs/train_encodeDecode.py
```

There is quite a bit of debug output; feel free to remove some of it. Training creates an "output/encode_resL3_ResNet_layers4...." folder in which you can monitor progress (in case you don't use Visdom). Every 5k frames, the model is evaluated on the test set. This and other settings can be changed in configs/config_dict_encodeDecode.py.

Supervised 3D Pose Training
===========================

In the file 'config_train_encodeDecode_pose.py', set 'network_path' to the output folder created by the representation training above ('python configs/train_encodeDecode.py').

To subsequently run the pose estimation training, simply run

```
python configs/train_encodeDecode_pose.py
```

This second training stage trains only the pose decoder and keeps the encoder fixed. Hence, you first need to train the encoder for a while; 400k iterations work well.

Test your model
=======================

As before, set 'network_path' in configs/config_test_encodeDecode.py. The trained model can then be tested as before with

```
python configs/test_encodeDecode.py
```

You might want to change the test set in configs/test_encodeDecode.py to your own dataset.
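For reference, the frozen-encoder setup of the second training stage boils down to the sketch below. Everything here is an illustrative assumption (the toy networks, the 17-joint output, all names); the repository's actual training loop lives in the config scripts above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; the real models and paths come from the configs.
encoder = nn.Linear(3 * 128 * 128, 600)    # pretend: pretrained image encoder
pose_decoder = nn.Sequential(              # small head mapping latent -> 3D pose
    nn.Linear(600, 256), nn.ReLU(), nn.Linear(256, 17 * 3))

# Stage two trains only the pose decoder: freeze the encoder...
for p in encoder.parameters():
    p.requires_grad = False

# ...and hand the optimizer only the decoder's parameters.
optimizer = torch.optim.Adam(pose_decoder.parameters(), lr=1e-3)

img = torch.randn(4, 3, 128, 128)
gt_pose = torch.randn(4, 17 * 3)           # 3D joints of the labeled subset
pred = pose_decoder(encoder(img.view(4, -1)))
loss = nn.functional.mse_loss(pred, gt_pose)
optimizer.zero_grad()
loss.backward()                            # gradients reach only the decoder
optimizer.step()
```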