This paper proposes a tree-structured structure-from-motion (SfM) method that recovers 3D scene structure and estimates camera poses from unordered image sets. Starting from atomic structures spanning the scene, we build well-connected structure groups and propose RANSAC generalized Procrustes analysis (RGPA) to glue together the structures in each group. The grouping and aligning operations proceed hierarchically until the full scene is reconstructed. Our work is the first attempt to use GPA for modern 3D reconstruction tasks. RGPA can merge multiple structures at a time and automatically identify outliers. The resulting reconstruction tree is much more compact and balanced than those of previous hierarchical SfM methods and has a very shallow depth. These advantages, along with the resulting removal of intermediate bundle adjustments, lead to significantly improved computational efficiency over state-of-the-art SfM methods. The cameras and 3D scene can be robustly recovered in the presence of moderate noise. We verify the efficacy of our method on a variety of datasets and demonstrate that it produces metric reconstructions efficiently and robustly.
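To make the idea of robustly gluing structures concrete, the following is a minimal sketch of RANSAC wrapped around a pairwise similarity Procrustes alignment (the closed-form Umeyama-style solution). It is not the RGPA algorithm itself, which aligns several structures at once. The function names, the inlier threshold, and the iteration count are hypothetical choices made for illustration, and the sketch assumes the two structures share 3D points in known correspondence.

```python
import numpy as np

def similarity_procrustes(src, dst):
    """Least-squares similarity (s, R, t) such that dst ≈ s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)              # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                      # force a proper rotation
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)      # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

def ransac_align(src, dst, n_iters=200, thresh=0.05, seed=0):
    """Hypothetical helper: robust pairwise alignment of corresponding points."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        # Three correspondences are the minimal sample for a 3D similarity.
        idx = rng.choice(len(src), size=3, replace=False)
        s, R, t = similarity_procrustes(src[idx], dst[idx])
        residuals = np.linalg.norm(dst - (s * src @ R.T + t), axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("no consensus set found")
    # Refit on the whole consensus set for the final estimate.
    s, R, t = similarity_procrustes(src[best_inliers], dst[best_inliers])
    return s, R, t, best_inliers
```

Calling `ransac_align(src, dst)` on two `(n, 3)` arrays of corresponding points returns the scale, rotation, translation, and a boolean inlier mask; correspondences outside the mask are the ones treated as outliers. A generalized variant would instead align all structures in a group to a shared reference shape.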