Synthesizing novel views of dynamic humans from stationary monocular cameras is a specialized but desirable setup. This is particularly attractive as it does not require static scenes, controlled environments, or specialized capture hardware. In contrast to techniques that exploit multi-view observations, the problem of modeling a dynamic scene from a single view is significantly more under-constrained and ill-posed. In this paper, we introduce Neural Motion Consensus Flow (MoCo-Flow), a representation that models dynamic humans in stationary monocular cameras using a 4D continuous time-variant function. We learn the proposed representation by optimizing for a dynamic scene that minimizes the total rendering error, over all the observed images. At the heart of our work lies a carefully designed optimization scheme, which includes a dedicated initialization step and is constrained by a motion consensus regularization on the estimated motion flow. We extensively evaluate MoCo-Flow on several datasets that contain human motions of varying complexity, and compare, both qualitatively and quantitatively, to several baselines and ablated variations of our methods, showing the efficacy and merits of the proposed approach. Pretrained model, code, and data will be released for research purposes upon paper acceptance.
- CCS Concepts
- • Computing methodologies → Shape modeling
All Science Journal Classification (ASJC) codes
- Computer Graphics and Computer-Aided Design