Publication
Dynamic Cost Volumes with Scalable Transformer Architecture for Optical Flow
Vemburaj Yadav; Alain Pagani; Didier Stricker
In: Irish Pattern Recognition and Classification Society. Irish Machine Vision and Image Processing Conference (IMVIP-2023), August 30 - September 1, Galway, Ireland, Zenodo, 2023.
Abstract
We introduce DCV-Net, a scalable transformer-based architecture for optical flow with dynamic cost
volumes. Recently, FlowFormer [Huang et al., 2022], which applies transformers to the full 4D cost volumes
instead of the visual feature maps, has shown significant improvements in flow estimation accuracy.
The major drawback of FlowFormer is its limited scalability to high-resolution input images, since the
complexity of the attention mechanism on the 4D cost volumes scales as O(N^4), with N being the number of
visual feature tokens. We propose a novel architecture that obtains FlowFormer-style enrichment
of matching cost representations, but using lightweight attention on the visual feature maps with quadratic
(O(N^2)) complexity. First, we generate sequential updates to the visual feature representations and,
consequently, the cost volumes using lightweight attention layers. Second, we interleave this sequence of cost
volumes with iterations of flow refinement, thereby modeling the update operator in our refinement stage to
handle dynamic cost volumes. Our architecture, with a computational complexity two orders lower than
that of FlowFormer, demonstrates strong cross-domain generalization on the Sintel and KITTI datasets. We
outperform FlowFormer on the KITTI dataset and achieve highly competitive flow estimation accuracies on
the Sintel dataset.
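
The complexity comparison in the abstract can be illustrated with a small counting sketch. It is not taken from the paper: the function name `attention_pair_counts` is hypothetical, and it assumes plain full self-attention (every token compared with every other token) over N = height x width visual feature tokens, with the 4D cost volume holding one matching cost per pair of feature positions.

```python
def attention_pair_counts(height, width):
    """Count query-key pairs for full self-attention at one feature resolution.

    Assumptions (illustrative, not from the paper): N = height * width
    visual feature tokens, and unrestricted self-attention that compares
    every token with every other token.
    """
    n = height * width        # N visual feature tokens
    cost_entries = n * n      # 4D cost volume: one cost per pair of positions
    return {
        "tokens": n,
        "cost_volume_entries": cost_entries,
        # Attention over the feature tokens: O(N^2) pairs.
        "feature_attention_pairs": n * n,
        # Attention over every cost-volume entry: O(N^4) pairs.
        "cost_volume_attention_pairs": cost_entries * cost_entries,
    }


small = attention_pair_counts(4, 4)
large = attention_pair_counts(8, 8)

# Doubling the resolution in each dimension multiplies N by 4.
feature_growth = large["feature_attention_pairs"] // small["feature_attention_pairs"]
cost_growth = large["cost_volume_attention_pairs"] // small["cost_volume_attention_pairs"]
print(small)
print(feature_growth, cost_growth)
```

Under these assumptions, doubling the feature map resolution in both dimensions grows the feature-level attention cost by a factor of 16 and the cost-volume-level attention cost by a factor of 256, which is the scalability gap the abstract describes.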