(Paper Reading) Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

2022-05-05

Introduction

In this paper we propose a combined bottom-up and top-down visual attention mechanism. The bottom-up mechanism proposes a set of salient image regions, each represented by a pooled convolutional feature vector, while the top-down mechanism determines the feature weightings. In practice, bottom-up attention is implemented with Faster R-CNN [33], which is a natural expression of a bottom-up attention mechanism. This paper actually ...
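To make the split between the two mechanisms concrete, here is a minimal sketch of the top-down step, assuming the bottom-up stage has already produced one feature vector per region. It uses a common additive soft-attention form (project, tanh, score, softmax, weighted sum); the parameter names `W_v`, `W_h`, `w_a`, the toy sizes, and the random values are all assumptions for illustration, not the trained model or the authors' code.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def top_down_attention(regions, context, W_v, W_h, w_a):
    """Weight bottom-up region features using a top-down context vector.

    regions: k feature vectors, one per proposed image region
    context: task-specific vector (for example a language-model state)
    W_v, W_h, w_a: hypothetical learned projection parameters
    Returns (weights, attended), where attended is the weighted sum of regions.
    """
    h_proj = matvec(W_h, context)
    scores = []
    for v in regions:
        v_proj = matvec(W_v, v)
        hidden = [math.tanh(a + b) for a, b in zip(v_proj, h_proj)]
        scores.append(sum(w * x for w, x in zip(w_a, hidden)))
    weights = softmax(scores)  # one normalized weight per region
    dim = len(regions[0])
    attended = [sum(wt * v[j] for wt, v in zip(weights, regions))
                for j in range(dim)]
    return weights, attended


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
k, d, c, hdim = 4, 8, 6, 5  # toy sizes: regions, feature dim, context dim, hidden dim
regions = [[rng.random() for _ in range(d)] for _ in range(k)]  # stand-in for detector output
context = [rng.random() for _ in range(c)]
W_v = random_matrix(hdim, d, rng)
W_h = random_matrix(hdim, c, rng)
w_a = [rng.uniform(-0.1, 0.1) for _ in range(hdim)]

weights, attended = top_down_attention(regions, context, W_v, W_h, w_a)
print("attention weights:", [round(w, 3) for w in weights])
print("attended feature length:", len(attended))
```

The point of the sketch is the division of labor: the set of regions is fixed by the bottom-up detector, and the top-down context only decides how much each region contributes to the attended feature.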

Method

Conclusion

Reference

Author slide

