(Paper Reading) Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

2022-05-05

Introduction

Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. In this paper we propose a combined bottom-up and top-down visual attention mechanism. The bottom-up mechanism proposes a set of salient image regions, with each region represented by a pooled convolutional feature vector. Practically, we implement bottom-up attention using Faster R-CNN [33], which represents a natural expression of a bottom-up attention mechanism. This paper actually…
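To make the split between the two mechanisms concrete, here is a minimal sketch of the top-down step, assuming the bottom-up stage has already produced one feature vector per region. The function names (`softmax`, `top_down_attend`), the toy 2-dimensional features, and the dot-product scoring are all assumptions made for illustration. The excerpt above does not say how the weights are computed, so the dot product is only a stand-in for whatever learned scoring function the paper uses.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_attend(region_features, context):
    """Weight bottom-up region features by their relevance to a context vector.

    region_features: list of k feature vectors, one per proposed region
                     (in the paper these come from Faster R-CNN).
    context:         task-specific vector, e.g. a caption or question state
                     (hypothetical here).
    Scoring is a plain dot product, an assumption for this sketch only.
    """
    scores = [sum(f * c for f, c in zip(feat, context))
              for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    # The attended feature is the weighted sum of the region features.
    attended = [sum(w * feat[d] for w, feat in zip(weights, region_features))
                for d in range(dim)]
    return weights, attended


# Toy example: three "regions" with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = [2.0, 0.0]
weights, attended = top_down_attend(regions, context)
```

In this toy case the first and third regions match the context equally well, so they get equal weights, and the second region gets a smaller one. The point is only the division of labour: the bottom-up stage decides which regions exist, and the top-down stage decides how much each one counts.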
