2025, Vol. 7, Issue 1, Part D
Effective image recognition using adaptive vision transformers
Author(s): Muhammad D Hassan
Abstract: Vision transformers, which are built on self-attention, have recently shown remarkable performance across a variety of tasks. Despite this success, they remain computationally expensive: their cost grows significantly with the number of patches, self-attention heads, and transformer blocks. Because images differ widely from one another, this study argues that the need to model long-range interactions between patches varies from image to image. To exploit this, we present AdaViT, an adaptive computation framework that learns usage policies for which transformer blocks to execute on a per-input basis throughout the backbone. The aim is to improve the inference efficiency of vision transformers while minimizing the loss of accuracy in image recognition. A lightweight decision network, optimized jointly with the transformer backbone, is attached to the backbone to make these decisions on the fly. Extensive experiments on ImageNet show an efficiency gain of more than three times over current state-of-the-art vision transformers, with a 0.75 percent drop in accuracy. The balance the method achieves between efficiency and accuracy depends on the available computational budget. A thorough quantitative and qualitative analysis of the learned usage policies provides further insight into the redundancy in vision transformers.
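The per-input block selection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the residual "blocks" below are simple stand-ins for transformer blocks, and the names `make_block`, `make_gate`, and `adaptive_forward`, the per-block sigmoid gate, and the 0.5 threshold are all assumptions made for illustration only.

```python
import math
import random


def make_block(seed, dim):
    """Toy residual block x -> x + tanh(W x); a stand-in for a transformer block."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def block(x):
        return [
            xi + math.tanh(sum(wij * xj for wij, xj in zip(row, x)))
            for xi, row in zip(x, w)
        ]

    return block


def make_gate(seed, dim):
    """Hypothetical lightweight decision unit: a linear score passed through a sigmoid."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def gate(x):
        score = sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-score))  # keep-probability in (0, 1)

    return gate


def adaptive_forward(x, blocks, gates, threshold=0.5):
    """Run only the blocks whose gate fires for this input; skipped blocks act as identity."""
    used = []
    for block, gate in zip(blocks, gates):
        keep = gate(x) >= threshold  # decision depends on the current features
        used.append(keep)
        if keep:
            x = block(x)
    return x, used


if __name__ == "__main__":
    dim, depth = 4, 6
    blocks = [make_block(s, dim) for s in range(depth)]
    gates = [make_gate(100 + s, dim) for s in range(depth)]
    features, used = adaptive_forward([0.1, -0.2, 0.3, 0.4], blocks, gates)
    print("blocks executed:", sum(used), "of", depth)
```

In this sketch the compute saved for a given input is simply the number of skipped blocks. A trained system would learn the gate weights jointly with the backbone, as the abstract states, rather than drawing them at random.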
DOI: 10.33545/26633582.2025.v7.i1d.190
Pages: 264-269
How to cite this article:
Muhammad D Hassan. Effective image recognition using adaptive vision transformers. Int J Eng Comput Sci 2025;7(1):264-269. DOI: 10.33545/26633582.2025.v7.i1d.190