Systematic Literature Review on the use of Artificial Intelligence for audio description of EAD images.

Authors

L. Perdigão, S. Crespo, C. D. Ferreira

DOI:

https://doi.org/10.18264/eadf.v15i2.2593

Keywords:

Artificial intelligence, Distance education, Audio description

Abstract

With recent advances in digital technologies, incorporating artificial intelligence into distance education has become far more viable and accessible. Using AI to produce audio descriptions of static images is a potential accessibility tool for distance education (EAD, from the Portuguese "educação a distância"). The objective of this systematic literature review (SLR) is to identify the state of the art in the use of artificial intelligence for the audio description of static images. The review was conducted with the support of the Parsifal tool, and the search defined in the review protocol returned 62 articles. Titles, abstracts, and keywords were screened against the inclusion, exclusion, and quality criteria, and 22 studies were accepted. A full reading of the accepted studies identified 86 AI algorithms and tools, as well as 69 databases and 9 other tools useful for developing image-description instruments. These results suggest that existing, ready-made tools could be used to build an audio description tool for static images in distance learning.

References

ALAM, M. Z. I.; ISLAM, S.; HOQUE, E. SeeChart: enabling accessible visualizations through interactive natural language interface for people with visual impairments. In: INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES (IUI), 28., 2023. Proceedings [...]. New York: ACM, 2023. p. 46–64. DOI: 10.1145/3581641.3584099.

ANGRAVE, L.; LI, J.; ZHONG, N. Creating TikToks, memes, accessible content, and books from engineering videos? First solve the scene detection problem. Grantee Submission, 2022.

CAMPOS, V. P.; GONÇALVES, L. M.; ARAÚJO, T. M. Applying audio description for context understanding of surveillance videos by people with visual impairments. In: IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE, 14., 2017, Lecce. Proceedings [...]. IEEE, 2017. p. 1–5. DOI: 10.1109/AVSS.2017.8078530.

DITTAKAN, K. et al. Image caption generation using transformer learning methods: a case study on Instagram image. Multimedia Tools and Applications, v. 83, p. 46397–46417, 2024. DOI: 10.1007/s11042-023-17275-9.

DOS SANTOS, G. O.; COLOMBINI, E. L.; AVILA, S. #PraCegoVer: a large dataset for image captioning in Portuguese. Data, v. 7, n. 2, p. 13, 2022. DOI: 10.3390/data7020013.

FERREIRA, L. et al. Presenter-centric image collection and annotation: enhancing accessibility for the visually impaired. In: SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES, 36., 2023, Rio Grande. Proceedings [...]. IEEE, 2023. p. 199–204. DOI: 10.1109/SIBGRAPI59091.2023.10347135.

HAN, T. et al. AutoAD II: the sequel – who, when, and what in movie audio description. In: IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2023, Paris. Proceedings [...]. IEEE, 2023. p. 13599–13609. DOI: 10.1109/ICCV51070.2023.01255.

HAN, T. et al. AutoAD: movie description in context. In: CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, Vancouver. Proceedings [...]. IEEE/CVF, 2023. p. 18930–18940. DOI: 10.1109/CVPR52729.2023.01815.

KARAGÖZ, E.; KORUYAN, K. Design of audio description system using cloud based computer vision. Mehmet Akif Ersoy Üniversitesi Uygulamalı Bilimler Dergisi, v. 4, n. 1, p. 74–85, 2020. DOI: 10.31200/makuubd.651261.

KITCHENHAM, B.; CHARTERS, S. Guidelines for performing Systematic Literature Reviews in Software Engineering (EBSE 2007-001). Keele University and Durham University Joint Report, 2007.

KRUSZIELSKI, L. F. et al. The Use of Artificial Intelligence Enabling Scalable Audio Description on Brazilian Television: A Workflow Proposal. SET International Journal of Broadcast Engineering, [S. l.], v. 9, n. 1, 2025. Available at: https://revistaeletronica.set.org.br/ijbe/article/view/271.

LIMA, F. J. de; TAVARES, F. S. S. Subsídios para a construção de um código de conduta do áudio-descritor. Revista Brasileira de Tradução Visual (RBTV), 2010. Available at: http://www.associadosdainclusao.com.br/enades2016/sites/all/themes/berry/documentos/07-subsidios-para-a-construcao-de-um-codigo-de-conduta.pdf. Accessed: Sept. 2016.

OION, M. S. R. et al. A machine learning based image to object detection and audio description for blind people. In: INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 26., 2023, Cox’s Bazar. Proceedings [...]. IEEE, 2023. p. 1–6. DOI: 10.1109/ICCIT60459.2023.10441089.

ONCESCU, A.-M. et al. A sound approach: using large language models to generate audio descriptions for egocentric text-audio retrieval. In: IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2024, Seoul. Proceedings [...]. IEEE, 2024. p. 7300–7304. DOI: 10.1109/ICASSP48485.2024.10448486.

PARSIFAL. Review Planning. 2021. Available at: https://parsif.al/help/#planning. Accessed: 15 May 2025.

PERDIGÃO, L. T. Conteúdos audiovisuais acessíveis na educação a distância: a audiodescrição didática como pós-produção. 2023. 150 f. Tese (Doutorado em Ciências, Tecnologias e Inclusão) – Universidade Federal Fluminense, Niterói, 2023.

PERDIGÃO, L. T.; LIMA, N. R. W. Vendo com outros olhos: a audiodescrição na educação a distância. Niterói: [s. n.], 2017. 100 p. il. Ebook. Available at: https://educapes.capes.gov.br/bitstream/capes/429946/4/GuiaPDFfinal_2020.pdf. Accessed: 15 May 2025.

PERDIGÃO, L. T.; MONTEIRO, F. V.; FERNANDES, E. M. Audiodescrição como texto alternativo nas principais redes sociais digitais. In: CONGRESSO INTERNACIONAL INTERDISCIPLINAR EM SOCIAIS E HUMANIDADES, 10., 2021, Niterói. Anais [...]. Niterói: Programa de Pós-Graduação, 2021. Available at: https://www.even3.com.br/anais/xc22021/435829-AUDIODESCRICAO-COMO-TEXTO-ALTERNATIVO-NAS-PRINCIPAIS-REDES-SOCIAIS-DIGITAIS. Accessed: 21 Oct. 2024.

PERDIGÃO, L. et al. Inteligência artificial para audiodescrição de imagens: uma análise da pessoa com deficiência visual. In: CONGRESSO SOBRE TECNOLOGIAS NA EDUCAÇÃO, 8., 2023, Santarém. Anais [...]. Porto Alegre: SBC, 2023. p. 182–191. DOI: 10.5753/ctrle.2023.232651.

PITCHER-COOPER, C. et al. You described, we archived: a rich audio description dataset. Journal on Technology and Persons with Disabilities, v. 11, p. 192–208, 2023.

REVATHY, P. et al. Seeing with sound: automatic image captioning with auditory output for the visually impaired. In: INTERNATIONAL CONFERENCE ON RESEARCH METHODOLOGIES IN KNOWLEDGE MANAGEMENT, ARTIFICIAL INTELLIGENCE AND TELECOMMUNICATION ENGINEERING (RMKMATE), 2023, Chennai. Proceedings [...]. IEEE, 2023. p. 1–5. DOI: 10.1109/RMKMATE59243.2023.10368610.

SHANTHI, N. et al. Deep learning based audio description of visual content by enhancing accessibility for the visually impaired. In: INTERNATIONAL CONFERENCE ON SUSTAINABLE COMMUNICATION NETWORKS AND APPLICATION (ICSCNA), 2023, Theni. Proceedings [...]. IEEE, 2023. p. 1234–1240. DOI: 10.1109/ICSCNA58489.2023.10370414.

SHEN, X. et al. Fine-grained audible video description. In: CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, Vancouver. Proceedings [...]. IEEE/CVF, 2023. p. 10585–10596. DOI: 10.1109/CVPR52729.2023.01020.

SUCHARITHA, V. et al. Imaging description production by means of deeper neural networks. In: INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND APPLIED INFORMATICS (ACCAI), 2023, Chennai. Proceedings [...]. IEEE, 2023. p. 1–5. DOI: 10.1109/ACCAI58221.2023.10200618.

VRINDAVANAM, J. et al. Machine learning based approach to image description for the visually impaired. In: ASIAN CONFERENCE ON INNOVATION IN TECHNOLOGY (ASIANCON), 2021, Pune. Proceedings [...]. IEEE, 2021. p. 1–6. DOI: 10.1109/ASIANCON51346.2021.9544867.

WORLD WIDE WEB CONSORTIUM. Web Content Accessibility Guidelines (WCAG) 2.1. 2018. Available at: https://www.w3.org/TR/WCAG21/. Accessed: 15 May 2025.

XIN, B. et al. A comprehensive survey on deep-learning-based visual captioning. Multimedia Systems, v. 29, p. 3781–3804, 2023. DOI: 10.1007/s00530-023-01175-x.

Published

2025-07-16

How to Cite

Perdigão, L., Crespo, S., & Ferreira, C. D. (2025). Systematic Literature Review on the use of Artificial Intelligence for audio description of EAD images. EaD Em Foco, 15(2), e2593. https://doi.org/10.18264/eadf.v15i2.2593
