Systematic Literature Review on the use of Artificial Intelligence for audio description of images in distance education.
DOI: https://doi.org/10.18264/eadf.v15i2.2593
Keywords: Artificial intelligence, Distance education, Audio description
Abstract
Recent advances in digital technologies have made incorporating artificial intelligence (AI) into distance education far more viable and accessible. Using AI to produce audio descriptions of static images could serve as an accessibility tool for distance education. This Systematic Literature Review (SLR) aims to identify the state of the art in the use of AI for audio description of static images. The review was conducted with the Parsifal tool and, following the specified protocol, retrieved 62 articles. Titles, abstracts and keywords were screened against the inclusion, exclusion and quality criteria, and 22 studies were accepted. A full reading of the accepted studies yielded 86 AI algorithms and tools, along with 69 databases and 9 other tools useful for developing image description instruments. These results suggest that existing, ready-made components could be used to build a tool for audio description of static images in distance learning.
References
ALAM, M. Z. I.; ISLAM, S.; HOQUE, E. SeeChart: enabling accessible visualizations through interactive natural language interface for people with visual impairments. In: INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES (IUI), 28., 2023. Proceedings [...]. New York: ACM, 2023. p. 46–64. DOI: 10.1145/3581641.3584099.
ANGRAVE, L.; LI, J.; ZHONG, N. Creating TikToks, memes, accessible content, and books from engineering videos? First solve the scene detection problem. Grantee Submission, 2022.
CAMPOS, V. P.; GONÇALVES, L. M.; ARAÚJO, T. M. Applying audio description for context understanding of surveillance videos by people with visual impairments. In: IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE, 14., 2017, Lecce. Proceedings [...]. IEEE, 2017. p. 1–5. DOI: 10.1109/AVSS.2017.8078530
DITTAKAN, K.; PROMPITAK, K.; THUNGKLANG, P. et al. Image caption generation using transformer learning methods: a case study on Instagram image. Multimedia Tools and Applications, v. 83, p. 46397–46417, 2024. DOI: 10.1007/s11042-023-17275-9
DOS SANTOS, G. O.; COLOMBINI, E. L.; AVILA, S. #PraCegoVer: a large dataset for image captioning in Portuguese. Data, v. 7, n. 2, p. 13, 2022. DOI: 10.3390/data7020013
FERREIRA, L. et al. Presenter-centric image collection and annotation: enhancing accessibility for the visually impaired. In: SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES, 36., 2023, Rio Grande. Proceedings [...]. IEEE, 2023. p. 199–204. DOI: 10.1109/SIBGRAPI59091.2023.10347135
HAN, T. et al. AutoAD II: the sequel – who, when, and what in movie audio description. In: IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2023, Paris. Proceedings [...]. IEEE, 2023. p. 13599–13609. DOI: 10.1109/ICCV51070.2023.01255.
HAN, T. et al. AutoAD: movie description in context. In: CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, Vancouver. Proceedings [...]. IEEE/CVF, 2023. p. 18930–18940. DOI: 10.1109/CVPR52729.2023.01815.
KARAGÖZ, E.; KORUYAN, K. Design of audio description system using cloud based computer vision. Mehmet Akif Ersoy Üniversitesi Uygulamalı Bilimler Dergisi, v. 4, n. 1, p. 74–85, 2020. DOI: 10.31200/makuubd.651261.
KITCHENHAM, B.; CHARTERS, S. Guidelines for performing Systematic Literature Reviews in Software Engineering (EBSE 2007-001). Keele University and Durham University Joint Report, 2007.
KRUSZIELSKI, L. F. et al. The Use of Artificial Intelligence Enabling Scalable Audio Description on Brazilian Television: A Workflow Proposal. SET International Journal of Broadcast Engineering, [S. l.], v. 9, n. 1, 2025. Available at: https://revistaeletronica.set.org.br/ijbe/article/view/271
LIMA, F. J. de; TAVARES, F. S. S. Subsídios para a construção de um código de conduta do áudio-descritor. Revista Brasileira de Tradução Visual (RBTV), 2010. Available at: http://www.associadosdainclusao.com.br/enades2016/sites/all/themes/berry/documentos/07-subsidios-para-a-construcao-de-um-codigo-de-conduta.pdf - Accessed: Sept. 2016.
OION, M. S. R. et al. A machine learning based image to object detection and audio description for blind people. In: INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 26., 2023, Cox’s Bazar. Proceedings [...]. IEEE, 2023. p. 1–6. DOI: 10.1109/ICCIT60459.2023.10441089
ONCESCU, A.-M. et al. A sound approach: using large language models to generate audio descriptions for egocentric text-audio retrieval. In: IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2024, Seoul. Proceedings [...]. IEEE, 2024. p. 7300–7304. DOI: 10.1109/ICASSP48485.2024.10448486
PARSIFAL. Review Planning. 2021. Available at: https://parsif.al/help/#planning - Accessed: 15 May 2025.
PERDIGÃO, L. T. Conteúdos audiovisuais acessíveis na educação a distância: a audiodescrição didática como pós-produção. 2023. 150 leaves. Thesis (Doctorate in Sciences, Technologies and Inclusion) – Universidade Federal Fluminense, Niterói, 2023.
PERDIGÃO, L. T.; LIMA, N. R. W. Vendo com outros olhos: a audiodescrição na educação a distância. Niterói: [s. n.], 2017. 100 p. Ill. E-book. Available at: https://educapes.capes.gov.br/bitstream/capes/429946/4/GuiaPDFfinal_2020.pdf - Accessed: 15 May 2025.
PERDIGÃO, L. T.; MONTEIRO, F. V.; FERNANDES, E. M. Audiodescrição como texto alternativo nas principais redes sociais digitais. In: CONGRESSO INTERNACIONAL INTERDISCIPLINAR EM SOCIAIS E HUMANIDADES, 10., 2021, Niterói. Proceedings [...]. Niterói: Programa de Pós-Graduação, 2021. Available at: https://www.even3.com.br/anais/xc22021/435829-AUDIODESCRICAO-COMO-TEXTO-ALTERNATIVO-NAS-PRINCIPAIS-REDES-SOCIAIS-DIGITAIS - Accessed: 21 Oct. 2024.
PERDIGÃO, L. et al. Inteligência artificial para audiodescrição de imagens: uma análise da pessoa com deficiência visual. In: CONGRESSO SOBRE TECNOLOGIAS NA EDUCAÇÃO, 8., 2023, Santarém. Proceedings [...]. Porto Alegre: SBC, 2023. p. 182–191. DOI: 10.5753/ctrle.2023.232651
PITCHER-COOPER, C. et al. You described, we archived: a rich audio description dataset. Journal on Technology and Persons with Disabilities, v. 11, p. 192–208, 2023.
REVATHY, P. et al. Seeing with sound: automatic image captioning with auditory output for the visually impaired. In: INTERNATIONAL CONFERENCE ON RESEARCH METHODOLOGIES IN KNOWLEDGE MANAGEMENT, ARTIFICIAL INTELLIGENCE AND TELECOMMUNICATION ENGINEERING (RMKMATE), 2023, Chennai. Proceedings [...]. IEEE, 2023. p. 1–5. DOI: 10.1109/RMKMATE59243.2023.10368610
SHANTHI, N. et al. Deep learning based audio description of visual content by enhancing accessibility for the visually impaired. In: INTERNATIONAL CONFERENCE ON SUSTAINABLE COMMUNICATION NETWORKS AND APPLICATION (ICSCNA), 2023, Theni. Proceedings [...]. IEEE, 2023. p. 1234–1240. DOI: 10.1109/ICSCNA58489.2023.10370414
SHEN, X. et al. Fine-grained audible video description. In: CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, Vancouver. Proceedings [...]. IEEE/CVF, 2023. p. 10585–10596. DOI: 10.1109/CVPR52729.2023.01020
SUCHARITHA, V. et al. Imaging description production by means of deeper neural networks. In: INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND APPLIED INFORMATICS (ACCAI), 2023, Chennai. Proceedings [...]. IEEE, 2023. p. 1–5. DOI: 10.1109/ACCAI58221.2023.10200618.
VRINDAVANAM, J. et al. Machine learning based approach to image description for the visually impaired. In: ASIAN CONFERENCE ON INNOVATION IN TECHNOLOGY (ASIANCON), 2021, Pune. Proceedings [...]. IEEE, 2021. p. 1–6. DOI: 10.1109/ASIANCON51346.2021.9544867
WORLD WIDE WEB CONSORTIUM. Web Content Accessibility Guidelines (WCAG) 2.1, 2018. Available at: https://www.w3.org/TR/WCAG21/ - Accessed: 15 May 2025.
XIN, B. et al. A comprehensive survey on deep-learning-based visual captioning. Multimedia Systems, v. 29, p. 3781–3804, 2023. DOI: 10.1007/s00530-023-01175-x
License
Copyright (c) 2025 EaD em Foco

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in Revista EaD em Foco are licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
All subsequent publications, in whole or in part, must acknowledge, in citations, Revista EaD em Foco as the original editor of the article.