Artificial intelligence multilingual image-to-speech for accessibility and text recognition

Authors

  • Hasanul Fahmi

DOI:

https://doi.org/10.63944/5jjdb330

Keywords:

Image-to-speech; Multilingual audio descriptions; Natural language processing; Optical character recognition; Text-to-speech

Abstract

The primary challenge for visually impaired and illiterate individuals is accessing and understanding visual content, which hinders their ability to navigate environments and engage with text-based information. This research addresses this problem by implementing an artificial intelligence (AI)-powered multilingual image-to-speech technology that converts text from images into audio descriptions. The system combines optical character recognition (OCR) and text-to-speech (TTS) synthesis, using natural language processing (NLP) and digital signal processing (DSP) to generate spoken output in multiple languages. In accuracy testing, the system demonstrated high precision, high recall, and an average accuracy of 0.976, confirming its effectiveness in real-world applications. This technology enhances accessibility, significantly improving the quality of life for visually impaired individuals and offering scalable solutions for illiterate populations. The results also provide insights for refining OCR accuracy and expanding multilingual support.
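The abstract reports precision, recall, and an average accuracy of 0.976 for the OCR stage. As a minimal sketch of how such scores can be computed, the snippet below evaluates OCR output against a ground-truth transcript at the word level using multiset overlap. This is an illustrative scoring scheme with made-up example strings, not the paper's actual evaluation protocol or dataset.

```python
from collections import Counter

def ocr_metrics(reference: str, hypothesis: str) -> dict:
    """Score OCR output against ground truth via word-multiset overlap.

    Hypothetical word-level scheme for illustration: a word counts as a
    true positive once per occurrence shared by both texts.
    """
    ref = Counter(reference.split())
    hyp = Counter(hypothesis.split())
    tp = sum((ref & hyp).values())   # words correctly recognized
    fp = sum((hyp - ref).values())   # spurious words in the OCR output
    fn = sum((ref - hyp).values())   # ground-truth words the OCR missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}

# "f0x" is a misrecognition of "fox": 3 correct words, 1 spurious, 1 missed.
scores = ocr_metrics("the quick brown fox", "the quick brown f0x")
print(scores)  # precision 0.75, recall 0.75
```

Averaging such per-image accuracy scores over a test set yields an overall figure comparable in spirit to the 0.976 reported in the abstract.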


Published

01-08-2025

Issue

Section

Articles