Suggesting Text Alternatives for Images in Social Media

Letícia Seixas Pereira¹, João Guerreiro², André Rodrigues, André Santos,
João Vicente, José Coelho, Tiago Guerreiro³, Carlos Duarte⁴

LASIGE, Faculdade de Ciências, Universidade de Lisboa, Portugal

¹lspereira@fc.ul.pt, ²jpguerreiro@fc.ul.pt, ³tjvg@di.fc.ul.pt, ⁴caduarte@campus.ul.pt

Abstract

Image description has been a recurrent topic in web accessibility over the years. With the increased use of social networks, this discussion is even more relevant. Social networks are responsible for a considerable part of the images available on the web. In this context, users are not only consuming visual content but also creating it. Due to this shared responsibility of providing accessible content, major platforms must go beyond accessible interfaces. Additional resources must also be available to support users in creating accessible content. Although many of today's services already support accessible media content authoring, current efforts still fail to properly integrate and guide their users through the authoring process. One of the consequences is that many users are still unaware of what an image description is, how to provide it, and why it is necessary. We present SONAAR, a project that aims to improve the accessibility of user-generated content on social networks. Our approach is to support the authoring and consumption of accessible social media content. Our prototypes currently focus on Twitter and Facebook and are available as an Android application and as a Chrome extension.

Social media accessibility

During the past years, online content production and consumption have shifted significantly. The unilateral model, where one party was responsible for producing content to be consumed by others, has given way to new scenarios where users can be both authors and consumers. With the proliferation of camera-equipped mobile devices, most of this user-generated content is composed of images and videos and shared via social networks.

In this context, approaches to ensure the accessibility of media content must also keep pace with these changes. Major social network services have introduced features to help users create accessible videos and images by adding alternative textual descriptions to media content. Facebook and Twitter, two of the largest social media services, followed two different approaches. Facebook provides a machine-generated textual description for images, allowing users to further edit it. Twitter chose to provide an input field for a textual description in the authoring process. The former (providing machine-generated descriptions by default) ensures every shared image contains an alternative description. However, blind users still report that these descriptions lack the contextual information needed to properly understand the image (Morris et al., 2016; Voykinska et al., 2016; MacLeod et al., 2017; Gleason et al., 2019; Whitney and Kolar, 2020). On the other hand, providing an input field for users to write their own alternative description must be associated with a proper interaction flow. At the moment, most social network users are unaware of the possibility and the benefits of creating text alternatives for their visual content. Even if they are aware of the possibility, that does not mean they know how to do it. Furthermore, the many popular memes and GIFs cannot, at this point, benefit from the machine-generated text alternatives. This great variety of visual content, combined with the specific context of personal content shared on social networks, makes it harder for automated solutions to provide a proper description. In this context, it is essential to better include users in accessibility practices.

SONAAR

SONAAR is a project funded by the European Commission aiming to prototype a mechanism capable of increasing the amount of accessible user-generated content available on social networks on mobile and desktop platforms. To achieve this goal, SONAAR pursues specific objectives:

Facilitate user-generation of accessible content

Design a new interaction flow for accessible content authoring

In order to have accessible media content on social networks, the first step is to provide an easy authoring process. For that, we first conducted a user study with social network users, with and without disabilities. This study included interviews and contextual inquiries to better understand the challenges they face when trying to upload accessible media content on social networks. As we describe in (Pereira et al., 2021), most users without disabilities are not aware that it is possible to provide an alternative description for images on the social networks they use, despite the features currently deployed. This finding is reinforced by other users, who reported not knowing where to write an alternative description. Users also report not knowing how to write a suitable alternative description, or consider this task too time-consuming or unnecessary because they are used to sharing their content only with close friends and family who do not have accessibility needs. With this information in mind, we defined the requirements for a new interaction flow that supports an accessible content authoring process.

When designing our solution, we considered that people use different devices and interfaces - e.g., smartphones, tablets, laptops - to engage in all types of activities on their social networks, such as accessing, authoring, and sharing content. As such, our prototypes must support desktop web browsers and mobile devices. Finally, helping users write good descriptions is a key part of this process as well. Our findings also indicate that users with visual disabilities prefer descriptions written by the authors themselves, as they provide more details and contextual information. Our prototypes must then ensure that end-users are properly guided through an accessible authoring process. Providing them with a suggested alternative description, or with concepts that may be contained in the image, may help them to better understand how to write a suitable alternative description and what information should be included in it. In addition, it starts a learning process that, in the long run, can help to educate users on accessible practices in general.

Prototype a new interaction flow for accessible content authoring

Two prototypes were developed to support accessible authoring in Google Chrome and on Android devices. Our prototypes are composed of a backend service, a mobile front-end, and a web front-end.

The backend provides a set of web services powering both web and mobile clients and is responsible for all data storage and management. Front-end clients can detect when media is being shared on Twitter and Facebook, and our backend provides recommendations of text alternatives based on this content. These recommendations include 1) previous alternative descriptions provided for the same image by other users of our prototypes; 2) a list of concepts related to the image; 3) text recognized in the image. To answer a client’s request, the backend service applies two criteria: 1) whether a description is in the same language as the user’s device or browser; 2) the number of times a description has been used. As a result, our backend service provides a filtered and ordered list to front-end clients.
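
The filtering and ordering step can be illustrated with a minimal Python sketch. It is not the SONAAR implementation: the `StoredDescription` type, its field names, and the `rank_descriptions` function are hypothetical, and exact matching of language codes is an assumption. It only shows the two criteria named above, keeping descriptions in the user's language and ordering them by how often they have been used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredDescription:
    """Hypothetical record for one stored alternative description."""
    text: str
    language: str    # assumed to be a short language code such as "en" or "pt"
    times_used: int  # how many times users submitted this description


def rank_descriptions(candidates, user_language):
    """Keep descriptions in the user's language, most used first."""
    same_language = [d for d in candidates if d.language == user_language]
    # sorted() is stable, so descriptions with equal counts keep their input order.
    return sorted(same_language, key=lambda d: d.times_used, reverse=True)


candidates = [
    StoredDescription("A dog", "en", 2),
    StoredDescription("Um cão castanho na praia", "pt", 9),
    StoredDescription("A brown dog on a beach", "en", 5),
]
ranked = rank_descriptions(candidates, "en")
top_suggestion = ranked[0].text if ranked else None
```

With this input, the Portuguese description is filtered out for an English-language client even though it has the highest usage count, and the more frequently used English description is offered first.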

The front-end clients begin the process by identifying key elements to detect when media content is being uploaded on Twitter or Facebook. The Android application inspects the elements on the screen and looks for values of specific attributes. The Chrome extension inspects the DOM of the web page looking for the presence of elements with specific class attributes. After detecting that an image was uploaded on the authoring page or screen, this image is then sent to the backend service along with the language of the user’s device or browser.
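
As a rough illustration of detection by class attribute, the following Python sketch scans a page's markup for image elements carrying a marker class and builds the request sent to the backend. The actual extension runs inside the browser and inspects the live DOM, so this is only an analogy. The class names, the function names, and the payload shape are all hypothetical, and the sketch passes the image source rather than the image itself.

```python
from html.parser import HTMLParser

# Hypothetical marker classes; the real values depend on each platform's current markup.
UPLOAD_MARKER_CLASSES = {"composer-media-preview", "attachment-thumbnail"}


class UploadedImageFinder(HTMLParser):
    """Collects the src of <img> elements that carry one of the marker classes."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        classes = set((attributes.get("class") or "").split())
        if classes & UPLOAD_MARKER_CLASSES and attributes.get("src"):
            self.sources.append(attributes["src"])


def find_uploaded_images(page_html):
    """Return the sources of images that look like pending uploads."""
    finder = UploadedImageFinder()
    finder.feed(page_html)
    finder.close()
    return finder.sources


def build_suggestion_request(image_source, language):
    """Hypothetical payload: the image plus the device or browser language."""
    return {"image": image_source, "language": language}


page = (
    '<div><img class="composer-media-preview large" src="upload.png">'
    '<img class="avatar" src="me.png"></div>'
)
requests_to_send = [build_suggestion_request(src, "pt") for src in find_uploaded_images(page)]
```

Because the check depends on specific class values, a change in the platform's markup silently breaks detection, which is why the marker set is kept in one place where it can be updated.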

When a response to this request is sent back by the backend service, the user is notified that a suggested description was found and is guided through the authoring process. On Android, the user is presented with the top-rated description along with a message indicating how to include it in the post or tweet. In the Google Chrome extension, the top-rated description is copied to the clipboard and a series of overlay windows indicates where to enter the description. In some cases, the backend service provides more than one suggestion; the user can then request to see the extra descriptions and select any one of them to copy to the input field. When the tweet or post is submitted, the front-end clients send the description entered by the user to the backend service. If this description is already stored, the number of times it was used is incremented; if it is not, it is stored as a new description. Further details of the structure and the functioning of our prototypes are provided in our previous work (Duarte et al., 2021).
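
The update performed when a post is submitted can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the deployed backend: the `DescriptionStore` class is hypothetical, it keeps data in memory, and it takes an opaque `image_id` string because the way images are identified is not described here.

```python
class DescriptionStore:
    """Hypothetical in-memory store of usage counts per image and description."""

    def __init__(self):
        # image_id -> {description text -> number of times it was used}
        self._counts = {}

    def record(self, image_id, description):
        """Increment the count of a known description, or store a new one."""
        per_image = self._counts.setdefault(image_id, {})
        per_image[description] = per_image.get(description, 0) + 1
        return per_image[description]

    def times_used(self, image_id, description):
        """Return how often a description was used for an image (0 if unknown)."""
        return self._counts.get(image_id, {}).get(description, 0)


store = DescriptionStore()
store.record("img-1", "A brown dog on a beach")  # first use: stored as a new description
store.record("img-1", "A brown dog on a beach")  # second use: the count is incremented
store.record("img-1", "A dog")                   # a different description for the same image
```

Counting per image and per description is what allows the most frequently chosen wording to rise to the top of later suggestions for the same image.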

Deploy user-generated accessible content on (other) mobile and web platforms

Support accessible content requests

Numerous images shared on social networks are constantly re-uploaded to various platforms, resulting in a loss of text descriptions even if an alternative text was added in the first place. Considering the structure already deployed for providing suggestions in the authoring process, we envisioned taking advantage of previously generated text descriptions in consumption scenarios, where users visit external platforms and may request alternative text for existing images. As a result, the same suggestions can be offered outside the scope of the social networks currently supported, Facebook and Twitter.

Prototype the accessible content requests feature

The same structure deployed for the backend service is used to handle requests in a consumption scenario. The functionalities deployed on the front-end clients are then extended to support this new process. On the Android application, the user can share an image with the SONAAR services, and a list of the available suggested descriptions is presented to the user. Due to the different technologies, the Chrome extension has a different workflow. The user can request SONAAR to analyze any web page, and the top-ranked description will be embedded in every image on the page. Along with that, SONAAR also makes all images focusable. The user also has the option to request a list of descriptions available for a particular image.

Ensure an accessible content authoring process

Another aspect constantly neglected is the authoring process of accessible media content by people with visual disabilities. To provide an accessible authoring process, our user research also identified major barriers faced by people with visual disabilities when trying to share an accessible image. Our findings suggest that the difficulty in finding the option to provide an alternative description is common ground among all users (Pereira et al., 2021). This difficulty is accentuated, though, among users with visual disabilities. Due to the constant changes in the interface of major social networks, they often have to adapt to a new interface structure - a more challenging task for screen reader users. The second obstacle in this process is the lack of assistance in providing an alternative description. Even with the latest efforts promoted by major social network providers, people with disabilities are still dependent on other people to conduct some of their online activities. SONAAR therefore has to not only provide accessible content but also ensure an accessible authoring process.

Engage users in the production of accessible content

The last objective of SONAAR is to disseminate accessibility practices to social network users. As previously described, the barriers faced by users go beyond interface difficulties. Users are not aware of how people with disabilities consume online content or of the benefits of creating accessible content. Our user research indicates, though, that most of them are interested in promoting inclusion and enabling access to information for all. By making users part of this process, integrating accessibility features more prominently in the authoring flow, and educating them about accessible practices, we hope to increase awareness and thus encourage users to engage more frequently in such practices.

In order to achieve this, we deployed two different approaches. The first one is integrated with our front-end clients. By prompting a message as soon as users upload an image, guiding them, and providing suggested descriptions, our front-end clients establish an inherent and standardized authoring process for accessible media content. The other approach is to offer support documentation that gives users more information about the accessibility context in general. We identified that most of the documentation is hard to find in the official sources (social network platforms themselves) and is also extensive, making it harder to understand. According to our studies, some users consider that accessible practices require additional effort and time. Therefore, this format of documentation is not suitable for this context. Based on this, our documentation to support and engage users in the production of accessible content follows two main concepts: 1) use of plain and simple language, avoiding jargon and technical terms so users without prior knowledge can easily understand, and 2) presenting short and objective tasks, containing only the essential information to be conveyed, allowing users to quickly go through them. Overall, our documentation provides an informative guide on how people with disabilities consume media content and why to engage in accessible practices, some examples of suitable alternative descriptions, and how the usage of SONAAR can improve access for people with visual impairments.

Discussion

With extensive use of the SONAAR prototypes, we first expect to increase social network users' awareness of the importance and benefits of creating accessible media content. By educating users on the importance of this task and how to do it properly, the impact may extend to other services used by those users, reaching a much wider scope beyond social networks. Furthermore, the mixed approach taken in this project fills some of the current gaps in social image descriptions by using existing image recognition advances to enhance human descriptions. Automated solutions are used to improve the accessibility of images on websites or mobile applications where the content authors did not produce accessible content. At the same time, using them in the authoring process deployed in SONAAR supports users in creating better alternative descriptions. Not only may this approach improve the accessibility of today’s online content, but it may also demonstrate the efficacy of user augmentation tools for accessibility purposes, where artificial intelligence is used to assist end-users with a task instead of replacing them.

In future work, we also intend to explore different approaches to better support and extend current functionalities. Due to the volatility of the interfaces of major platforms, the detection of media upload elements is highly dependent on the values of the attributes used by social network services. While these issues can be addressed by new SONAAR updates, other approaches can be deployed to optimize this process, such as a new feature that asks users to identify the required elements or a machine learning approach to identify the elements involved in the media authoring process. Better support for multi-language descriptions for other social media platforms can also be implemented to expand our current capabilities and increase the overall range of SONAAR. These opportunities and challenges are further explored in (Duarte et al., 2021).

Collaborate with SONAAR

Until the end of June 2021, we are conducting a user study to evaluate and further improve our prototypes. If you are interested in participating in this study, please fill out our recruitment form or contact us at the following email: sonnar@fc.ul.pt.

Acknowledgments

This paper was written with support from the SONAAR Project, co-funded by the European Commission (EC) through GA LC-01409741. This work was supported by FCT through funding of LASIGE Unit R&D, Ref. UIDB/00408/2020.

References

Cole Gleason, Patrick Carrington, Cameron Cassidy, Meredith Ringel Morris, Kris M Kitani, and Jeffrey P Bigham. 2019. “It’s almost like they’re trying to hide it”: How User-Provided Image Descriptions Have Failed to Make Twitter Accessible. In The World Wide Web Conference on - WWW ’19. ACM Press, New York, New York, USA, 549–559. https://doi.org/10.1145/3308558.3313605
Gill Whitney and Irena Kolar. 2020. Am I missing something? Universal Access in the Information Society 19, 2 (June 2020), 461–469. https://doi.org/10.1007/s10209-019-00648-z
Haley MacLeod, Cynthia L. Bennett, Meredith Ringel Morris, and Edward Cutrell. 2017. Understanding blind people’s experiences with computer-generated captions of social media images. Conference on Human Factors in Computing Systems - Proceedings 2017-May (2017), 5988–5999. https://doi.org/10.1145/3025453.3025814
Meredith Ringel Morris, Anushka Zolyomi, Catherine Yao, Sina Bahram, Jeffrey P. Bigham, and Shaun K. Kane. 2016. "With most of it being pictures now, I rarely use it": Understanding Twitter’s Evolving Accessibility to Blind Users. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, New York, NY, USA, 5506–5516. https://doi.org/10.1145/2858036.2858116
Violeta Voykinska, Shiri Azenkot, Shaomei Wu, and Gilly Leshed. 2016. How Blind People Interact with Visual Content on Social Networking Services. In Proceedings of the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing - CSCW ’16, Vol. 1. ACM Press, New York, New York, USA, 1582–1593. https://doi.org/10.1145/2818048.2820013
Letícia Seixas Pereira, José Coelho, André Rodrigues, João Guerreiro, Tiago Guerreiro, and Carlos Duarte. 2021. Barriers and Opportunities to Accessible Social Media Content Authoring. arXiv:2104.10968 [cs.HC]
Carlos Duarte, Letícia Seixas Pereira, André Santos, João Vicente, André Rodrigues, João Guerreiro, José Coelho, and Tiago Guerreiro. 2021. Nipping Inaccessibility in the Bud: Opportunities and Challenges of Accessible Media Content Authoring. In 13th ACM Web Science Conference 2021 (WebSci ’21 Companion), June 21–25, 2021, Virtual Event, United Kingdom. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3462741.3466644

About the Authors

Letícia Seixas Pereira is a postdoctoral researcher at LASIGE, University of Lisbon (Faculty of Sciences). Her main areas of expertise are Human-Computer Interaction and Accessibility. Her research has been focused on web accessibility, working closely with people with visual impairments and cerebral palsy.

João Guerreiro is an Assistant Professor at the University of Lisbon (Faculty of Sciences) and a researcher at LASIGE. His research aims to improve blind people’s access to the physical and digital worlds using novel non-visual interaction techniques and systems.

André Rodrigues is a researcher at LASIGE at the Universidade de Lisboa. He is a CS researcher focused on HCI with a particular interest in how technology can be and is leveraged in accessibility, health, and gaming.

José Coelho is currently an Invited Assistant Professor at the University of Lisbon (Faculty of Sciences) and a researcher at LASIGE. His research interests are in the areas of older adults, accessibility, usability, human-computer interaction, and multimodal interfaces with an emphasis on social isolation, active ageing and tourism.

Tiago Guerreiro is an Assistant Professor at the University of Lisbon (Faculty of Sciences) and a researcher at LASIGE. His main areas of expertise are HCI, Accessible Computing, and Pervasive Healthcare. He received awards for 10+ papers, including at ASSETS, CHI, SOUPS, HRI, ITS, and MobileHCI. He is Editor-in-Chief for ACM Transactions on Accessible Computing.

Carlos Duarte is an Assistant Professor at the University of Lisbon (Faculty of Sciences) and a researcher at LASIGE. His main areas of expertise are Accessibility and Human-Computer Interaction. His main research interests combine accessibility and intelligent user interfaces to improve the user experience of different target populations. He has published 100+ peer-reviewed papers.