Tag Archives: AI

Alt text: the problematic sub-text

In recent years I have tried, but often failed, to use “alt text” in my work when posting images online. I have failed dismally to go back and annotate the many images I have posted in the past, and I know I am inconsistent in doing so in the present. Both of these failings are undoubtedly because of the time that this would take, despite many platforms now encouraging its use, and despite my commitment to supporting people with disabilities (see my very old site at https://disabilityict4d.wordpress.com/).

For those who are unfamiliar with it, alt text is the HTML attribute that specifies alternative text for an image. It is especially valuable for people with visual impairments, because it helps screen readers convey the meaning of images to them. The World Wide Web Consortium therefore recommends that every image displayed through HTML should have alt text associated with it.
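As a concrete illustration, the snippet below (a minimal sketch using only Python’s standard library, with sample markup and a class name of my own invention) shows how any machine reader, whether a screen reader or a web crawler, can pull alt text out of a page:

```python
from html.parser import HTMLParser

class AltTextExtractor(HTMLParser):
    """Collect the alt text of every <img> tag, noting images that lack it."""
    def __init__(self):
        super().__init__()
        self.alts = []      # alt text found
        self.missing = 0    # <img> tags with no alt attribute

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)
            else:
                self.missing += 1

html = '<p><img src="a.jpg" alt="A market stall in Nairobi"><img src="b.jpg"></p>'
parser = AltTextExtractor()
parser.feed(html)
print(parser.alts)     # ['A market stall in Nairobi']
print(parser.missing)  # 1
```

The point is simply that once alt text is in the markup, it is machine-readable by anyone, for any purpose.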

However, reflecting on this in the context of the increasing use of AI has made me very aware of the ways in which alt text can be used by AI systems to describe images without the use of data labellers. At one level, this might be seen positively, because it can reduce the need for AI companies to use what is often termed “slave labour” to do the annotation (see Ganna Pogrebna, 2024). However, it would also take away the very small income that such labour can generate, income that is indeed valued in many parts of the world. Moreover, it is also a way in which such images (or video) can be annotated for free for the AI companies by the person doing the posting.

Much more worrying, though, is the potential for alt text to be used maliciously. If, with their permission, I post an image of a friend/colleague online, and label this using alt text, their name will be forever attached to that image, and be a vehicle through which AI and search engines can identify them and link to further related images of them. This could, for example, readily be used to track and surveil them when travelling. However, it would be equally possible for someone else to write something unpleasant or abusive about a person as alt text on an image, and that too would be recorded so that AI could then be used to build very erroneous profiles of them.

I am inclined to think that the potential harms of this outweigh the benefits, although for innocent, law-abiding people with visual impairments it would be an immense loss. Is this primarily a new way in which the Digital Barons are deliberately exploiting us? Is that why platforms are increasingly encouraging us to include alt text when we post an image (as illustrated in the image above)?

2 Comments

Filed under AI, digital technologies, slavery

Artificial Intelligence and research – it’s not the tech that’s the problem, but rather why it is used!

Artificial Intelligence (AI) has undoubted potential to enable completely new kinds of research, especially in areas that require the “analysis” of very large amounts of data. This is particularly so in fields as diverse as modelling environmental change, and medical diagnostics. However, I am shocked almost every day by the scale at which it is now used overtly to cheat (and seek to get an advantage over others) and to support downright laziness in academic research.

Cartoon in exhibition along the shore of Lake Geneva, May 2024

Many universities have introduced AI policies for students (both undergraduates and postgraduates) that focus largely on identifying permitted and prohibited uses of AI, and especially on penalising perceived abuse thereof. All too often these policies fail sufficiently to recognise how it can indeed be used positively and constructively. What such policies also fail to recognise is the scale at which deceit was already widely practised in universities prior to the advent of AI (see my posts in 2010 on University Students Cheating and on plagiarism and the production of knowledges, and in 2021 On PhDs). As with so many digital technologies, they serve to accentuate existing aspects of human behaviour. With the massification and grade inflation that have occurred in higher education over the last quarter of a century, it is scarcely surprising that some (perhaps most) people will use any means at their disposal to gain the highest level of certification with the least amount of effort.

What is deeply worrying is the speed at which the use of AI is transforming – and possibly destroying – traditional values of academic integrity and labour. Two recent examples highlight the scale of the challenge:

  • Prize nominations. I have recently been on several boards reviewing nominations/applications for prizes. Increasing numbers of these appeared suspicious to me, and a quick check with a variety of AI detectors indicated high probabilities that they were indeed produced through the use of AI. Examining some of the entries for international awards ceremonies over the last couple of years also suggests that several of the winning entries were produced through the use of AI, and that some of the evidence adduced therein was not based on physical reality. The reasons are obvious, with potential winners believing that they can gain an advantage through the use of such technologies.
  • Research proposals. In the last couple of years, an increasing number of proposals I receive from prospective postgraduate or post-doc applicants are clearly produced using AI. This is deeply concerning, not least because it provides no evidence that an applicant is indeed able to undertake independent research, and were such applicants to be accepted there would probably be very real problems during the research process. However, what has provoked me to write this short post is that one such applicant seemed to express surprise that I should actually want to receive an application written only by a human!

Some members of review panels clearly do not mind if AI has been used to enhance a proposal, but I remain very concerned for four main reasons:

  • Above all, extensive reliance on AI to design research (and presumably therefore also to undertake and produce it) will take away the ability of human researchers to think for themselves and create new ideas. This is already happening, but any loss of this ability is deeply problematic, not least since it increasingly limits our individual and collective ability to be resilient and solve new challenges in the future. I fundamentally disagree with arguments that suggest this does not matter because it will enable our brains to concentrate on other, higher level, functionalities.
  • Second, though, AI is only as good as the data drawn upon by its algorithms. Such data, by definition, always comes from the past, and is biased. Hence, it cannot be truly innovative. All it does (at present) is reconfigure existing knowledge in new ways. To be sure, this can be interesting, but randomness (not least in our genetic makeup – although genetic drift is now seen as being less random than was previously thought) and serendipity are key elements of true innovation and creativity. These are what has enabled us to survive as a race, and if we lose them we will lose not only our souls but also our physical ability to function.
  • Third, all too often people using AI do so because they find thinking too difficult and/or they are lazy. They want a quick solution without the effort. Yet if we do not use our brains they will atrophy; if we do not draw on our memories, we will forget how to use them. Digital dementia is already a significant problem, but it will become very much more so in the future if we do not continue to exercise our minds. We must cherish real creative and innovative research by humans. This has always been tough; excellence does not come easy. However, it is rewarding. We must do all we can to encourage and reward real and high quality human research. If not, the mediocre and trivial will increasingly come to dominate our enquiry.
  • Fourth, it raises real problems for institutions and the management of research. Focusing on penalising cheating is a sign of failure. What we need to do is to encourage as many people as possible to think new thoughts for themselves. The mundane can indeed be left to those who enjoy trawling through the slop created by “AI pigs”. This will require much tighter processes for the selection of academic researchers (and this also surely applies to industry), along with absolute certainty and rigour over processes designed to penalise those who seek to dissemble. Put simply, all uses of AI should be declared (and on many occasions may be acceptable), and failure to do so should be accompanied by elimination.

Much more could be written (and indeed has already been written by others) on these issues, but when people start assuming that universities and researchers welcome AI-generated proposals or nominations then it is clear that the rot has already set in, and we need to cauterise it as soon as possible. We don’t have the antibiotics yet that can treat this infection!

1 Comment

Filed under AI, corruption, research, Universities

Crowdsourcing Covid-19 infection rates

Covid-19, 19 March 2020, Source: https://coronavirus.thebaselab.com/


I have become increasingly frustrated by the continued global reporting of highly misleading figures for the number of Covid-19 infections in different countries.  Such “official” figures are collected in very different ways by governments and can therefore not simply be compared with each other.  Moreover, when they are used to calculate death rates they become much more problematic.  At the very least, everyone who cites such figures should refer to them as “Officially reported Infections”.

As I write (19th March 2020, 17.10 UK time), the otherwise excellent thebaselab‘s documentation of the coronavirus’s evolution and spread gives mortality rates (based on deaths as a percentage of infected cases) for China as 4.01%, Italy as 8.34% and the UK as 5.09%.  However, as countries are being overwhelmed by Covid-19, most no longer have the capacity to test all those who fear that they might be infected.  Hence, as the numbers of tests as a percentage of total cases go down, the death rates will appear to go up.  It is fortunately widely suggested that most people who become infected with Covid-19 will only have a mild illness (and they are not being tested in most countries), but the numbers of deaths become staggering if these mortality rates are extrapolated.  Even if only 50% of people are infected (UK estimates are currently between 60% and 80% – see the Imperial College Report of 16th March that estimates that 81% of the UK and US populations will be infected), and such mortality rates are used, the figures (at present rates) become frightening:

  • In Italy, with a total population of 60.48 m, this would mean that 30.24 m people would be infected, which with a mortality rate of 8.34% would imply that 2.52 m people would die;
  • In the UK, with a total population of 66.34 m, this would mean that 33.17 m people would be infected, which with a mortality rate of 5.09% would imply that 1.69 m people would die.
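The arithmetic behind these two bullet points can be checked in a few lines (a simple back-of-the-envelope calculation reproducing the figures above, not an epidemiological model; the function name is my own):

```python
def projected_deaths(population_m, infection_share, mortality_rate):
    """Crude extrapolation: population (millions) x share infected x case-fatality rate."""
    return population_m * infection_share * mortality_rate

# Italy: 60.48 m people, 50% infected, 8.34% reported mortality
italy = projected_deaths(60.48, 0.50, 0.0834)  # ~2.52 m deaths
# UK: 66.34 m people, 50% infected, 5.09% reported mortality
uk = projected_deaths(66.34, 0.50, 0.0509)     # ~1.69 m deaths
print(round(italy, 2), round(uk, 2))
```

The calculation is only as good as its inputs, which is precisely the point: the mortality rates fed into it are inflated by under-testing.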

These figures are unrealistic, because only a fraction of the total number of infected people are being tested, and so the reported infection rates are much lower than in reality.  In order to stop such speculations, and to reduce widespread panic, it is essential that all reporting of “Infected Cases” is therefore clarified, or preferably stopped.  Nevertheless, the most likely impact of Covid-19 is still much greater than most people realise or can fully appreciate.  The Imperial College Report (p.16) thus suggests that even if all patients were to be treated, there would still be around 250,000 deaths in Great Britain and 1.1-1.2 m in the USA; doing nothing, means that more than half a million people might die in the UK.

Having accurate data on infection rates is essential for effective policy making and disease management.  Globally, there are simply not enough testing kits or expertise to be able to get even an approximately accurate figure for real infection rates.  Hence, many surrogate measures have been used, all of which have to make complex assumptions about the sample populations from which they are drawn.  An alternative that is fortunately beginning to be considered is the use of digital technologies and social media.  Whilst by no means everyone has access to digital technologies or Internet connectivity, very large samples can be generated.  It is estimated that on average 2.26 billion people use one of the Facebook family of services every day; 30% of the world’s population is a large sample.  Existing crowdsourcing and social media platforms could therefore be used to provide valuable data that might help improve the modelling, and thus the management of this pandemic.

Crowdsourcing

[Great to see that since I first wrote this, Liquid Telecom has used Ushahidi to develop a crowd sourced Covid-19 data gathering initiative]

The violence in Kenya following the disputed Presidential elections in 2007, provided the cradle for the development of the Open Source crowdmapping platform, Ushahidi, which has subsequently been used in responding to disasters such as the earthquakes in Haiti and Nepal, and valuable lessons have been learnt from these experiences.  While there are many challenges in using such technologies, the announcement on 18th March that Ushahidi is waiving its Basic Plan fees for 90 days is very much to be welcomed, and provides an excellent opportunity to use such technologies better to understand (and therefore hopefully help to control) the spread of Covid-19.  However, there is a huge danger that such an opportunity may be missed.

The following (at a bare minimum) would seem to be necessary to maximise the opportunity for such crowdsourcing to be successful:

  • We must act urgently. The failure of countries across the world to act in January, once the likely impact of events in Wuhan unravelled was staggering. If we are to do anything, we have to act now, not least to help protect the poorest countries in the world with the weakest medical services.  Waiting even a fortnight will be too late.
  • Some kind of co-ordination and sharing of good practices is necessary. Whilst a global initiative might be feasible, it would seem more practicable for national initiatives to be created, led and inspired by local activists.  However, for data to be comparable (thereby enabling better modelling to take place) it is crucial for these national initiatives to co-operate and use similar methods and approaches.  There must also be close collaboration with the leading researchers in global infectious disease analysis to identify what the most meaningful indicators might be, as well as with international organisations such as the WHO to help disseminate practical findings.
  • An agreed classification. For this to be effective there needs to be a simple agreed classification that people across the world could easily enter into a platform.  Perhaps something along these lines might be appropriate: #CovidS (I think I might have symptoms), #Covid7 (I have had symptoms for 7 days), #Covid14 (I have had symptoms for 14 days), #CovidT (I have been tested and I have it), #Covid0 (I have been tested and I don’t have it), #CovidH (I have been hospitalised), #CovidX (a person has died from it).
  • Practical dissemination.  Were such a platform (or national platforms) to be created, there would need to be widespread publicity, preferably by governments and mobile operators, to encourage as many people as possible to enter their information.  Multiple languages would need to be incorporated, and the interfaces would have to be as appealing and simple as possible so as to encourage maximum submission of information.
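To show how lightweight such a classification could be in practice, here is a minimal sketch (in Python, using the hashtags exactly as tentatively suggested above; the function and variable names are hypothetical) that tallies status hashtags from a batch of posts:

```python
import re
from collections import Counter

# The tentative classification suggested above
STATUSES = {
    "#CovidS": "I think I might have symptoms",
    "#Covid7": "I have had symptoms for 7 days",
    "#Covid14": "I have had symptoms for 14 days",
    "#CovidT": "I have been tested and I have it",
    "#Covid0": "I have been tested and I don't have it",
    "#CovidH": "I have been hospitalised",
    "#CovidX": "a person has died from it",
}

def tally(posts):
    """Count recognised status hashtags across a batch of posts."""
    counts = Counter()
    for post in posts:
        for tag in re.findall(r"#\w+", post):
            if tag in STATUSES:
                counts[tag] += 1
    return counts

posts = ["Feeling rough today #CovidS", "Day seven now #Covid7", "#Covid0 all clear!"]
counts = tally(posts)
print(dict(counts))
```

Because the tags are plain text, exactly the same tallying could run over SMS submissions, tweets, or Ushahidi reports, which is what would make a shared classification so valuable.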

Ushahidi as a platform is particularly appealing, since it enables people to submit information in multiple ways, not only using the internet (such as e-mail and Twitter), but also through SMS messages.  These data can then readily be displayed spatially in real time, so that planners and modellers can see the visual spread of the coronavirus.  There are certainly problems with such an approach, not least concerning how many people would use it and thus how large a sample would be generated, but it is definitely something that we should be exploring collectively further.

Social media

An alternative approach that is hopefully also already being explored by global corporations (but I have not yet read of any such definite projects underway) could be the use of existing social media platforms, such as Facebook/WhatsApp, WeChat or Twitter, to collate information about people’s infection with Covid-19. Indeed, I hope that these major corporations have already been exploring innovative and beneficial uses to which their technologies could be put.  However, if this is going to be of any real practical use we must act very quickly.

In essence, all that would be needed would be for there to be an agreed global classification of hashtags (as tentatively suggested above), and then a very widespread marketing programme to encourage everyone who uses these platforms simply to post their status, and any subsequent changes.  The data would need to be released to those undertaking the modelling, and carefully curated information shared with the public.

Whilst such suggestions are not intended to replace existing methods of estimating the spread of infectious diseases, they could provide a valuable additional source of data that could enable modelling to be more accurate.  Not only could this reduce the number of deaths from Covid-19, but it could also help reassure the billions of people who will live through the pandemic.  Of course, such methods also have their sampling challenges, and the data would still need to be carefully interpreted, but this could indeed be a worthwhile initiative that would not be particularly difficult or expensive to initiate if global corporations had the will to do so.

Some final reflections

Already there are numerous new initiatives being set up across the world to find ways through which the latest digital technologies might be used in efforts to minimise the impact of Covid-19. The usual suspects are already there as headlines such as these attest: Blockchain Cures COVID-19 Related Issues in China, AI vs. Coronavirus: How artificial intelligence is now helping in the fight against COVID-19, or Using the Internet of Things To Fight Virus Outbreaks. While some of these may have potential in the future when the next pandemic strikes, it is unlikely that they will have much significant impact  on Covid-19.  If we are going to do anything about it, we must act now with existing well known, easy to use, and reliable digital technologies.

I fear that this will not happen.  I fear that we will see numerous companies and civil society organisations approaching donors with brilliant new innovative “solutions” that will require much funding and will take a year to implement.  By then it will be too late, and they will be forgotten and out of date by the time the next pandemic arrives.  Donors should resist the temptation to fund these.  We need to learn from what happened in West Africa with the spread of Ebola in 2014, when more than 200 digital initiatives seeking to provide information relating to the virus were initiated and funded (see my post On the contribution of ICTs to overcoming the impact of Ebola).  Most (although not all) failed to make any significant impact on the lives and deaths of those affected, and the only people who really benefitted were the companies and the staff working in the civil society organisations who proposed the “innovations”.

This is just a plea for those of us interested in these things to work together collaboratively, collectively and quickly to use what technologies we have at our fingertips to begin to make an impact.  Next week it will probably be too late…

5 Comments

Filed under Africa, AI, Asia, Empowerment, Health, ICT4D

The gendering of AI – and why it matters

Digital technologies are all too often seen as being neutral and value free, and with a power of their own to transform the world.  However, even a brief reflection indicates that this taken-for-granted assumption is fundamentally flawed.  Technologies are created by people, who have very specific interests, and they construct or craft them for particular purposes, more often than not to generate profit.  These technologies therefore carry within them the biases and prejudices of the people who create them.

This is as true of Artificial Intelligence (AI) as it is of other digital technologies, such as mobile devices and robots.  Gender, with all of its diversity, is one of the most important categories through which most people seek to understand the world, and we frequently assign gender categories to non-human objects such as technologies.  This is evident even in the languages that we use, especially in the context of technology.  It should not therefore be surprising that AI is gendered.  Yet, until recently few people appreciated the implication of this.

The AI and machine learning underlying an increasing number of decision-making processes, from recruitment to medical diagnostics, from surveillance technologies to e-commerce, is indeed gendered, and will therefore reproduce existing gender biases in society unless specific actions are taken to counter it.  Three issues seem to be of particular importance here:

  • AI is generally used to manipulate very large data sets.  If these data sets themselves are a manifestation of gender bias, then the conclusions reached through the algorithms will also be biased.
  • Most professionals working in the AI field are male; the World Economic Forum’s 2018 Global Gender Gap Report thus reports that only 22% of AI professionals globally are women. The algorithms themselves are therefore being shaped primarily from a male perspective, and ignore the potential contributions that women can make to their design.
  • AI, rather than being neutral, is serving to reproduce, and indeed accelerate, existing gender biases and stereotypes.  This is typified in the use of female voices in digital assistants such as Alexa and Siri, which often suggest negative or subservient associations with women.  A recent report by UNESCO for EQUALS, for example, emphasises the point that those in the field therefore need to work together to “prevent digital assistant technologies from perpetuating existing gender biases and creating new forms of gender inequality”.
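The first of these bullet points, that biased data yields biased conclusions, can be illustrated with a deliberately crude toy example (the data is entirely invented, and the “model” is just a per-group majority vote rather than a real machine-learning system):

```python
from collections import defaultdict

# Invented toy data only: past hiring decisions encoding a gender bias
# (group, hired) pairs, where 1 = hired and 0 = rejected
history = [("f", 0), ("f", 0), ("f", 1), ("m", 1), ("m", 1), ("m", 0)]

def train_majority(data):
    """'Learn' the majority outcome per group - the simplest possible model."""
    outcomes = defaultdict(list)
    for group, hired in data:
        outcomes[group].append(hired)
    return {g: round(sum(v) / len(v)) for g, v in outcomes.items()}

model = train_majority(history)
print(model)  # {'f': 0, 'm': 1}: the bias in the data becomes the 'rule'
```

Real systems are vastly more complex, but the underlying mechanism is the same: a model trained to reproduce past decisions will faithfully reproduce the prejudices embedded in them.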

These issues highlight the growing importance of binary biases in AI.  However, it must also be recognised that they have ramifications for its intersection with the nuanced and diverse definitions of gender associated with those who identify as LGBTIQ.  In 2017, for example, HRC and GLAAD thus criticised a study claiming to show that deep neural networks could correctly differentiate between gay and straight men 81% of the time, and women 74% of the time, on the grounds that it could put gay people at risk and that it made overly broad assumptions about gender and sexuality.

The panel session on Diversity by Design: mitigating gender bias in AI at this year’s ITU Telecom World in Budapest (11 September, 14.00-15.15) is designed specifically to address these complex issues.  As moderator, I will be encouraging the distinguished panel of speakers, drawn from industry, academia and civil society, not only to tease out these challenging issues in more depth, but also to suggest how we can design AI with diversity in mind.  This is of critical importance if we are collectively to prevent AI from increasing inequalities at all scales, and to ensure that in the future it more broadly represents the rich diversity of humanity.


This was first posted on the ITU’s Telecom World site on 17th June 2019.  It is reproduced here with their kind permission.

 

Leave a comment

Filed under AI, Conferences, Gender, ICT4D, ITU