We Need To Care About Ethical AI

Written by: Ho Min Joshua Sun

I remember back in 2016 when AlphaGo bested Lee Sedol over a five-game series, becoming the first AI to defeat a top-ranked professional Go player. Unlike chess, Go has a vastly larger number of possible board positions, and mastering it was originally thought to be an impossible task for AI – the proverbial ‘four-minute mile’.

I’ve spent the last decade of my life working with robotics and AI, and one of the more common questions that I’ve received since then is whether I think “AI is a dangerous or positive development for human beings”.

Most people want a simple answer to this question, but unfortunately in my experience I’ve found the reality to be much more complicated. For one thing, it’s clear we are still in the relatively early stages of unlocking the full potential of AI, and even earlier still in understanding the impacts that it can have on our societies and communities. 

That being said, it’s undeniable that AI is already starting to have a large degree of influence on our personal and professional lives. I believe that we are currently living through a unique and critical period in human history, and that in the coming years the importance of ‘ethical AI’ systems will become an unavoidable conversation in mainstream public discourse.


Since ethical AI frameworks are still a relatively niche area of research and development within the AI community, I wanted to write this article to provide the reader with an understanding of what it means to develop ‘ethical AI’ and why this is becoming increasingly necessary as more powerful systems start to come online.

After researching this topic, I believe that there are still salient issues that will need to be addressed, and that it will be critical for as many different groups of people as possible to participate collectively in order to create not just an ethical AI, but also an equitable one.

What is Ethical AI?

Before we move on, it is useful to first understand that the goal of ‘ethical AI’, broadly speaking, is to ensure that advanced and powerful AI systems correctly reflect and align with human values, such that the AI should not act in a way that would knowingly cause harm to anybody or anything.

If we look back historically, this was less of an issue in the past, since the earliest AI systems were all designed for simple tasks or applications, such as playing chess or translating texts. Even if the AI acted irrationally, the potential stakes – and therefore the risk of harm – were relatively low.

However, machine learning methods have advanced considerably in recent years. With the combination of increasing processing power in modern hardware and the virtually limitless amounts of data we have created over the last few decades, we are starting to see much more creative applications of AI with real-world implications – Tesla’s AI-powered Autopilot system is one such example.

What worries me is that these modern AI systems are increasingly designed to operate without ‘humans-in-the-loop’, meaning they are starting to work independently of human input. Prof. Colin Allen, a professor of History & Philosophy of Science at the University of Pittsburgh, has observed that AI is already starting to work at speeds that “increasingly prohibits humans from evaluating whether each action is performed in a responsible manner”.

Ultimately, this means that as AI systems become more autonomous, I believe it becomes increasingly necessary to find ways to incorporate ‘ethical AI’ so that these systems do not act in ways that are harmful to us before we are able to intervene.


Are Today's AI Systems Ethical?

To provide some context on why increasingly autonomous AI systems worry me, it is helpful to take a look at the way we design AI systems today. In an insightful research paper, Iason Gabriel, a Senior Research Scientist at DeepMind, argues that, despite their similarities, there are significant differences between AI designed to align with human intentions, desires, and values.

Broadly speaking:

  • Intentions – the AI does what I intend for it to do
  • Desires – the AI does what my behaviour reveals I prefer
  • Values – the AI does what it ought to do, as defined by an individual and/or group of people

Based on my experience, I believe the vast majority of today’s advanced and most powerful AI systems are designed to align with our intentions and/or desires. Whilst these approaches are not strictly ‘unethical’, I do believe that the absence of consideration for human values within the AI can, and already has, led to instances of the AI acting in ways that are unintended and harmful to us.


Currently, a significant portion of the AI research and development community is focused on the challenge of ensuring that AI does what ‘we intend it to do’. With the popularisation of deep learning models built on artificial neural networks, a general consensus has emerged among the technical research community that AI systems should not be designed to follow our instructions in an extremely literal sense. We have already seen many notable failed implementations of AI systems designed in this way – Microsoft’s infamous Twitter chatbot, Tay, which learned to parrot the abusive behaviour of the users it interacted with, being a notorious example.

However, I know firsthand from my years of experience working with Natural Language Processing (NLP) solutions that closing the instruction-intention gap is not a trivial challenge. For AI to understand our ‘intentions’, it will have to be able to grasp the numerous subtleties and ambiguities in human communication, and achieving this would most likely require a complete model of human language and interactions.

Moreover, solving the instruction-intention gap, whilst a monumental achievement, would not eliminate the problem altogether: an individual acting in either good or bad faith could still provide the AI with misinformed or faulty intentions (or worse, malicious ones), which could lead to the AI operating in harmful ways. I believe we can already see the potential dangers of this when we look at the rise of AI systems designed to align with our desires.


Over the last decade, there has been a growing trend, particularly in the e-commerce and social media industries, towards personalised products and services. This in turn has created a trend of AI systems designed to align with our ‘preferences as revealed by our behaviours’. An extremely popular application of this type of AI system that you may be familiar with is the recommendation engine.

Most modern tech companies utilise their user data in some form of recommendation or optimisation engine, and for the most part, users generally find the features offered by these engines to be useful. For example, I frequently use the recommendation engine developed by Spotify, which analyses the songs that I’ve been listening to in order to suggest new music.
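A minimal sketch can show the core idea behind such an engine – scoring unheard items by how similar other listeners are to you. The users, songs, and similarity measure below are all hypothetical inventions for illustration; real systems like Spotify’s are vastly more sophisticated:

```python
# Toy collaborative-filtering recommender over hypothetical listening data.
from math import sqrt

plays = {
    "alice": {"song_a", "song_b", "song_c"},
    "bob":   {"song_b", "song_c", "song_d"},
    "carol": {"song_a", "song_d", "song_e"},
}

def similarity(user, other):
    """Cosine similarity between two users' listening sets."""
    a, b = plays[user], plays[other]
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user):
    """Score songs the user hasn't heard, weighted by how similar
    the listeners who did play them are ('revealed preference')."""
    scores = {}
    for other in plays:
        if other == user:
            continue
        w = similarity(user, other)
        for song in plays[other] - plays[user]:
            scores[song] = scores.get(song, 0.0) + w
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # song_d ranks first: both neighbours played it
```

Note that the engine optimises purely for what past behaviour reveals – nothing in it asks whether the recommendation is good for the user, which is exactly the limitation discussed next.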

However, I have growing concerns about the long-term suitability of AI systems that are designed solely to align with our desires. From a purely philosophical standpoint, it is unclear to me whether our individual behaviours or preferences are even a reliable way to measure or assess collective human values. Individuals regularly partake in self-destructive and harmful behaviour – addiction being one example – whilst a large percentage of society would look unfavourably on ‘enabling’ that same behaviour.

But even more alarming to me are the practical problems this raises: over the last decade, designing AI systems to optimise for user behaviour and preferences has caused the online ecosystem to segregate into ‘echo chambers’. This is a dangerous development, and we have already seen how these echo chambers can lead to the rise of radicalised groups. These are the kinds of unintended consequences and harmful outcomes that we must avoid with powerful future AI systems.
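The feedback loop behind the echo-chamber effect can be sketched in a few lines. This is a toy model of my own, not any real platform’s algorithm: if the system always serves the topic with the most engagement, and each exposure reinforces that engagement, the user’s exposure collapses onto a single topic:

```python
# Toy model of a preference feedback loop (all numbers hypothetical).
# A user starts with only mildly uneven interest across five topics.
interest = {"news": 1.2, "sport": 1.0, "music": 1.0, "tech": 1.0, "art": 1.0}

def serve_top(interest):
    """A recommender aligned with 'revealed preference':
    always show the topic with the highest engagement so far."""
    return max(interest, key=interest.get)

for step in range(50):
    topic = serve_top(interest)
    interest[topic] += 0.5   # each exposure reinforces that preference

share = interest[serve_top(interest)] / sum(interest.values())
print(f"after 50 rounds, one topic gets {share:.0%} of total interest")
# → one topic ends up with ~87% of interest from a 20% head start
```

A 20% initial edge becomes near-total dominance because the recommender never explores the other topics – a runaway loop, not a reflection of what the user actually values.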

Incorporating Human Values into AI

Although there are some companies that have started to tackle this challenge, my understanding is that AI systems designed to align with human values are still mostly theoretical or experimental. Nonetheless, I believe it is worth covering how this approach to AI design can help to address some of the issues we mentioned earlier.

In this context, ‘values’ are another nuanced human construct, one not accurately or entirely captured by our languages or behaviours. Values are often shared between different people, can be cultural or transcend borders, and represent our shared ideals of what we believe is good and bad.

Designing AI to align with human values would likely involve teaching the AI to understand and model the value of both tangible (such as goods and commodities) and intangible concepts (such as love, life, and equality). I assume that this, like creating a complete model of human language and interactions, would be a monumental undertaking. 

However, the effort may be worth the result, as it could help to minimise and limit the damage that a bad actor with malicious intentions could do when interacting with the AI. In this approach, rather than simply doing as it is instructed, the AI would be imbued with nuanced decision-making capabilities: it could weigh each action against its defined set of values and do what it ought to do.

Unfortunately, this brings us to the foremost issue that we still need to collectively resolve if we are to further advance ethical AI: “which, or whose, values and/or principles should we model into an ethical AI?”

This is a difficult and sensitive issue, as people will often have different beliefs about values and principles which aren’t easily resolved, and it’s important to keep in mind that these beliefs are also prone to changing over time. However, I believe that this is a conversation that must start sooner rather than later, as there is an undeniable growing need to safeguard our future against the potential dangers of powerful new AI systems.

Start with Universal Principles

Although I don’t intend to delve too deeply into this issue in this article – that is a discussion that should be conducted on a society-wide or global basis – I think it is worth remarking, as a law school graduate, that there is an existing and familiar framework we could draw upon to start creating a schema of agreed-upon human values in an equitable way.

I am referring to the formation of democratic governments and the development of their laws and regulations. Broadly speaking, democratic governments are founded on fundamental, universally agreed-upon principles (sometimes enshrined in a ‘constitution’), such as freedom, equality, and suffrage (the right to vote). Starting from these principles, groups of people can engage in discourse and participate in fair voting processes to introduce new laws that reflect the values of the period. Over time, as sensibilities change, old laws are amended or replaced with new ones.

Similarly, I believe that the process for creating ethical AI should start with universal principles. Notwithstanding technical and other challenges, in an ideal system every person would have the same right and freedom to participate in discussions, as well as equal rights to vote on what we believe are the values and/or principles that we should model into an ethical AI. 

If we want ethical AI to accurately and fairly align with our collective human values, it will be critical to have as much participation as possible in these processes.

Concluding Remarks

As I mentioned at the outset of this article, there is no simple answer as to whether AI is a positive or dangerous development for human beings – we are still waiting to see how these events will unfold.

However, as I hope I’ve articulated in this article, I believe it is now more important than ever that we take steps to expand the way we design and build powerful new AI systems, incorporating a wider consideration for human values, morals and principles in order to mitigate the risk of AI acting in ways that could cause us harm. And finally, I hope I have impressed upon the reader that for any AI to accurately and fairly reflect our collective human values, it will be critical for as many people as possible to participate in the process of defining those values and principles.