Embedding human values


This post samples some of the thinking and research in recent years on embedding human values into autonomous intelligent systems (A/ISs). It distinguishes between different types of ethical ‘agents’ and introduces the concepts of value alignment and social norms. It sets out different approaches to value-alignment that reflect different schools of thought in the development of AI, and concludes that a hybrid approach may show the way forward to building A/ISs that can both learn and reason with ethical concepts.

(approx 1,551 words)

Embedding human values into autonomous intelligent systems

Values drive ethics. Ethical principles are underpinned by the system of values they derive from. ‘Thou shalt not kill’ derives from the value we put on life. ‘Thou shalt not steal’ derives from the value we put on property, notions of ownership and peaceful co-existence. In principle we could give robots sets of ethical principles to adhere to (like Asimov’s three laws of robotics), but in practice it may be more efficient, adaptive and effective to align the values of a robot or other A/IS with those of the human population it is expected to serve. The A/IS might then derive from those values the ethical principles it should adhere to, and hence its behaviour.

Moor (2009) [1] distinguishes between four types of ethical agents:

  • Type 1: Ethical impact agents – an A/IS that performs actions that have ethical implications (either good or bad) but is not necessarily designed with ethics in mind.
  • Type 2: Implicit ethical agents – these have ethics built into the design (e.g. autopilots, cash machines). They are designed to act ethically (i.e. not crash, give out the correct cash) or unethically (e.g. a spam virus) but not to use ethical principles in their operation.
  • Type 3: Explicit ethical agents – these have mechanisms that can identify particular situations, infer or reason about what would be ethical and apply actions appropriately. This is the type of ethical agent that most people have in mind when they think about robot ethics.
  • Type 4: Full ethical agents – these can make ethical judgements in a wide variety of situations and are thought of as accountable for their actions. People are full ethical agents. Robots in science fiction are often portrayed as full ethical agents, but no such robots currently exist in the real world.

A particular A/IS, however intelligent and autonomous, could have characteristics of any or all of the first three of these types.  An employee selection algorithm, for example, might be autonomously short-listing candidates according to a highly complex set of characteristics (based, for example, on a big data training set). Its decisions might have ethical implications (e.g. discriminating against certain groups) that are quite unintended.

Researchers concerned with embedding human values in A/ISs often refer to ‘value alignment’ (e.g. [2], [3]). This is the idea that human values and the values of the A/IS should correspond or run in the same direction. Some of these researchers have in mind creating type 2 ethical agents, while others are thinking of type 3 agents. This often gives rise to debate about what human values are and how to address the problem that different cultures and individuals have different values. Some researchers propose methodologies for ‘capturing’ the values of particular user groups (e.g. elderly people needing care, factory workers needing an automated assistant, or surgeons performing operations remotely). The task being performed can be an important factor in determining the relevant values (e.g. the surgeon values precision, the airline pilot values efficient routes that avoid bad weather). Other researchers (e.g. [4]) suggest the use of ‘AI Guardians’ that oversee the operation of an A/IS and check that it does not contravene laws and values.

Various mechanisms have been proposed to achieve the alignment of A/ISs with human values, and the notion of values is often extended to refer to social norms (or regularities) in patterns of human behaviour. For example, in a restaurant a common pattern of behaviour is to enter, wait to be shown to a table, sit down, read the menu, order drinks, eat, pay and leave. Such a pattern has been termed a script [5]. There can be variants on a script such as tipping, complimenting the chef or going over to talk to a recognised friend, but some activities, such as throwing plates on the floor, would be seen as exceptions within most cultures.
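One way to picture a script is as an ordered list of expected steps plus a set of tolerated variants, with anything else flagged as an exception. The Python sketch below is purely illustrative and is not Schank and Abelson's formalism: the names RESTAURANT_SCRIPT, TOLERATED_VARIANTS and classify_actions are hypothetical, and the steps are simply those of the restaurant example.

```python
# Illustrative sketch only: a script as an ordered list of expected steps.
# All names and the matching policy are assumptions made for this example.
RESTAURANT_SCRIPT = ["enter", "wait to be seated", "sit down", "read menu",
                     "order drinks", "eat", "pay", "leave"]
TOLERATED_VARIANTS = {"tip", "compliment the chef", "greet a friend"}


def classify_actions(observed):
    """Label each observed action as 'expected', 'variant' or 'exception'."""
    labels = []
    position = 0  # index of the next script step not yet seen
    for action in observed:
        if action in RESTAURANT_SCRIPT[position:]:
            # An expected step, possibly after skipping earlier ones.
            position = RESTAURANT_SCRIPT.index(action, position) + 1
            labels.append((action, "expected"))
        elif action in TOLERATED_VARIANTS:
            labels.append((action, "variant"))
        else:
            # Unknown actions and out-of-order steps both land here.
            labels.append((action, "exception"))
    return labels
```

Calling classify_actions(["enter", "sit down", "throw plates", "tip", "leave"]) labels "throw plates" as an exception and "tip" as a variant. A real A/IS would need to learn such scripts and their cultural variations rather than have them hard-coded.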

The main approaches to the problem of value-alignment are:

  • Explicitly program the machine with what to do in every circumstance it is likely to encounter
  • Devise a general set of rules or principles that can be consulted when a decision is being made and use these to evaluate possible options
  • Have the machine learn from examples the mapping between situations and suitable actions

The first approach to value-alignment would be to explicitly enumerate all the situations an A/IS might encounter and to prescribe its behaviour in those situations.  However, given the complexity, ambiguity, changeability and uncertainty in the world, it would be impossible to enumerate all the possible combinations and permutations of circumstances that would require ethical evaluation.

There have been attempts to capture ethical principles in rule-based expert systems (e.g. [6]). This second approach can be thought of as ethical reasoning and is typical of the second wave of development of AI systems. These systems, known as expert systems, attempted to capture expertise in particular domains and encode it as sets of rules. They were able to carry out long and complex chains of reasoning using programming languages such as Prolog (a logic programming language) and Lisp (a list processing language). A virtue of these systems is that it was possible, in principle, to inspect the chain of reasoning that a program went through in order to arrive at a conclusion. This could serve as an explanation or justification for its decision-making.
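The idea of an inspectable chain of reasoning can be illustrated with a tiny forward-chaining rule engine. This is a minimal sketch in Python rather than Prolog or Lisp, and the rules and fact names are invented for illustration; they are not taken from any of the cited systems.

```python
# Illustrative sketch: forward chaining over if-then rules with a trace.
# Each rule is (name, set of premises, conclusion). All rules are hypothetical.
RULES = [
    ("R1", {"patient refuses treatment", "patient is competent"},
     "respect refusal"),
    ("R2", {"respect refusal"}, "do not administer treatment"),
]


def infer(facts, rules=RULES):
    """Apply rules until nothing new follows; return (known facts, trace)."""
    known = set(facts)
    trace = []  # human-readable record of every rule that fired
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            # A rule fires when all its premises are known and it adds something new.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                trace.append(f"{name}: {sorted(premises)} -> {conclusion}")
                changed = True
    return known, trace
```

Given the facts "patient refuses treatment" and "patient is competent", the trace records R1 firing and then R2, which is the kind of step-by-step justification such systems could offer. With only two rules the trace is easy to follow; with thousands of interacting rules it quickly stops being so.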

However, there were a number of problems with this approach. Not only was it difficult to understand complex machine-made chains of reasoning, but the approach did not map onto the way human experts either made or explained their decisions. While a doctor, for example, might use a certain amount of logical inference in arriving at a diagnosis, more typically diagnosis had more to do with the recognition of a pattern. The pattern might include a patient’s symptoms but could just as easily include the time of day the patient turned up at the surgery, how they dressed or other factors seemingly quite unrelated to any medical knowledge. Furthermore, the doctor may not be fully aware of the patterns that pointed her or him in a particular direction. Sometimes, further enquiry would be based on a hunch and it was not clear what exactly had triggered it.

Other studies of expert decision-making, such as that of chess masters, seemed to confirm that an approach based on a deep search of possibilities (characteristic of most computer approaches), in the absence of the recognition of familiar patterns, neither corresponded to the way people solve problems nor was particularly effective. This is mainly because the real world is messy. Things do not fall into neat definitive categories that can then be logically reasoned with. The world is full of uncertainty and ambiguity, and arriving at a judgement or decision is more of a recognition or configuration problem than it is a reasoning problem. This history is well documented as having led to the second AI ‘winter’, from approximately 1987 to 1993, when funding for AI dried up and many companies working in AI either cut back or closed (e.g. [7]).

Because of the difficulties in explicitly enumerating or reasoning with human values, in recent years many value-alignment mechanisms have been based on the third approach, machine learning. In a machine learning approach the A/IS is fed with large sets of data describing instances of a wide range of situations, and reinforcement learning is used (i.e. having a person judge and feed back whether a computer algorithm has got a decision right or wrong) to adjust the weights in a neural net so that the A/IS learns to map from the characteristics of a situation to suitable ethical decisions.
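The feedback loop described above can be sketched in a few lines. To keep the example small it uses a single perceptron rather than a full neural net, and a function standing in for the human judge, so it is a deliberate simplification and not a full reinforcement learning setup. The features, data and the human_judge function are all hypothetical.

```python
# Illustrative sketch: weights adjusted from right/wrong feedback.
# A situation is a feature vector (harm_to_person, benefit_to_person).
# The features, data and judge are assumptions made for this example.
SITUATIONS = [(1.0, 0.0), (0.9, 0.2), (0.0, 1.0), (0.1, 0.8)]


def human_judge(situation):
    """Stand-in for a person: proceeding (1) is acceptable only when
    benefit outweighs harm, otherwise the system should refrain (0)."""
    harm, benefit = situation
    return 1 if benefit > harm else 0


def decide(weights, bias, situation):
    """The system's current decision: 1 = proceed, 0 = refrain."""
    score = sum(w * x for w, x in zip(weights, situation)) + bias
    return 1 if score > 0 else 0


def train(situations, epochs=20, rate=0.5):
    """Perceptron rule: nudge the weights only after a wrong decision."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for s in situations:
            error = human_judge(s) - decide(weights, bias, s)  # 0 if right
            weights = [w + rate * error * x for w, x in zip(weights, s)]
            bias += rate * error
    return weights, bias
```

After training, decide agrees with the judge on every situation in SITUATIONS. The learned weights are just numbers, though, and say nothing about why a decision is right.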

A difficulty is that the interpretation of what is happening in any situation is as much a function of the observer as it is of the situation itself. Different people might make different judgements about what is appropriate in a given situation, especially as there can be wide cultural variation. Riedl and Harrison (2016) [8] demonstrate mechanisms by which an A/IS could read stories in order to extract the values that exist within particular cultures. This could not only be more efficient than the traditional reinforcement learning approach but might also accommodate local variation in values and norms.

An inverse reinforcement learning (IRL) approach has been proposed [9] whereby an A/IS aligns its values by observing the behaviour of people and attempting to infer the ‘reward function’ they are trying to optimise, i.e. what they are trying to achieve [10]. Building on this, another mechanism for incorporating norms and values [11] is an extension to the Beliefs, Desires, Intentions (BDI) model of a rational agent [12]. The extension would, in principle, enable the A/IS to perceive observed norms, adapt its behaviour in response to them and then internalise (i.e. learn) them.
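The core idea of inferring a reward function from observed behaviour can be sketched with a much simplified, single-step version: watch which option a person picks from each set of alternatives and fit reward weights that make those choices likely. This is not the algorithm of Ng and Russell [10]; the choice model (options picked with probability proportional to the exponential of their reward), the two features and the data are all assumptions made for illustration.

```python
import math

# Each observation is (list of options, index of the option the person chose).
# An option is a feature vector (time_saved, risk_to_others). Hypothetical data.
OBSERVATIONS = [
    ([(1.0, 0.9), (0.2, 0.0)], 1),  # gave up time to avoid risk
    ([(0.8, 0.7), (0.5, 0.1)], 1),
    ([(0.9, 0.0), (0.1, 0.0)], 0),  # with no risk involved, saved time
    ([(0.7, 0.1), (0.2, 0.1)], 0),
]


def infer_reward_weights(observations, steps=500, rate=0.1):
    """Gradient ascent on the log-likelihood of the observed choices,
    assuming choice probability is proportional to exp(reward)."""
    n = len(observations[0][0][0])  # number of features per option
    w = [0.0] * n
    for _ in range(steps):
        for options, chosen in observations:
            scores = [sum(wi * xi for wi, xi in zip(w, o)) for o in options]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            probs = [e / total for e in exps]
            for i in range(n):
                expected = sum(p * o[i] for p, o in zip(probs, options))
                # Move towards the chosen option's features and away
                # from what the current weights would have predicted.
                w[i] += rate * (options[chosen][i] - expected)
    return w
```

On this data the inferred weight for time_saved comes out positive and the weight for risk_to_others negative: the observer concludes that the person values saving time but values avoiding risk to others more. Full IRL addresses the harder problem of sequences of actions in an environment, where rewards may be delayed.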

There are several problems with both top-down rule-based and machine learning approaches to value-alignment.  The main ones are:

  • The rule-based approach does not work for tasks that people find easy, like recognising objects, emotions or situations
  • The machine learning approach cannot provide explanations as to how it reached a particular conclusion

Understanding the differences, strengths and weaknesses of these two approaches provides important clues in finding better ways to address the problem of value-alignment between people and A/ISs. 


[1]    Moor J. H. (2009), Four Kinds of Ethical Robots, Philosophy Now, Issue 72.  https://philosophynow.org/issues/72/Four_Kinds_of_Ethical_Robots

[2]    Arnold, T., D. Kasenberg, and M. Scheutz., 2017,  “Value Alignment or Misalignment — What Will Keep Systems Accountable? ”, The Workshops of the Thirty-First AAAI Conference on Artificial Intelligence: Technical Reports, WS-17-02: AI, Ethics, and Society, 81–88. Palo Alto, CA: The AAAI Press

[3]    Conn, A., 2017, “How Do We Align Artificial Intelligence with Human Values?” Future of Life Institute, February 3.

[4]    Etzioni, A., 2016, “Designing AI Systems That Obey Our Laws and Values.” Communications of the ACM 59, no. 9: 29–31.

[5]    Schank, R., and Abelson, R. 1977. Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures. Lawrence Erlbaum Associates

[6]    Shankar G, Simmons A., 2009, Understanding ethics guidelines using an internet-based expert system. Journal of Medical Ethics 35:65-68. https://jme.bmj.com/content/35/1/65

[7]    AAAI-17 Invited Panel on Artificial Intelligence (AI) history: Expert systems, 2017. Moderated by David C. Brock. Panelists: Edward Feigenbaum, Bruce Buchanan, Randall Davis, Eric Horvitz. Recorded February 06, 2017, San Francisco, CA. CHM Reference number: X8151.2017. © Computer History Museum.

[8]    Riedl, M. O., and B. Harrison., 2016, “Using Stories to Teach Human Values to Artificial Agents.” Proceedings of the 2nd International Workshop on AI, Ethics and Society, Phoenix, Arizona.

[9]    Russell, S.; Dewey, D.; and Tegmark, M., 2016, Research priorities for robust and beneficial artificial intelligence. arXiv preprint arXiv:1602.03506.

[10]    Ng, A., and Russell, S. 2000. Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conference On Machine Learning.

[11]    Tufiş, Mihnea and Ganascia, Jean-Gabriel, 2015, Grafting Norms onto the BDI Agent Model, Chapter 7 in Robert Trappl (Ed), A Construction Manual for Robots’ Ethical Systems: Requirements, Methods, Implementations, Springer

[12]    Rao, A.S., & Georgeff, M.P. (1995). BDI Agents: From Theory to Practice. ICMAS.

How do we embed ethical self-regulation into artificial Autonomous, Intelligent Systems (A/ISs)? One answer is to design architectures for A/ISs that are based on ‘the Human Operating System’ (HOS).

Theory of Knowledge

A simple computer program will know very little about the world, and will have little or no capacity to reflect upon what it knows or the boundaries of its applicability.

Any sophisticated A/IS, if it is to apply ethical principles appropriately, will need to be based on a far more elaborate theory of knowledge (epistemology).

The epistemological view taken in this blog is eclectic, constructivist and pragmatic. As we interact with the world, we each individually experience patterns, receive feedback, make distinctions, learn to reflect, and make and test hypotheses. The distinctions we make become the default constructs through which we interpret the world and the labels we use to analyse, describe, reason about and communicate. Our beliefs are propositions expressed in terms of these learned distinctions and are validated via a variety of mechanisms that themselves develop over time and can change in response to circumstances.

We are confronted with a constant stream of contradictions between ‘evidence’ obtained from different sources – from our senses, from other people, our feelings, our reasoning and so on. These surprise us as they conflict with default interpretations. When the contradictions matter (e.g. when they are glaringly obvious, interfere with our intent, or create dilemmas with respect to some decision), we are motivated to achieve consistency. This we call ‘making sense of the world’, ‘seeking meaning’ or ‘agreeing’ (in the case of establishing consistency with others). We use many different mechanisms for dealing with inconsistencies, including testing hypotheses, reasoning, intuition and emotion, ignoring and denying.

In our own reflections and in interactions with others, we are constantly constructing mini-belief systems (i.e. stories that help orientate, predict and explain to ourselves and others). These mini-belief systems are shaped and modulated by our values (i.e. beliefs about what is good and bad) and are generally constructed as mechanisms for achieving our current and future intentions. These in turn affect how we act on the world.