Natural Dialog Systems, automated conversational agents, or, more simply, chatbots have in recent years redefined how computer interactors interface with a wide variety of everyday activities. While popular adoption has been relatively recent, the pieces of this combinatorial technology have existed for decades, and chatbots themselves have been around since the 1960s!
This paper attempts to deblackbox ELIZA (1966), the first chatbot and, when running the DOCTOR script, the first therapeutic chatbot, as well as Woebot (2016), a contemporary therapeutic chatbot. It will note similarities and differences between the two combinatorial technologies, briefly survey the 50-year history of chatbot design, and pay special attention to certain claims pertaining to the addition of Artificial Intelligence (AI) technology in Woebot.
A Brief Chat
Chatbots are a designed combinatorial technology with a rich design history and numerous contemporary applications. What began in the 1960s as high-concept parody (ELIZA) and thought experiment (PARRY, a simulation of a person with paranoid schizophrenia) has been developed into, among other things, capable simulations of customer service agents (Ordemann et al. 2021, 17-18). With that said, the design history of chatbots is not linear, the combined computational technologies that power chatbots are not static, and the philosophical approaches behind chatbot designs are not unified. The following insights both display the complexity lying within chatbots’ combinatorial technology and could prove to be useful when breaking into a chatbot’s metaphorical blackbox:
- Chatbots are primarily designed following two distinct development approaches, a Pattern Matching Approach and a Machine Learning Approach (Adamopoulou and Moussiades 2020, 4-8). The Approach that a designer takes determines the algorithms and the techniques that must be adopted (Ibid, 4).
- Chatbots designed by developers taking a Pattern Matching Approach often do not contain what many would consider to be Artificial Intelligence (AI). These chatbots tend to be more limited (ELIZA, PARRY) combinatorial technologies functioning as rule-based systems that: “Typically, do not create new answers as the ‘knowledge’ used is written by the developer in the form of conversational patterns” (Ibid, 4). Chatbots designed with this Approach in mind tend to operate rather quickly and efficiently, but the approach is hindered by the fact that: “Answers are automated, repeated, and do not have the originality and spontaneity of human response” (Ibid, 5). These chatbots operate on a turn-by-turn basis and do not have the ability to react to an entire conversation’s context.
- Chatbots designed by developers taking a Machine Learning Approach extract content from the interactor using Natural Language Processing (NLP), that is, a technology that “decodes” human natural language into machine language, allowing computational technologies to simulate understanding of text (Ibid, 7). Both Machine Learning and NLP are connected to and derived from the concept of AI. Chatbots designed with this approach have the ability to consider an entire conversation’s context and do not require a developer to create predefined responses for each and every potential input (Ibid, 7). However, these chatbots do require extensive “training sets,” that is, large corpora of text or datasets that the chatbot can utilize to appropriately predict natural conversation patterns (Ibid, 7).
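As a rough illustration of the contrast drawn above, the following Python sketch places a hand-written rule table next to a tiny classifier that learns word counts from a training set. Everything in it (the rules, the four training sentences, the labels `low_mood` and `anxiety`, and the function names) is invented for illustration and is not taken from Adamopoulou and Moussiades or from any real chatbot. The classifier is a minimal Naive Bayes, chosen only because it is short.

```python
import math
import re
from collections import Counter, defaultdict

# Pattern Matching Approach: the developer writes every pattern and every answer.
# (Hypothetical rules, for illustration only.)
RULES = [
    (re.compile(r"\b(hello|hi)\b", re.I), "Hello. How are you feeling today?"),
    (re.compile(r"\bsad\b", re.I), "I am sorry to hear you are sad."),
]

def rule_based_reply(text):
    """Return the canned answer of the first matching rule; no new text is created."""
    for pattern, answer in RULES:
        if pattern.search(text):
            return answer
    return "Please tell me more."

# Machine Learning Approach: labels are learned from a (toy) training set.
TRAINING_SET = [
    ("i feel so sad and tired", "low_mood"),
    ("everything seems hopeless lately", "low_mood"),
    ("i am worried about my exam", "anxiety"),
    ("my heart races when i am nervous", "anxiety"),
]

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Count words per label for a tiny multinomial Naive Bayes classifier."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Pick the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING_SET)
```

With these toy inputs, `rule_based_reply("I am nervous and worried")` falls through to the generic fallback because no developer-written pattern covers it, while `classify("I am nervous and worried", model)` returns `anxiety` even though no rule for that sentence was ever written. The price is the training set: with only four examples the classifier is fragile, which is the sketch-sized version of the "extensive training sets" requirement noted above.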
Chatbots were developed throughout the 1960s-1970s, with each new release of this particular type of designed combinatorial system containing increased programming complexity. Eleni Adamopoulou and Lefteris Moussiades (2020) claim in their journal article “Chatbots: History, Technology, and Applications,” that: “[AI] was firstly used in the domain of the chatbots with the construction of Jabberwacky in 1988” (2). This claim is deceptively complex, particularly when taking into account the developmental Approaches to chatbots that, as defined by Adamopoulou and Moussiades (2020), simultaneously appear to make a distinction between chatbots implied to be implemented without AI (Pattern Matching) and those implemented with AI (Machine Learning) while concurrently stating that Pattern Matching chatbots, like Jabberwacky, feature AI. Adding to this confusion, Adamopoulou and Moussiades (2020) list Artificial Intelligence Markup Language (AIML), which has AI right there in the name, as a scripting language commonly used to implement Pattern Matching chatbots.
What is AI?
I do not know. Nor can I confidently say that anyone really knows.
Glenn Grigsby
With that said, in order to properly understand chatbots, one needs, at bare minimum, some kind of definition for AI and the computational technologies that are derived from it, such as Machine Learning, NLP, and Natural Language Generation (NLG). To that end, the following blogpost from Iodine and its featured image (seen below) could be helpful in understanding both the distinctions between these concepts and the overlap they all share.
Blogging for Iodine, Amanda Wratchford (2020), Vice President of Marketing, defines AI as: “a broad term referring to the field of technology that teaches machines to think and learn in order to perform tasks and solve problems like people.” AI, thus, is really no more than a classification for kinds of technology and not a kind of technology in and of itself. Machine Learning, NLP, and NLG are all distinct, yet closely interconnected, computational technologies that are implemented in fully developed combinatorial technologies designed to behave in a human-like fashion (or so the marketers say). In their description of Jabberwacky, Adamopoulou and Moussiades (2020) state that: “Jabberwacky was written in CleverScript, a language based on spreadsheets that facilitated the development of chatbots, and it used contextual pattern matching to respond based on previous discussions.” Jabberwacky’s designer, Rollo Carpenter (2011), states that his design’s “general AI” is able to store everything everyone has ever “said” to it and then: “Finds the most appropriate thing to say using contextual pattern matching techniques” and without hard-coded rules. Jabberwacky’s ability to function relies completely on human interactors: without people feeding new information into Jabberwacky’s memory, the program would not be able to function. Carpenter (2011) calls it a “conversational Wikipedia.”
I tried communicating with Jabberwacky but, as you can see in the image below, had little success… I am not sure if it is functioning as intended or if the outdated website lost some (or all) of the chatbot’s stored data, thus limiting the system’s abilities. Regardless, the attempt does show the difficulties that come with implementing “AI” within a chatbot… with limited hard rules and few pre-written responses, conversation gets derailed quite quickly.
Woebot, the modern combinatorial system at the co-center of this paper, purports that it is powered by “Clinically tested therapeutic approaches,” which include Cognitive Behavioral Therapy (CBT), Interpersonal Psychotherapy (IPT), and Dialectical Behavior Therapy (DBT); and “Sophisticated AI-powered delivery.” Woebot bills this as “AI and NLP,” but, from the above, NLP can simply be understood as technology that falls under the umbrella term that is AI. In order to deblackbox Woebot as a complete combinatorial system, one must also deblackbox its designers’ (and marketers’) use of the term “AI” and how it is applied in this context.
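To make the idea of a chatbot that depends entirely on what people have told it more concrete, here is a minimal Python sketch. It is not Carpenter's algorithm, whose details are not described in the sources cited here; the class name `ConversationMemory`, its methods, and the word-overlap scoring are all assumptions made for illustration. The sketch stores pairs of (something that was said, what a human replied) and answers a new message by reusing the stored human reply whose prompt shares the most words with it.

```python
import re

def tokens(text):
    """Lowercased set of words in a message."""
    return set(re.findall(r"[a-z']+", text.lower()))

class ConversationMemory:
    """Toy store of (heard, human_reply) pairs; hypothetical, for illustration only."""

    def __init__(self):
        self.pairs = []

    def learn(self, heard, human_reply):
        # Remember what a person once said in response to a given line.
        self.pairs.append((heard, human_reply))

    def respond(self, text):
        # Reuse the human reply whose stored prompt overlaps most with the input.
        best_reply, best_overlap = None, 0
        for heard, human_reply in self.pairs:
            overlap = len(tokens(heard) & tokens(text))
            if overlap > best_overlap:
                best_reply, best_overlap = human_reply, overlap
        return best_reply  # None when nothing relevant has been learned
```

An empty `ConversationMemory` returns `None` for every input, which mirrors the point above: with no rules written in advance, the system has nothing to say until people have fed it material. After `learn("how are you today", "fine, thanks")`, the call `respond("so how are you")` returns `"fine, thanks"`.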
The following two sections look at two therapeutic chatbots: ELIZA and Woebot. ELIZA, the first chatbot, and with its script DOCTOR, the first therapeutic chatbot, is a Pattern Matching chatbot that operates under a system of rules written by its developer, Joseph Weizenbaum, in order to respond to interactor statements. It features no AI technology (though there were some people who mistakenly thought that it did). Woebot, a modern therapeutic chatbot founded in 2017, is a far more advanced combinatorial system; however, when looking at Woebot in the context of the history of therapeutic chatbots (particularly ELIZA) one can notice some interesting (and occasionally competing, albeit possibly unintentionally from a designer’s perspective) design philosophies at play.
Who is ELIZA?
In 1950, British mathematician Alan Turing devised an experiment now known as the “Turing Test” that attempted to define a standard for determining computer “intelligence” by proposing that: “for a machine to be considered intelligent it should provide responses to a blinded interrogation that are indistinguishable from those given by a human comparator. In other words, the interrogator should not be able to tell whether the machine or the human was responding” (Powell 2019, 2). This proposal is thought to be the generative idea behind the designed combinatorial technologies that comprise the chatbot, the first of which, ELIZA, was built by Joseph Weizenbaum between 1964 and 1966 (Adamopoulou and Moussiades 2020, 2). This designed combinatorial technology, which primarily ran a set of scripts called “DOCTOR,” simulated a Rogerian psychotherapist through its apparent ability to plausibly carry on short conversations in English with human computer interactors (Wardrip-Fruin and Montfort 2003, 367).
In order for this simulation to work, interactors submitted “free text inputs (unrestricted words typed in ordinary natural language) into a teletype terminal,” that is, a computational device which allows for direct human-computer interactions (HCIs) through the use of a keyboard (Murray 2012, 52). As a designed combinatorial system (that is, ELIZA running the DOCTOR script for an interactor), ELIZA did not have the capability to understand what was said, but the technologies that comprised it could identify keywords found in user-generated inputs, such as “depressed,” and apply its cleverly formatted rules to generate seemingly intelligent responses through simple grammatical inversions and echo responses (Ibid, 52). The resulting interactions can be seen both in this video and through a recreation seen in the image below.
As an early example of a primitive chatbot, ELIZA is both limited and easy to derail, as seen in the below image which shows a brief interaction with a recreation of this system. Its capabilities are best utilized when free text inputs pertain specifically to what humans would define as “words often spoken and formulated within the context of therapy.” If a user begins to use words and phrases that depart from that specific context, the chatbot will be unable to identify appropriate keywords and thus unable to generate a seemingly intelligent response. This is because the combinatorial technologies that comprise ELIZA rely on “pattern matching and a response selection scheme based on [pre-coded] templates” in order to function (Adamopoulou and Moussiades 2020, 2). Therefore, if the interactor produces free text inputs that deviate from a recognizable pattern or that warrant responses not based on any pre-coded template, the combinatorial system’s design should no longer be functional.
This drawback should, at least in theory, break the simulation of an analysand/analyst relationship rather quickly… but, interestingly, and to Weizenbaum’s dismay, many interactors have believed that ELIZA is either a real person or an example of a computational device that literally understands natural language despite the system’s many inappropriate replies and lack of simulation-supporting media, such as images or sound, to reinforce its illusion (Murray 2012, 52). Weizenbaum was shocked by this tendency, as he designed the DOCTOR script with the intent to have ELIZA simply “parody” the psychotherapist role (a role chosen due to the ease with which it could be computationally imitated), not to actually fulfill it (Wardrip-Fruin and Montfort 2003, 370-72). The resulting misconception, namely, that a “computer program [is] more capable than [it] actually [is]” is often referred to as the “Eliza Effect,” a notable cause of reification, or, the personification of an abstract system (Murray 2012, 55).
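The keyword-plus-template mechanism described above can be sketched in a few lines of Python. This is a minimal illustration in the spirit of ELIZA, not a reproduction of Weizenbaum's DOCTOR script: the three rules, the `REFLECTIONS` table, and the fallback sentence are all invented here. It shows keyword spotting, a simple grammatical inversion ("my" becomes "your"), an echo of the interactor's own words, and what happens when no keyword is found.

```python
import re

# Swap first- and second-person words so the input can be echoed back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

# (keyword pattern, response template) pairs. Hypothetical rules, not Weizenbaum's.
RULES = [
    (re.compile(r"\bi am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bi feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.I), "Tell me more about your {0}."),
]

FALLBACK = "Please go on."

def reflect(fragment):
    """Apply the pronoun swaps word by word (the 'grammatical inversion')."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)

def respond(text):
    """Fill the template of the first rule whose keyword pattern matches."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    # No keyword found: the system has nothing specific to say.
    return FALLBACK
```

For the input "I am depressed about my job." this sketch answers "How long have you been depressed about your job?", which looks attentive but is only a template filled with the interactor's own words. An input outside the expected context, such as "What is the capital of France?", matches no rule and receives the generic "Please go on."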
The reification of ELIZA (and systems like it) will always be inappropriate and reinforce the system’s blackbox thus limiting user-understanding of the technologies at play and preventing informed HCI. There is no “who” there: ELIZA, as it is popularly known, is best-understood as the designed combinatorial system featuring a computational device “running DOCTOR—a script setting rules for organizing interactions between a Rogerian analyst and an analysand”—and the human interactor primed to anticipate this specific interaction. It is simply a powerful showcase for a new-at-the-time form of HCI (Bassett 2019, 805).
What is Woebot?
Woebot is, first and foremost, a company called Woebot Health founded by clinical research psychologist Dr. Alison Darcy in 2017. The designed combinatorial system itself (that is, the combinatorial technology plus users primed to properly interact with it), also called Woebot, is the result of a partnership between Dr. Darcy and Andrew Ng, a computer scientist and technology entrepreneur who would go on to chair Woebot Health’s board. Woebot’s launch occurred on Facebook, near the very start of a chatbot boom of sorts which, according to Adamopoulou and Moussiades (2020), began in 2016 due to a “meaningful evolution of AI technology” and the approval of said technologies’ use on social media platforms’ messaging applications by third party developers (3). Facebook provided Woebot with numerous affordances, namely the participatory affordance of computer environments (interactors are used to Facebook Messenger’s environment) and the procedural affordance (interactors understand computers can digitally represent real-world objects and processes, such as a therapist), that perhaps encouraged its relatively quick adoption amongst interactors (Murray 2012, 53-55). In 2018, Woebot released a smartphone app which, according to a press release on the Globe Newswire: “expands on existing capabilities and offers new features. With animation linked to natural language processing (NLP), the chatbot can respond in conversations with animated reactions, giving tailored responses to those who are dealing with specific mental-health issues such as grief, addiction, or self-destructive habits.” The smartphones which host this proprietary app (namely iPhone and Android computational devices) present Woebot with all-new affordances and capabilities, and it is through this proprietary app that interactors are able to converse with Woebot today.
Woebot Health says that its product is a combination of “Psychology+Technology,” stating: “Our proprietary technology combines decades of research in psychology with advanced AI to assess symptoms of anxiety, depression, and other mental health needs and respond with empathy.” In support of this claim, Woebot Health has funded and published numerous studies that make an explicit link between the use of a chatbot and positive mental health outcomes. However, there is not much information regarding the proprietary technology that comprises Woebot’s combinatorial system. Additionally, Woebot Health says that its technology is comprised of AI and NLP, stating: “As users grow and evolve, so does Woebot—always keeping previous chats in mind to provide the most beneficial and timely therapeutic suggestions. Every interaction makes our platform smarter, so Woebot can deliver the right support when it’s needed.” This is where the easily accessible information on their proprietary technology essentially ends… however, with the right research, one can obtain a fairly good understanding of what lies inside this blackbox.
Adamopoulou and Moussiades (2020) describe modern chatbots as combinatorial technologies featuring the following five modules, seen in the image below (11):
These Modules are defined as:
- User Interface Component
  - This includes any computational device(s) that allows interactors to produce text, either by typing (teletype terminal, keyboard) or speech, within a digital environment such as Facebook Messenger or a dedicated proprietary app (Ibid, 8).
- User Message Analysis Component
  - A “User Interface Controller” drives interactor text to the User Message Analysis Component where software decodes the text and extracts the resulting machine language for usable data following a Pattern Matching or Machine Learning Approach (Ibid, 8). Essentially, the software in this component latches onto keywords within the decoded text which are then used to determine the chatbot’s next step. This component may also include a spell check component to improve keyword identification and software designed to perform sentiment analysis (Ibid, 9).
- Dialog Management Component
  - The Dialog Management Component controls and updates the conversation context, essentially managing the bot’s memory and organizing data appropriately so that best-responses can be selected or generated (Ibid, 10). It contains modules that can store, organize, and modify data; can generate requests for additional data if needed; and can handle errors via automatic troubleshooting (Ibid, 10).
- Backend Component
  - The Dialog Management Component can retrieve information needed to fulfill interactor requests from the Backend Component through external Application Programming Interface (API) calls or database requests (Ibid, 10). This is essentially one computational technology interacting with another, most likely a server of sorts, without interactor input or influence. In Rules-Based chatbots, the backend contains a Knowledge Base (KB) where pre-designed responses are stored and accessible to the chatbot based on decoded interactor requests (Ibid, 10). The Backend Component may also contain a Relational DataBase (RDB) that allows storage for all past conversations, making communication more consistent, relevant, and precise (Ibid, 10).
- Response Generation Component
  - “The Response Generation Component produces responses using one or more of three available models: Rule-based, Retrieval based, and Generative-based models” (Ibid, 10). Rule-based models select a response from a list of pre-designed responses and do not produce new text; “The Retrieval-based model is more flexible as it selects the most suitable response with the check and analysis of available resources using APIs;” and Generative-based models produce new text through Machine Learning and NLG (Ibid, 10). The Generative-based model requires a deep store of data in the backend in order to function.
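The way these five components hand work to one another can be sketched as a short Python pipeline. This is only a toy under stated assumptions: the function names, the `DialogState` class, the keyword table, and the two stored responses are all hypothetical, the backend is a plain dictionary standing in for an API or database call, and the response generator follows the rule-based model (it selects a stored reply and never composes new text).

```python
import re
from dataclasses import dataclass, field

@dataclass
class DialogState:
    """Conversation context kept by the Dialog Management Component."""
    history: list = field(default_factory=list)
    last_topic: str = ""

# Backend Component: a toy Knowledge Base of pre-designed responses.
KNOWLEDGE_BASE = {
    "anxiety": "It sounds like you are feeling anxious. Want to try a breathing exercise?",
    "sleep": "Trouble sleeping is common. What time do you usually go to bed?",
}
KEYWORDS = {"anxious": "anxiety", "worried": "anxiety",
            "insomnia": "sleep", "sleep": "sleep"}

def analyze_message(text):
    """User Message Analysis Component: map the first known keyword to a topic."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in KEYWORDS:
            return KEYWORDS[word]
    return ""

def manage_dialog(state, text, topic):
    """Dialog Management Component: update context; keep the last topic if none is found."""
    state.history.append(text)
    if topic:
        state.last_topic = topic
    return state.last_topic

def fetch_from_backend(topic):
    """Backend Component: look up a stored response (stands in for an API or database call)."""
    return KNOWLEDGE_BASE.get(topic)

def generate_response(stored):
    """Response Generation Component (rule-based model): select, never compose, a reply."""
    return stored if stored is not None else "Could you tell me a little more?"

def handle_turn(state, text):
    """User Interface Component stand-in: one typed message in, one reply out."""
    topic = analyze_message(text)
    active_topic = manage_dialog(state, text, topic)
    return generate_response(fetch_from_backend(active_topic))
```

Calling `handle_turn` twice on the same `DialogState`, first with "I have been so worried lately" and then with "It started last month", returns the anxiety response both times: the second message contains no keyword, so the reply comes from the context the dialog manager kept. A fresh state given "Hello" falls back to the request for more information.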
In a scoping review of chatbots in mental health, Alaa A. Abd-alrazaq et al. (2019) reviewed studies that identified 41 unique chatbots, including Woebot (1). The authors of this scoping review stated that in 92.5% of the studies they reviewed, chatbots generated responses based on predefined rules or decision trees (Rules-Based Approaches); they go on to say that of the 53 studies reviewed, only 4 involved chatbots designed with true Machine Learning Approaches (Ibid, 1, 3-4). The one paper reviewed in this study that explicitly concerned Woebot (which was, additionally, clinical research sponsored by Woebot Health) did not contain a discussion of the combinatorial technologies within Woebot in depth. However, the authors of this research do state that, during the study: “[Woebot] employed several computational methods depending on the specific section or feature. The overarching methodology was a decision tree with suggested responses that also accepted natural language inputs with discrete sections of natural language processing techniques embedded at specific points in the tree to determine routing to subsequent conversational nodes” (Fitzpatrick, Darcy, and Vierhile 2017, 3).
Given this description of Woebot, the writers of the aforementioned scoping review of chatbots in mental health likely categorized Woebot as being one of the 92.5% of chatbots that did not utilize a Machine Learning Approach (Abd-alrazaq et al. 2019, 5). Based on the following interactions with Woebot, it would be difficult to get the impression that the technologies that comprise it are “intelligent” per se. This is not to say that the app isn’t collecting and using user information (it clearly is, and is very upfront about doing so, as the images below show), but it appears that the data collected is used more for the design of better decision trees and better decision-tree responses.
Woebot’s app’s interface is more reminiscent of customer service chatbots than, say, a messenger app that allows for free text inputs. In the image below, you can see a clear dialog between an interactor (guess who!) and Woebot that exclusively utilized inputs generated by the decision tree.
This interactor was only able to enter a free text input after about 20 minutes of consistent dialog with the app. When the interactor entered a free text response at the first-given opportunity, Woebot generated an automated request for additional information and then presented the interactor with a large decision tree and no option for additional free text inputs. It appears that the designers of the Woebot app try to keep as much conversation as possible within the decision tree structure.
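The structure Fitzpatrick, Darcy, and Vierhile describe, a decision tree with suggested responses and language processing at specific nodes to decide routing, can be illustrated with a small Python sketch. Woebot's actual implementation is proprietary and is not documented in the sources cited here, so the node names, prompts, and keyword table below are entirely hypothetical, and a plain keyword lookup stands in for whatever NLP techniques the real system embeds.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    prompt: str
    options: dict = field(default_factory=dict)         # suggested response -> next node
    keyword_routes: dict = field(default_factory=dict)  # free-text keyword -> next node
    default_next: str = ""                              # where unmatched free text goes

# A hypothetical four-branch tree; only "explore" accepts free text.
TREE = {
    "start": Node("How are you feeling today?",
                  options={"Good": "good", "Not great": "explore"}),
    "explore": Node("What is on your mind?",
                    keyword_routes={"anxious": "anxiety", "worried": "anxiety",
                                    "sad": "low_mood"},
                    default_next="clarify"),
    "good": Node("Glad to hear it!"),
    "anxiety": Node("Let's look at that worry together."),
    "low_mood": Node("Let's talk about what is weighing on you."),
    "clarify": Node("Could you say that another way?"),
}

def next_node(node_id, user_input):
    """Route to the next conversational node from the current one."""
    node = TREE[node_id]
    # Suggested responses (buttons) route directly through the tree.
    if user_input in node.options:
        return node.options[user_input]
    # At nodes that accept free text, a keyword lookup decides the route.
    for word in re.findall(r"[a-z]+", user_input.lower()):
        if word in node.keyword_routes:
            return node.keyword_routes[word]
    # Otherwise use the node's default, or stay put and ask again.
    return node.default_next or node_id
```

In this sketch, tapping "Not great" at `start` leads to `explore`, where the free text "I am worried about work" is routed to `anxiety` and "The weather is nice" is routed to `clarify`. Typing free text at `start`, which defines no keyword routes, simply repeats the node. Most of the conversation is fixed in advance by the tree, with language processing confined to a few routing points.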
Based on these encounters with Woebot, it appears that this is a Rules-Based chatbot designed with a Pattern Matching Approach. There likely is an amount of NLP technology within the app’s architecture, but it appears to rely heavily on the creation and use of a minimum selection of keywords (words like depression, anxiety, etc.) in order to properly function.
Woebot’s interface is easy to use and familiar, utilizing technologies that have been adapted to work within common messaging apps, and design techniques that can be seen throughout a multitude of websites and apps. Woebot is so familiar that, looking at the interaction with the Amazon messaging assistant below, one could easily presume that the two technologies are the same or, at least, directly related: both utilize a limited messenger-style interface, both heavily rely on decision trees, both have a limited ability to parse and respond to free text inputs that do not contain appropriate keywords, and both run into a similar potential for interactors to apply inappropriate mental models while participating in the chatbot environment.
In an exploratory study of mental models in customer service chatbots, researchers Ordemann et al. (2021) noted that: “Some researchers have reported on what may be considered to be a ‘gulf’ between user expectations and the realities of conversational user interfaces” (20). In order to better understand this phenomenon, this study’s researchers asked the question: “What characterizes the mental models that individuals apply during customer service chatbot interactions?” and, in interviews with participants, produced some fascinating results (Ibid, 22). In particular:
- Participants began interactions with customer service chatbots with the assumption and understanding that natural human language could be used while interacting with chatbots. However, over time, this understanding shifted and participants stated their need to use very short phrases and keywords for productive communication (Ibid, 26).
- Participants had an implicit assumption that customer service chatbots understood context (Ibid, 26). However, as interactions continued, participants described customer service chatbots as being “non-adaptive and lacking in creativity” (Ibid, 26). Many participants even doubted customer service chatbots had the ability to learn (Ibid, 26).
Clearly, participants of this study fell victim to (and then escaped from) the “Eliza Effect”… but was that their fault? One could argue that interactors with Woebot (not a customer service chatbot, but a chatbot that nonetheless performs kind of like one) could just as easily fall victim to the “Eliza Effect,” in part, because of the way that the combinatorial technology is marketed (“AI+NLP,” adorable reified robot mascot, lack of transparency into the workings of its proprietary technology). The lack of clarity with regards to what Woebot actually is, combined with general public confusion regarding what AI and its derivative technologies are, makes for a rather unique problem: Woebot Health Inc. has developed a wonderful combinatorial technology that functions very well but is set up for faulty Human Computer Interactions due to a misalignment between the technology and its marketing, which essentially primes low-information interactors to incorrectly interface with its system. This is unfortunate, precisely because insight into the modules that make up the combinatorial technology gives way to a very human-centric view of both how it actually works and how it is able to help interactors suffering from depression and anxiety… It brings tens of thousands (or more!) of people together, tracks the similarities between individuals’ experiences, and, through NLP and incredibly advanced algorithms, presents users with information proven to be most useful to folks who’ve shared similar experiences. In other words, it is a medium through which you can “talk” to experts and other similarly struggling interactors while retaining control over your privacy. Woebot is not magic and it is not an all-knowing intelligent robot; if anything, it might be best described using Rollo Carpenter’s description of Jabberwacky: “a conversational Wikipedia”… it is you and everyone else (+technology).
Abd-Alrazaq, Alaa A., Mohannad Alajlani, Ali Abdallah Alalwan, Bridgette M. Bewick, Peter Gardner, and Mowafa Househ. 2019. “An Overview of the Features of Chatbots in Mental Health: A Scoping Review.” International Journal of Medical Informatics 132 (December): 103978.
Adamopoulou, Eleni, and Lefteris Moussiades. 2020. “Chatbots: History, Technology, and Applications.” Machine Learning with Applications 2 (December): 100006.
Bassett, Caroline. 2019. “The Computational Therapeutic: Exploring Weizenbaum’s ELIZA as a History of the Present.” AI & Society 34 (4): 803–12.
“Chatterbox, Learning AI, Database, Dynamic – Models Way Humans Learn – Simulate Natural Human Chat – Interesting, Humorous, Entertaining.” n.d. Accessed May 4, 2022. http://www.jabberwacky.com/j2about.
Emma Goldman. 2017. Before Siri and Alexa, There Was ELIZA. https://www.youtube.com/watch?v=RMK9AphfLco.
Fitzpatrick, Kathleen Kara, Alison Darcy, and Molly Vierhile. 2017. “Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial.” JMIR Mental Health 4 (2).
“Machine Learning versus Natural Language Processing: What Is the Difference?” 2020. Iodine. July 22, 2020. https://iodinesoftware.com/blog-machine-learning-versus-natural-language-processing-what-is-the-difference/.
Murray, Janet Horowitz. 2012. Inventing the Medium: Principles of Interaction Design as a Cultural Practice. Cambridge, Mass: MIT Press.
Ordemann, Stine, Marita Skjuve, Asbjørn Følstad, and Cato Alexander Bjørkli. 2021. “Understanding How Chatbots Work: An Exploratory Study of Mental Models in Customer Service Chatbots.” IADIS International Journal on WWW/Internet 19 (1): 17–36.
Powell, John. 2019. “Trust Me, I’m a Chatbot: How Artificial Intelligence in Health Care Fails the Turing Test.” Journal of Medical Internet Research 21 (10): e16222.
Wallace, Michael, and George Dunlop. n.d. “Eliza, a Chatbot Therapist.” Accessed April 30, 2022. https://web.njit.edu/~ronkowit/eliza.html.
Wardrip-Fruin, Noah, and Nick Montfort, eds. 2003. The NewMediaReader. Cambridge, Mass: MIT Press.
“Woebot Health.” n.d. Woebot Health. Accessed April 26, 2022. https://woebothealth.com/.
Denning, Peter J., and Craig H. Martell. 2015. Great Principles of Computing. Cambridge, Massachusetts: The MIT Press.
Gamble, Alyson. 2020. “Artificial Intelligence and Mobile Apps for Mental Healthcare: A Social Informatics Perspective.” Aslib Journal of Information Management 72 (4): 509–23.
Irvine, Martin. 2018. “From Open Extensible Design to Fragmented ‘Appification.’” Google Docs. November 12, 2018.
Manovich, Lev. 2013. Software Takes Command: Extending the Language of New Media. International Texts in Critical Media Aesthetics. New York ; London: Bloomsbury.
Natale, Simone. 2019. “If Software Is Narrative: Joseph Weizenbaum, Artificial Intelligence and the Biographies of ELIZA.” New Media & Society 21 (3): 712–28.
Roth, Carl B., Andreas Papassotiropoulos, Annette B. Brühl, Undine E. Lang, and Christian G. Huber. 2021. “Psychiatry in the Digital Age: A Blessing or a Curse?” International Journal of Environmental Research and Public Health 18 (16): 8302.
Tudor Car, Lorainne, Dhakshenya Ardhithy Dhinagaran, Bhone Myint Kyaw, Tobias Kowatsch, Shafiq Joty, Yin-Leng Theng, and Rifat Atun. 2020. “Conversational Agents in Health Care: Scoping Review and Conceptual Analysis.” Journal of Medical Internet Research 22 (8): e17158.
Wan, Evelyn. 2021. “‘I’m like a Wise Little Person’: Notes on the Metal Performance of Woebot the Mental Health Chatbot.” Theatre Journal 73 (3): E-21.