
How can I program a simple chat bot AI?

I want to build a bot that asks someone a few simple questions and branches based on the answer. I realize parsing meaning from the human responses will be challenging, but how do you set up the program to deal with the "state" of the conversation?

It will be a one-to-one conversation between a human and the bot.

You probably want to look into Markov chains as the basis for the bot AI. I wrote something a long time ago (the code for which I'm not proud of at all, and which needs some mods to run on Python > 1.5) that may be a useful starting place for you: http://sourceforge.net/projects/benzo/

EDIT: Here's a minimal example in Python of a Markov chain that accepts input from stdin and outputs text based on the probabilities of words succeeding one another in the input. It's optimized for IRC-style chat logs, but running any decent-sized text through it should demonstrate the concepts:

import random, sys

NONWORD = "\n"                 # sentinel marking the start and end of the text
STARTKEY = NONWORD, NONWORD    # key used before any real word has been seen
MAXGEN = 1000                  # upper bound on the number of generated words

class MarkovChainer(object):
    def __init__(self):
        # maps a pair of consecutive words to the list of words seen after it
        self.state = dict()

    def input(self, input):
        word1, word2 = STARTKEY
        for word3 in input.split():
            self.state.setdefault((word1, word2), list()).append(word3)
            word1, word2 = word2, word3
        self.state.setdefault((word1, word2), list()).append(NONWORD)

    def output(self):
        output = list()
        word1, word2 = STARTKEY
        for i in range(MAXGEN):
            # repeats in the list make frequent successors more likely
            word3 = random.choice(self.state[(word1, word2)])
            if word3 == NONWORD: break
            output.append(word3)
            word1, word2 = word2, word3
        return " ".join(output)

if __name__ == "__main__":
    c = MarkovChainer()
    c.input(sys.stdin.read())
    print(c.output())

It's pretty easy from here to plug in persistence and an IRC library and have the basis of the type of bot you're talking about.
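As a rough illustration of the persistence part, the sketch below saves and reloads a transition table shaped like the `state` dictionary in the example above. The helper names `save_state` and `load_state` and the file name `chain.pickle` are assumptions made for this sketch, and `pickle` is only one of several reasonable storage choices.

```python
import os
import pickle
import tempfile

def save_state(state, path):
    """Write the chain's transition table to disk."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_state(path):
    """Return a saved table, or an empty one if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip with a tiny hand-built table in a temporary directory.
demo_state = {
    ("\n", "\n"): ["hello"],
    ("\n", "hello"): ["world"],
    ("hello", "world"): ["\n"],
}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "chain.pickle")  # hypothetical file name
    save_state(demo_state, demo_path)
    restored = load_state(demo_path)
    missing = load_state(os.path.join(tmp, "absent.pickle"))
```

A bot could call `load_state` once at startup and `save_state` after each batch of new input, so the lexicon keeps growing across sessions.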

Folks have mentioned already that statefulness isn't a big component of typical chatbots:

  • a pure Markov implementation may express a very loose sort of state if it is growing its lexicon and table in real time (earlier utterances by the human interlocutor may get regurgitated by chance later in the conversation), but the Markov model doesn't have any inherent mechanism for selecting or producing such responses.

  • a parsing-based bot (e.g. ELIZA) generally attempts to respond to (some of the) semantic content of the most recent input from the user without significant regard for prior exchanges.

That said, you certainly can add some amount of state to a chatbot, regardless of the input-parsing and statement-synthesis model you're using. How to do that depends a lot on what you want to accomplish with your statefulness, and that's not really clear from your question. A couple of general ideas, however:

  • Create a keyword stack. As your human offers input, parse out keywords from their statements/questions and throw those keywords onto a stack of some sort. When your chatbot fails to come up with something compelling to respond to in the most recent input (or, perhaps, just at random, to mix things up), go back to your stack, grab a previous keyword, and use that to seed your next synthesis. For bonus points, have the bot explicitly acknowledge that it's going back to a previous subject, e.g. "Wait, HUMAN, earlier you mentioned foo. [Sentence seeded by foo]".

  • Build RPG-like dialogue logic into the bot. As you're parsing human input, toggle flags for specific conversational prompts or content from the user and conditionally alter what the chatbot can talk about, or how it communicates. For example, a chatbot bristling (or scolding, or laughing) at foul language is fairly common; a chatbot that will get het up, and conditionally remain so until apologized to, would be an interesting stateful variation on this. Switch output to ALL CAPS, throw in confrontational rhetoric or demands or sobbing, etc.
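Both ideas can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the class name `ConversationState`, the tiny `STOPWORDS` and `RUDE_WORDS` lists, and the rule that the word "sorry" counts as an apology. A real bot would need a much better keyword extractor.

```python
import re

# Placeholder word lists, assumptions for this sketch only.
STOPWORDS = {"i", "a", "an", "the", "you", "are", "is", "it", "to",
             "and", "of", "my", "me", "about", "that"}
RUDE_WORDS = {"stupid", "idiot"}

class ConversationState:
    def __init__(self):
        self.keywords = []   # used as a stack: append to push, pop to recall
        self.angry = False   # RPG-style flag toggled by the user's input

    def observe(self, utterance):
        """Update the flag and push new keywords from one user utterance."""
        words = re.findall(r"[a-z']+", utterance.lower())
        if any(w in RUDE_WORDS for w in words):
            self.angry = True
        elif "sorry" in words:
            self.angry = False
        for w in words:
            if w not in STOPWORDS and w not in RUDE_WORDS and w != "sorry":
                self.keywords.append(w)

    def recall(self):
        """Pop the most recent keyword, or None if the stack is empty."""
        return self.keywords.pop() if self.keywords else None

    def style(self, reply):
        """Shout for as long as the angry flag is set."""
        return reply.upper() + "!" if self.angry else reply

state = ConversationState()
state.observe("I bought a boat")
topic = state.recall()
follow_up = state.style("Wait, earlier you mentioned %s." % topic)
```

Whatever generates the reply text (Markov chain, templates, anything else) stays separate; `recall` only supplies a seed and `style` only post-processes the output.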

Can you clarify a little what you want the state to help you accomplish?

Imagine a neural network with parsing capabilities in each node or neuron. Depending on rules and parsing results, neurons fire. If certain neurons fire, you get a good idea about the topic and semantics of the question and can therefore give a good answer.

Memory is done by keeping topics talked about in a session, adding to the firing for the next question, and therefore guiding the selection process of possible answers at the end.

Keep your rules and patterns in a knowledge base, but compile them into memory at start time, with a neuron per rule. You can engineer synapses using something like listeners or event functions.
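One very loose reading of this idea, sketched in Python: each rule becomes a "neuron" holding a regular expression, and the topic of the last answer adds a small bonus to neurons on the same topic in the next turn. The class names, the scoring values (1.0 for a match, 0.5 for a remembered topic) and the sample rules are all assumptions. Listeners or event functions are replaced here by a plain loop for brevity.

```python
import re

class RuleNeuron:
    def __init__(self, topic, pattern, answer):
        self.topic = topic
        self.pattern = re.compile(pattern, re.IGNORECASE)
        self.answer = answer

    def activation(self, text, memory):
        """1.0 if the pattern matches, plus a bonus if the topic is remembered."""
        score = 1.0 if self.pattern.search(text) else 0.0
        return score + memory.get(self.topic, 0.0)

class RuleNetwork:
    def __init__(self, rules):
        # "Compile" the knowledge base: one neuron per (topic, pattern, answer).
        self.neurons = [RuleNeuron(*rule) for rule in rules]
        self.memory = {}   # topic -> carry-over activation from the last turn

    def respond(self, text, threshold=1.0):
        scored = [(n.activation(text, self.memory), n) for n in self.neurons]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score < threshold:
            return "Tell me more."
        self.memory = {best.topic: 0.5}   # remember the topic for the next turn
        return best.answer

# Hypothetical knowledge base: two rules share the same pattern.
rules = [
    ("pets", r"\bdog\b", "What breed is your dog?"),
    ("cars", r"\bold\b", "How old is your car?"),
    ("pets", r"\bold\b", "How old is your pet?"),
]
net = RuleNetwork(rules)
first = net.respond("I have a dog")
second = net.respond("He is quite old")   # the pets topic tips the balance
```

Without the earlier mention of a dog, the same second sentence would fire the "cars" rule first, which is the kind of session memory the answer describes.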

I think you can look at the code for Kooky, which IIRC also uses Markov chains.

Also check out the kooky quotes; they were featured on Coding Horror not long ago and some are hilarious.

I think to start this project, it would be good to have a database of questions, organized as a tree with one or more questions in every node. These questions should be answered with "yes" or "no".

When the bot starts questioning, it can begin with any question from your database that is marked as a start question. The answer determines the path to the next node in the tree.
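A minimal in-memory sketch of such a tree, assuming one question per node and no database. The names `QuestionNode`, `next_node` and `run`, and the sample questions, are invented for illustration. The "state" of the conversation is simply the node the bot is currently on.

```python
class QuestionNode:
    def __init__(self, question, yes=None, no=None):
        self.question = question
        self.yes = yes   # node to visit after "yes" (None ends the dialogue)
        self.no = no     # node to visit after "no"

def next_node(node, answer):
    """Follow the branch chosen by the answer; stay put if it is unclear."""
    reply = answer.strip().lower()
    if reply in ("yes", "y"):
        return node.yes
    if reply in ("no", "n"):
        return node.no
    return node   # unrecognized answer: ask the same question again

def run(start, answers):
    """Walk the tree with scripted answers and return the questions asked."""
    asked = []
    node = start
    for answer in answers:
        if node is None:
            break
        asked.append(node.question)
        node = next_node(node, answer)
    return asked

tea = QuestionNode("Do you take milk in your tea?")
coffee = QuestionNode("Do you want an espresso?")
start = QuestionNode("Do you like tea?", yes=tea, no=coffee)

asked = run(start, ["no", "yes"])
```

For an interactive bot, the scripted `answers` list would be replaced by a prompt that reads one line per question.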

Edit: Here is a simple one written in Ruby you can start with: rubyBOT

A naive chatbot program. No parsing, no cleverness, just a training file and output.

It first trains itself on a text and then later uses the data from that training to generate responses to the interlocutor's input. The training process creates a dictionary where each key is a word and the value is a list of all the words that follow that word sequentially anywhere in the training text. If a word appears more than once in this list, that repetition makes it more likely to be chosen by the bot, so there is no need for explicit probabilities: a plain list does the job.

The bot chooses a random word from your input and generates a response by choosing another random word that has been seen to be a successor to its held word. It then repeats the process by finding a successor to that word in turn and carrying on iteratively until it thinks it has said enough. It reaches that conclusion by stopping at a word that came directly before a punctuation mark in the training text. It then returns to input mode again to let you respond, and so on.

It isn't very realistic, but I hereby challenge anyone to do better in 71 lines of code! This is a great challenge for any budding Pythonists, and I just wish I could open the challenge to a wider audience than the small number of visitors I get to this blog. Coding a bot that is always guaranteed to be grammatical would surely take closer to several hundred lines; I simplified hugely by just trying to think of the simplest rule to give the computer a mere stab at having something to say.

Its responses are rather impressionistic, to say the least! Also, you have to put what you say in single quotes (this is only necessary when input is read with Python 2's input(), which evaluates what you type as an expression; a function that returns the raw line does not need the quotes).

I used War and Peace for my "corpus", which took a couple of hours for the training run; use a shorter file if you are impatient.

Here is the trainer:

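#lukebot-trainer.py
import pickle

# Read the corpus and flatten it into one list of whitespace-separated words
text = []
with open('war&peace.txt') as b:
    for line in b:
        for word in line.split():
            text.append(word)

# For each distinct word, collect every word that directly follows it.
# A word ending in punctuation gets an empty list, so it has no successors.
textset = list(set(text))
follow = {}
for check in textset:
    working = []
    for w in range(len(text) - 1):
        if check == text[w] and text[w][-1] not in '(),.?!':
            working.append(str(text[w + 1]))
    follow[check] = working

# Save the table for lukebot.py (pickle protocol 2)
with open('lexicon-luke', 'wb') as a:
    pickle.dump(follow, a, 2)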

Here is the bot:

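#lukebot.py
import pickle, random

# Load the successor table written by lukebot-trainer.py
with open('lexicon-luke', 'rb') as a:
    successorlist = pickle.load(a)

def nextword(a):
    # Pick a random successor; fall back to 'the' for unknown words
    # and for words with no recorded successors
    if successorlist.get(a):
        return random.choice(successorlist[a])
    else:
        return 'the'

while True:
    speech = input('>')
    if speech == 'quit':
        break
    if not speech.split():
        continue  # nothing to seed the response with
    s = random.choice(speech.split())
    response = ''
    while True:
        neword = nextword(s)
        response += ' ' + neword
        s = neword
        # Stop once the chosen word ends in punctuation
        if neword[-1] in ',?!.':
            break
    print(response)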

You tend to get an uncanny feeling when it says something that seems partially to make sense.

If you're just dabbling, I believe Pidgin allows you to script chat-style behavior. Part of the framework probably tracks the state of who sent the message when, and you'd want to keep a log of your bot's internal state for each of the last N messages. Future state decisions could be hardcoded based on inspection of previous states and the content of the most recent few messages. Or you could do something like the Markov chains discussed and use it both for parsing and generating.
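The log of the last N messages with hardcoded state decisions could look roughly like the sketch below. It does not use Pidgin's scripting interface at all; the class `BotLog`, the state names and the transition rules are assumptions chosen only to show the shape of the idea.

```python
from collections import deque

class BotLog:
    def __init__(self, n=5):
        # deque with maxlen drops the oldest entry automatically
        self.entries = deque(maxlen=n)

    def record(self, sender, text, state):
        self.entries.append({"sender": sender, "text": text, "state": state})

    def next_state(self, text):
        """Hardcoded transition from the previous state and the new message."""
        previous = self.entries[-1]["state"] if self.entries else "greeting"
        if previous == "greeting":
            return "asking_name"
        if previous == "asking_name" and text.strip():
            return "chatting"
        return previous

log = BotLog(n=2)
for message in ["hi", "Alice", "how are you"]:
    state = log.next_state(message)
    log.record("human", message, state)

states = [entry["state"] for entry in log.entries]
```

Only the last two entries survive here because `n=2`; a larger window lets the decision rules inspect more history.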

If you do not require a learning bot, using AIML ( http://www.aiml.net/ ) will most likely produce the result you want, at least with respect to the bot parsing input and answering based on it.

You would reuse or create "brains" made of XML (in the AIML format) and parse/run them in a program (parser). There are parsers written in several different languages to choose from, and as far as I can tell the code seems to be open source in most cases.

You can use "ChatterBot" and host it locally using "flask-chatterbot-master".

Links:

  1. [ChatterBot Installation]: https://chatterbot.readthedocs.io/en/stable/setup.html
  2. [Host Locally using flask-chatterbot-master]: https://github.com/chamkank/flask-chatterbot

Cheers,

Ratnakar

I would suggest looking at Bayesian probabilities. Then just monitor the chat room for a period of time to create your probability tree.

I'm not sure this is what you're looking for, but there's an old program called ELIZA which could hold a conversation by taking what you said and spitting it back at you after performing some simple textual transformations.

If I remember correctly, many people were convinced that they were "talking" to a real person and had long, elaborate conversations with it.
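To give a feel for what "simple textual transformations" can mean, here is a small ELIZA-style sketch in Python. It is not the original ELIZA script: the three patterns, the `REFLECTIONS` table and the fallback reply are assumptions picked for illustration.

```python
import re

# Swap first and second person so the user's words can be echoed back.
REFLECTIONS = {"i": "you", "my": "your", "me": "you",
               "am": "are", "you": "I", "your": "my"}

# (pattern, reply template) pairs, tried in order; invented for this sketch.
RULES = [
    (re.compile(r"\bi feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.IGNORECASE), "Tell me more about your {0}."),
]

def reflect(fragment):
    """Replace pronouns word by word using the REFLECTIONS table."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(w, w) for w in words)

def respond(text):
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."

reply = respond("I feel sad about my job.")
```

Note that this bot keeps no state at all: every reply depends only on the most recent input.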
