Python Chatterbot “Errno 22”

I'm trying to train a chatbot, and most of the data is in text files.

I pull:

Matt said you have a "shit load" of dining dollars. I have almost none so if you're willing to sell, I'm willing to buy.

from the text file, but when the chatterbot corpus tries to train the bot, it reads the above as:

'Matt said you have a "shit load" of dining dollars\\ I have almost none so if you\'re willing to sell, I\'m willing to buy\\\n'

How can I fix this?

This is my code:

def train_from_text():
    #chatbot.set_trainer(ListTrainer)
    directory = basedir + "Text Trainers"
    files = find_files_in_directory(directory)
    for file in files:
        conversation = []
        file_name = directory+"/"+file
        with open(file_name, 'r') as to_read:
            for line in to_read:
                conversation.append(line)
        chatbot.train(conversation)

Please excuse the swearing; it's the data I was given.

Edit: Full error

Traceback (most recent call last):
  File "E:/Jason Chatterbot/Jason Chat.py", line 102, in <module>
    control()
  File "E:/Jason Chatterbot/Jason Chat.py", line 96, in control
    train_from_text()
  File "E:/Jason Chatterbot/Jason Chat.py", line 58, in train_from_text
    chatbot.train(conversation)
  File "C:\Python27\lib\site-packages\chatterbot\trainers.py", line 119, in train
    corpora = self.corpus.load_corpus(corpus_path)
  File "C:\Python27\lib\site-packages\chatterbot_corpus\corpus.py", line 98, in load_corpus
    corpus_data = self.read_corpus(file_path)
  File "C:\Python27\lib\site-packages\chatterbot_corpus\corpus.py", line 63, in read_corpus
    with io.open(file_name, encoding='utf-8') as data_file:
IOError: [Errno 22] Invalid argument: 'Matt said you have a "shit load" of dining dollars\\ I have almost none so if you\'re willing to sell, I\'m willing to buy\\\r\n'
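
A note on reading the last line of the traceback: the exception message shows the string in repr form, so sequences such as \' and \r\n are how Python displays a quote and a line ending inside a quoted string, not extra characters in the data. A minimal sketch of that display behaviour (the sample string is made up for illustration and is not taken from the training files):

```python
# Illustrative sample: contains a double quote, a single quote,
# ONE real backslash, then a carriage return and a newline.
line = 'a "b" you\'re\\\r\n'

shown = repr(line)          # what an error message would display
print(shown)                # quoted, with escapes added for display
print(line.count("\\"))     # number of real backslashes in the data
```

Only backslashes counted in `line` itself are actually present in the string; the rest are added by the quoting.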

Without looking at a larger subset of the data, it seems like it's replacing single quotes (') with escaped single quotes (\\'), actual newline characters with escaped newlines (\\n), and periods with double backslashes (\\).
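
One possible explanation for the periods turning into backslashes, offered as a guess and not confirmed against the library source: the traceback shows the sentence being passed to `load_corpus` as `corpus_path` and then to `io.open` as a file name. If the loader converts a dotted name into a file path by splitting on "." and joining with the Windows separator, a sentence would be mangled in exactly this way. The function `dotted_to_windows_path` below is a hypothetical stand-in for that step, not ChatterBot's actual code:

```python
import ntpath  # Windows path rules, importable on any platform


def dotted_to_windows_path(dotted):
    """Hypothetical stand-in: split on '.' and join with the Windows separator."""
    parts = dotted.split(".")
    return ntpath.join(*parts)


sentence = ("dining dollars. I have almost none so if you're willing to sell, "
            "I'm willing to buy.\n")
as_path = dotted_to_windows_path(sentence)
print(repr(as_path))  # each period now shows up as a backslash
```

If this is what is happening, the text file itself is intact and the thing to check is which trainer `chatbot.train` is using; the commented-out `set_trainer(ListTrainer)` line in the question may be relevant.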

A simple string replace might fix it for you, depending on how badly the data is getting munged. Try changing

conversation.append(line)

to

conversation.append(line.replace("\\'","'").replace('\\\\','.').replace("\\n","\n"))

We're basically trying to reverse those substitutions that are being made automatically.
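
The same substitution can be pulled into a small helper so it is easy to try on one line at a time. This is a minimal sketch; `undo_escapes` is a hypothetical name, and it assumes the text really contains literal backslash characters:

```python
def undo_escapes(line):
    """Reverse the three guessed substitutions (hypothetical helper)."""
    return (line.replace("\\'", "'")      # backslash + quote -> quote
                .replace("\\\\", ".")     # two backslashes   -> period
                .replace("\\n", "\n"))    # backslash + 'n'   -> real newline


# Sample with literal backslashes: \' then \\ then \n (all as characters).
sample = "you\\'re willing to buy\\\\\\n"
cleaned = undo_escapes(sample)
print(repr(cleaned))
```

Note that a string holding only a single backslash followed by a real newline passes through unchanged, so if the escapes exist only in the error message display, this helper will not alter the data.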
