Python Chatterbot “Errno 22”

I'm trying to train a chatbot, and most of the data is in text files.

I pull:

Matt said you have a "shit load" of dining dollars. I have almost none so if you're willing to sell, I'm willing to buy.

from the text file, but when the chatterbot corpus tries to train the bot, it reads the above as:

'Matt said you have a "shit load" of dining dollars\\ I have almost none so if you\'re willing to sell, I\'m willing to buy\\\n'

How can I fix this?

This is my code:

def train_from_text():
    #chatbot.set_trainer(ListTrainer)
    directory = basedir + "Text Trainers"
    files = find_files_in_directory(directory)
    for file in files:
        conversation = []
        file_name = directory+"/"+file
        with open(file_name, 'r') as to_read:
            for line in to_read:
                conversation.append(line)
        chatbot.train(conversation)

Please excuse the swearing; it's the data I was given.

Edit: Full error

Traceback (most recent call last):
  File "E:/Jason Chatterbot/Jason Chat.py", line 102, in <module>
    control()
  File "E:/Jason Chatterbot/Jason Chat.py", line 96, in control
    train_from_text()
  File "E:/Jason Chatterbot/Jason Chat.py", line 58, in train_from_text
    chatbot.train(conversation)
  File "C:\Python27\lib\site-packages\chatterbot\trainers.py", line 119, in train
    corpora = self.corpus.load_corpus(corpus_path)
  File "C:\Python27\lib\site-packages\chatterbot_corpus\corpus.py", line 98, in load_corpus
    corpus_data = self.read_corpus(file_path)
  File "C:\Python27\lib\site-packages\chatterbot_corpus\corpus.py", line 63, in read_corpus
    with io.open(file_name, encoding='utf-8') as data_file:
IOError: [Errno 22] Invalid argument: 'Matt said you have a "shit load" of dining dollars\\ I have almost none so if you\'re willing to sell, I\'m willing to buy\\\r\n'

Without looking at a larger subset of the data, it seems like single quotes (') are being replaced with escaped single quotes (\\'), actual newline characters with escaped newlines (\\n), and periods with double backslashes (\\).
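A quick way to check whether the text is really being altered, or only displayed with escapes, is to compare `repr(line)` with the line itself. The sketch below uses the sample sentence from the question as a hard-coded string (an assumption; in the real script it would come from the text file). Python shows strings in tracebacks in their `repr()` form, so escaped quotes and `\n` in that output do not by themselves mean the data contains backslashes.

```python
# The sample line from the question, written as a normal Python string
# (hard-coded here for illustration; the real script reads it from a file).
line = 'Matt said you have a "shit load" of dining dollars. I have almost none so if you\'re willing to sell, I\'m willing to buy.\n'

shown = repr(line)   # the form used in tracebacks and at the interactive prompt
print(shown)         # single quotes and the newline appear escaped here
print(line)          # the actual characters in the string

# Check whether the string itself holds any backslash character.
has_backslash = "\\" in line
print(has_backslash)
```

If `has_backslash` is false for lines read from the file, the quote and newline escapes are only a display artifact and do not need to be reversed.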

A simple string replace might fix it for you, depending on how badly the data is getting munged. Try changing

conversation.append(line)

to

conversation.append(line.replace("\\'","'").replace('\\\\','.').replace("\\n","\n"))

We're basically trying to reverse those substitutions that are being made automatically.
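The traceback also points at something else worth trying. The error is raised inside `load_corpus` and `read_corpus`, which suggests each line of text is being handed to the corpus trainer as a path to open, and `chatbot.set_trainer(ListTrainer)` is commented out in the posted code. If that reading is right, the backslashes where the periods were may come from the loader treating the sentence as a dotted corpus path (an assumption, not checked against the library source). A minimal sketch, assuming the bot should learn from the lines as a plain list: `read_conversation` is a hypothetical helper name, UTF-8 is an assumed file encoding, and the commented trainer calls are the ones from the question, not verified against any particular ChatterBot version.

```python
import io


def read_conversation(file_name):
    """Return the non-empty lines of a text file, with line endings stripped."""
    conversation = []
    # utf-8 is an assumption; use whatever encoding the text files really have.
    with io.open(file_name, 'r', encoding='utf-8') as to_read:
        for line in to_read:
            text = line.strip()
            if text:  # skip blank lines so they are not trained as statements
                conversation.append(text)
    return conversation


# Assumed usage, mirroring the question's code with the list trainer enabled:
# chatbot.set_trainer(ListTrainer)
# chatbot.train(read_conversation(file_name))
```

Stripping the trailing newline is optional, but it keeps `\r\n` out of the stored statements.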

