TypeError: must be str, not bytes in IBM Watson

I just finished the Codecademy IBM Watson course, which was written in Python 2. When I moved the file over to Python 3, I kept getting this error. The script and all the credentials worked fine in Codecademy. Is this because I'm working in Python 3, or is there an issue in the code?

Traceback (most recent call last):
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 58, in <module>
    user_result = analyze(user_handle)
  File "c:\Users\Guppy\Programs\PythonCode\Celebrity Match\CelebrityMatch.py", line 22, in analyze
    text += status.text.encode('utf-8')
TypeError: must be str, not bytes 

Does anyone know what's wrong? The code is below:

import sys
import operator
import requests
import json
import twitter
from watson_developer_cloud import PersonalityInsightsV2 as PersonalityInsights

def analyze(handle):
    twitter_consumer_key = '<key>'
    twitter_consumer_secret = '<secret>'
    twitter_access_token = '<token>'
    twitter_access_secret = '<secret>'

    twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)

    statuses = twitter_api.GetUserTimeline(screen_name = handle, count = 200, include_rts = False)

    text = ""

    for status in statuses:
        if (status.lang =='en'): #English tweets
            text += status.text.encode('utf-8')

    #The IBM Bluemix credentials for Personality Insights!
    pi_username = '<username>'
    pi_password = '<password>'

    personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
    pi_result = personality_insights.profile(text)
    return pi_result

def flatten(orig):
    data = {}
    for c in orig['tree']['children']:
        if 'children' in c:
            for c2 in c['children']:
                if 'children' in c2:
                    for c3 in c2['children']:
                        if 'children' in c3:
                            for c4 in c3['children']:
                                if (c4['category'] == 'personality'):
                                    data[c4['id']] = c4['percentage']
                                    if 'children' not in c3:
                                        if (c3['category'] == 'personality'):
                                                data[c3['id']] = c3['percentage']
    return data

def compare(dict1, dict2):
    compared_data = {}
    for keys in dict1:
        if dict1[keys] != dict2[keys]:
                compared_data[keys]=abs(dict1[keys] - dict2[keys])
    return compared_data

user_handle = "@itsguppythegod"
celebrity_handle = "@giselleee_____"

user_result = analyze(user_handle)
celebrity_result = analyze(celebrity_handle)

user = flatten(user_result)
celebrity = flatten(celebrity_result)

compared_results = compare(user, celebrity)

sorted_result = sorted(compared_results.items(), key=operator.itemgetter(1))

for keys, value in sorted_result[:5]:
    print(keys, end = " ")
    print(user[keys], end = " ")
    print ('->', end = " ")
    print (celebrity[keys], end = " ")
    print ('->', end = " ")
    print (compared_results[keys])

You created a str (Unicode text) object here:

text = ""

and then proceeded to append UTF-8 encoded bytes:

text += status.text.encode('utf-8')

In Python 2, "" created a bytestring, so that was all fine (although you were then posting UTF-8 bytes to a service that will interpret them all as Latin-1; see the API documentation). In Python 3, "" is a str, and str and bytes objects cannot be concatenated, hence the TypeError.
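The mismatch can be reproduced without Twitter or Watson. The following is a minimal sketch for Python 3; the `tweet` string is a made-up stand-in for `status.text`, and the exact wording of the error message varies between Python 3 versions.

```python
# Hypothetical tweet text, standing in for status.text in the question.
tweet = "caf\u00e9 time"

text = ""                        # a str (Unicode text) in Python 3
encoded = tweet.encode("utf-8")  # a bytes object

try:
    text += encoded              # mixing str and bytes is not allowed
    error = None
except TypeError as exc:
    error = exc

print(type(text).__name__, type(encoded).__name__)
print("TypeError raised:", error is not None)
```

Because the failed `+=` never rebinds `text`, it is still the empty str afterwards.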

To fix this, don't encode the status texts until you are done collecting all the tweets. In addition, tell Watson to expect UTF-8 data. Last but not least, you should build a list of tweet texts first and concatenate them in one step later with str.join(), as concatenating strings in a loop can take quadratic time:

text = []

for status in statuses:
    if (status.lang =='en'): #English tweets
        text.append(status.text)

# ...

personality_insights = PersonalityInsights(username=pi_username, password=pi_password)
pi_result = personality_insights.profile(
    ' '.join(text).encode('utf8'),
    content_type='text/plain; charset=utf-8'
)
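The collect-then-encode pattern can be tried in isolation, without API credentials. The sketch below makes some assumptions: `Status` is a hypothetical stand-in that models only the `lang` and `text` attributes used in the question, and `collect_english_text` is an illustrative helper name, not part of python-twitter or the Watson SDK.

```python
from collections import namedtuple

# Hypothetical stand-in for the status objects returned by GetUserTimeline();
# only the two attributes used in the question are modelled.
Status = namedtuple("Status", ["lang", "text"])


def collect_english_text(statuses):
    """Join the text of English tweets as str, then encode once as UTF-8."""
    parts = [s.text for s in statuses if s.lang == "en"]
    return " ".join(parts).encode("utf-8")


sample = [
    Status("en", "Hello world"),
    Status("fr", "Bonjour"),
    Status("en", "caf\u00e9 time"),
]

payload = collect_english_text(sample)
print(payload)
```

Keeping everything as str until the final step means the encoding happens exactly once, at the boundary where bytes are handed to the service.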
