Python 2to3 not working

Question

I'm currently going through the Python Challenge, and I'm up to level 4. I have only been learning Python for a few months, and I'm trying to learn Python 3 rather than 2.x. So far so good, except when I use this bit of code. Here's the Python 2.x version:

import urllib, re
prefix = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
findnothing = re.compile(r"nothing is (\d+)").search
nothing = '12345'
while True:
    text = urllib.urlopen(prefix + nothing).read()
    print text
    match = findnothing(text)
    if match:
        nothing = match.group(1)
        print "   going to", nothing
    else:
        break

So to convert this to 3, I would change it to this:

import urllib.request, urllib.parse, urllib.error, re
prefix = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
findnothing = re.compile(r"nothing is (\d+)").search
nothing = '12345'
while True:
    text = urllib.request.urlopen(prefix + nothing).read()
    print(text)
    match = findnothing(text)
    if match:
        nothing = match.group(1)
        print("   going to", nothing)
    else:
        break

If I run the 2.x version it works fine: it goes through the loop, scraping each URL until it reaches the end. I get the following output:

and the next nothing is 72198
   going to 72198
and the next nothing is 80992
   going to 80992
and the next nothing is 8880
   going to 8880 etc

If I run the 3.x version, I get the following output:

b'and the next nothing is 44827'
Traceback (most recent call last):
  File "C:\Python32\lvl4.py", line 26, in <module>
    match = findnothing(b"text")
TypeError: can't use a string pattern on a bytes-like object

So if I change the r to a b in this line:

findnothing = re.compile(b"nothing is (\d+)").search

I get:

b'and the next nothing is 44827'
   going to b'44827'
Traceback (most recent call last):
  File "C:\Python32\lvl4.py", line 24, in <module>
    text = urllib.request.urlopen(prefix + nothing).read()
TypeError: Can't convert 'bytes' object to str implicitly

Any ideas?

I'm pretty new to programming, so please don't bite my head off.

_bk201

Answer 1

You can't mix bytes and str objects implicitly.

The simplest thing would be to decode the bytes returned by urlopen().read() and use str objects everywhere:

text = urllib.request.urlopen(prefix + nothing).read().decode()  # note: decode() defaults to utf-8
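Putting that one-line change back into the question's loop might look like the sketch below. The function names fetch and follow_chain, the encoding parameter, and the max_steps guard are assumptions added for illustration; they are not part of the original code. Passing the fetcher in as an argument is only there so the loop can be tried without touching the network.

```python
import re
import urllib.request

PREFIX = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
findnothing = re.compile(r"nothing is (\d+)").search


def fetch(url):
    """Return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def follow_chain(nothing, fetch=fetch, encoding="utf-8", max_steps=1000):
    """Follow 'nothing is N' pages; return the first page text with no match."""
    for _ in range(max_steps):  # hypothetical guard against an endless chain
        # Decode once, right after reading, so everything below is str.
        text = fetch(PREFIX + nothing).decode(encoding)
        match = findnothing(text)
        if not match:
            return text
        # group(1) is a str here, so it concatenates with PREFIX.
        nothing = match.group(1)
    raise RuntimeError("gave up after max_steps requests")
```

Calling follow_chain('12345') would hit the real site; a stand-in fetch function that returns bytes from a dict is enough to check the logic offline.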

The page doesn't specify a preferred character encoding via the Content-Type header or a <meta> element. I don't know what the default encoding should be for text/html, but RFC 2068 says:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
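As a rough sketch of that rule, the charset can be read from the Content-Type header and replaced with a default when it is missing. The helper name pick_charset and the sample headers are made up for illustration. The headers object that urllib.request.urlopen() returns is an email.message.Message subclass, so a plain Message stands in for it here.

```python
from email.message import Message


def pick_charset(headers, default="iso-8859-1"):
    """Return the charset declared in Content-Type, or the default if none."""
    return headers.get_content_charset() or default


# Sample headers, built by hand instead of fetched from a server.
with_charset = Message()
with_charset["Content-Type"] = "text/html; charset=UTF-8"

without_charset = Message()
without_charset["Content-Type"] = "text/html"

declared = pick_charset(with_charset)      # charset taken from the header
fallback = pick_charset(without_charset)   # no charset given, default used
```

With a real response this would be pick_charset(response.headers), and the result would be passed to .decode().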

Answer 2

Regular expressions make sense only on text, not on binary data. So keep findnothing = re.compile(r"nothing is (\d+)").search, and convert text to a string instead.
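A small sketch of that advice, using a hard-coded bytes value in place of a real response (the sample value and the variable names are assumptions for illustration):

```python
import re

findnothing = re.compile(r"nothing is (\d+)").search

# Sample body, standing in for what urlopen().read() returns.
raw = b"and the next nothing is 44827"

# A str pattern applied to bytes raises TypeError, as in the question.
raised = False
try:
    findnothing(raw)
except TypeError:
    raised = True

# Decoding first keeps the pattern unchanged and yields a str match.
match = findnothing(raw.decode("utf-8"))
next_id = match.group(1)
```

Because next_id is a str, it can be appended to the URL prefix without a second conversion.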

Answer 3

Instead of urllib we're using requests, and it has two options (maybe you can search urllib for similar ones):

Response object

>>> import requests
>>> response = requests.get('https://api.github.com')

response.content has the bytes type:

>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_us...."}'

With response.text you get the decoded response, a str:

>>> response.text
'{"current_user_url":"https://api.github.com/user","current_us...."}'

requests guesses the encoding from the HTTP headers, but you can set it yourself right after the request, like so:

>>> import requests
>>> response = requests.get('https://api.github.com')
>>> response.encoding = 'SOME_ENCODING'

And then response.text will decode the content using the encoding you set.
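To see why the chosen encoding matters, here is a sketch that uses only the standard library rather than requests. The sample string is made up; the point is that the same bytes produce different text depending on the codec used to decode them.

```python
# Sample bytes, standing in for something like response.content.
raw = "café".encode("utf-8")

# Decoding with the codec the bytes were written in gives the original text.
as_utf8 = raw.decode("utf-8")

# Decoding with a different codec succeeds but yields mojibake.
as_latin1 = raw.decode("iso-8859-1")
```

Here as_utf8 is the original four-character word, while as_latin1 has five characters because each byte of the two-byte "é" is read as a separate character.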


Answer 1: score 4, accepted, 2012-02-26 13:03:47

Answer 2: score 1, 2012-02-26 13:04:21

Answer 3: score 0, 2019-11-04 12:34:57
