
Python 2to3 not working

I'm currently going through the Python Challenge and I'm up to level 4 (see here). I have only been learning Python for a few months, and I'm trying to learn Python 3 rather than 2.x. So far so good, except when I use this bit of code. Here's the Python 2.x version:

import urllib, re
prefix = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
findnothing = re.compile(r"nothing is (\d+)").search
nothing = '12345'
while True:
    text = urllib.urlopen(prefix + nothing).read()
    print text
    match = findnothing(text)
    if match:
        nothing = match.group(1)
        print "   going to", nothing
    else:
        break

So to convert this to Python 3, I changed it to this:

import urllib.request, urllib.parse, urllib.error, re
prefix = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
findnothing = re.compile(r"nothing is (\d+)").search
nothing = '12345'
while True:
    text = urllib.request.urlopen(prefix + nothing).read()
    print(text)
    match = findnothing(text)
    if match:
        nothing = match.group(1)
        print("   going to", nothing)
    else:
        break

If I run the 2.x version it works fine: it goes through the loop, scraping each URL until it reaches the end, and I get the following output:

and the next nothing is 72198
   going to 72198
and the next nothing is 80992
   going to 80992
and the next nothing is 8880
   going to 8880 etc

If I run the 3.x version, I get the following output:

b'and the next nothing is 44827'
Traceback (most recent call last):
  File "C:\Python32\lvl4.py", line 26, in <module>
    match = findnothing(b"text")
TypeError: can't use a string pattern on a bytes-like object

So if I change the r prefix to a b in this line:

findnothing = re.compile(b"nothing is (\d+)").search

I get:

b'and the next nothing is 44827'
   going to b'44827'
Traceback (most recent call last):
  File "C:\Python32\lvl4.py", line 24, in <module>
    text = urllib.request.urlopen(prefix + nothing).read()
TypeError: Can't convert 'bytes' object to str implicitly

Any ideas?

I'm pretty new to programming, so please don't bite my head off.

_bk201

You can't mix bytes and str objects implicitly.
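As a minimal offline sketch of that rule, the snippet below reproduces the kind of failure shown in the question without any network access. The helper name try_search and the hard-coded bytes value are illustrative assumptions, not part of the original code.

```python
import re

pattern = re.compile(r"nothing is (\d+)")   # str pattern, as in the question
raw = b"and the next nothing is 44827"      # bytes, like urlopen(...).read() returns


def try_search(pat, data):
    """Return the captured digits, None if no match, or the TypeError text."""
    try:
        match = pat.search(data)
        return match.group(1) if match else None
    except TypeError as exc:
        return "TypeError: " + str(exc)


mixed = try_search(pattern, raw)                    # str pattern on bytes: fails
decoded = try_search(pattern, raw.decode("utf-8"))  # str pattern on str: works

print(mixed)
print(decoded)
```

The explicit decode() call is what turns the bytes into a str, so the pattern and its input have the same type.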

The simplest thing would be to decode the bytes returned by urlopen().read() and use str objects everywhere:

text = urllib.request.urlopen(prefix + nothing).read().decode()  # note: decode() defaults to utf-8

The page doesn't specify the preferred character encoding via the Content-Type header or a <meta> element. I don't know what the default encoding should be for text/html, but RFC 2068 says:

When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP.
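One possible way to apply this is to read the charset from the Content-Type header and fall back to ISO-8859-1 when it is absent. This is a sketch, not a definitive implementation: pick_charset is a hypothetical helper, and the Message objects below only stand in for the headers of a real response (response.headers from urllib.request.urlopen offers the same get_content_charset() method).

```python
from email.message import Message


def pick_charset(headers, default="iso-8859-1"):
    """Return the charset declared in Content-Type, or `default` if none is given."""
    return headers.get_content_charset() or default


# Stand-ins for real response headers, so the example runs offline.
with_charset = Message()
with_charset["Content-Type"] = "text/html; charset=utf-8"

without_charset = Message()
without_charset["Content-Type"] = "text/html"

print(pick_charset(with_charset))
print(pick_charset(without_charset))
```

With a live response, the result could be passed straight to decode(), for example response.read().decode(pick_charset(response.headers)).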

Regular expressions make sense only on text, not on binary data. So keep findnothing = re.compile(r"nothing is (\d+)").search and convert text to a string instead.
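Putting the advice above together, the loop from the question could look roughly like this in Python 3. This is a sketch under assumptions: the utf-8 encoding is a guess (the page declares none), and the function names, the injectable fetch parameter, and the max_steps guard are illustrative additions that are not in the original code.

```python
import re
import urllib.request

PREFIX = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="
find_nothing = re.compile(r"nothing is (\d+)").search  # str pattern


def fetch_text(url):
    """Download `url` and decode the body to str (utf-8 is an assumption)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def follow_chain(start, fetch=fetch_text, max_steps=500):
    """Follow the 'nothing is N' links and return every value visited."""
    visited = [start]
    nothing = start
    for _ in range(max_steps):  # hypothetical guard against an endless chain
        text = fetch(PREFIX + nothing)  # str + str, so no implicit bytes mixing
        match = find_nothing(text)
        if not match:
            break
        nothing = match.group(1)
        visited.append(nothing)
    return visited


# follow_chain("12345")  # uncomment to run against the real site (needs network)
```

Passing fetch as a parameter keeps the loop testable without network access: any callable that maps a URL to a str can replace fetch_text.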

Instead of urllib we're using requests, and it has two options (you may be able to find similar options in urllib):

Response object

>>> import requests
>>> response = requests.get('https://api.github.com')

Using response.content - this has the bytes type:

>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_us...."}'

Using response.text - you get the decoded response as a str:

>>> response.text
'{"current_user_url":"https://api.github.com/user","current_us...."}'

The encoding here is utf-8 (requests takes it from the response headers), but you can override it right after the request, like so:

>>> import requests
>>> response = requests.get('https://api.github.com')
>>> response.encoding = 'SOME_ENCODING'

After that, response.text will hold the content decoded with the encoding you set.
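To see why the chosen encoding matters, here is a small offline sketch using plain bytes in place of response.content. The sample string is an arbitrary assumption; the point is only that the same bytes decode to different text under different encodings.

```python
# Stand-in for response.content: the UTF-8 bytes of a short non-ASCII string.
content = "café".encode("utf-8")

as_utf8 = content.decode("utf-8")         # the correct encoding
as_latin1 = content.decode("iso-8859-1")  # a wrong guess still decodes, but garbles

print(repr(content))
print(repr(as_utf8))
print(repr(as_latin1))
```

A wrong encoding does not necessarily raise an error, so it is worth checking the decoded text when the server does not declare a charset.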
