
Why do Python 2 and Python 3 treat the same Windows directory differently?

My Windows display language is Chinese. To illustrate the problem, I use the pathlib package.

from pathlib import *
rootdir=Path(r'D:\新建文件夹')
print(rootdir.exists())

On Python 2.7 I get False.

On Python 3 I get True.

Any ideas? Thanks for any advice.

For Python 2.7, you can install pathlib with "pip install pathlib".

In Python 3, strings are Unicode by default. In Python 2, they are byte strings encoded in the source file encoding. Use a Unicode string in Python 2.

Also declare the source file encoding, and make sure the source is actually saved in that encoding.

# coding: utf8
# Python 2: the ur prefix makes this a raw Unicode literal
from pathlib import Path
rootdir = Path(ur'D:\新建文件夹')
print(rootdir.exists())

The main difference between Python 2 and Python 3 is the basic types that exist to deal with text and bytes. Python 3 has one text type, str, which holds Unicode data, and two byte types, bytes and bytearray.

Python 2, on the other hand, has two text types: str, which for all intents and purposes is limited to ASCII plus some undefined data above the 7-bit range, and unicode, which is equivalent to the Python 3 str type. It also has one byte type, bytearray, which it inherited from Python 3.

Python 3 removed all codecs that don't go from bytes to Unicode or vice versa and removed the now useless .encode() method on bytes and .decode() method on strings.
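To make the type split concrete, here is a minimal Python 3 sketch. The variable names are only illustrative, and the path is the one from the question. It shows that text and bytes are separate types and that conversion runs in one direction per type.

```python
# Python 3 only: str holds text, bytes holds encoded data.
text = 'D:\\新建文件夹'        # str: a sequence of Unicode code points
data = text.encode('utf-8')    # bytes: the UTF-8 encoding of that text

# Decoding with the same codec restores the original text.
round_trip = data.decode('utf-8')

# str can only be encoded, bytes can only be decoded.
has_str_decode = hasattr(text, 'decode')
has_bytes_encode = hasattr(data, 'encode')

print(type(text).__name__, len(text))   # str 8
print(type(data).__name__, len(data))   # bytes 18
print(has_str_decode, has_bytes_encode) # False False
```

The length difference (8 characters versus 18 bytes) comes from each of the five Chinese characters taking three bytes in UTF-8.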

More about this, e.g., here.

Use Unicode literals for Windows paths: add from __future__ import unicode_literals at the top of the file.

Explanation

  1. r'D:\新建文件夹' is a bytestring on Python 2. Its specific value depends on the encoding declaration at the top of the file (such as # -*- coding: utf-8 -*- ). Without the declaration, Python 2 raises an error for a non-ASCII literal. On Python 3, r'D:\新建文件夹' is a Unicode string and the default source code encoding is utf-8, so no encoding declaration is required.
  2. Python uses the Unicode API when working with files on Windows if the input is Unicode, and the "ANSI" API if the input is bytes.

If the source code encoding differs from the "ANSI" encoding (such as cp1252), the result may differ: the bytes are passed as is, and the same byte sequence can represent different characters in different encodings. The result may also differ if the filename can't be represented in the "ANSI" encoding at all: a single-byte encoding such as cp1252 has only 256 byte values, while Unicode has room for around a million code points. Using Unicode strings for filenames on Windows fixes both issues.
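Both failure modes can be reproduced without touching the file system. The following Python 3 sketch rests on one assumption that is not in the original answer: it uses cp936 (GBK) as a stand-in for the "ANSI" code page of a Simplified Chinese Windows installation. The variable names are hypothetical.

```python
name = '新建文件夹'

# Issue 1: bytes written in one encoding, interpreted in another.
# A UTF-8 source file stores the literal as these bytes.
utf8_bytes = name.encode('utf-8')
# Assumption: the "ANSI" code page is cp936, so the bytes are read as GBK.
misread = utf8_bytes.decode('cp936', errors='replace')
same_name = (misread == name)

# With matching encodings the name survives the round trip.
matched = name.encode('cp936').decode('cp936')

# Issue 2: the name cannot be represented in a single-byte code page.
try:
    name.encode('cp1252')
    representable = True
except UnicodeEncodeError:
    representable = False

print(same_name)        # False
print(matched == name)  # True
print(representable)    # False
```

Passing a Unicode string avoids both problems because no byte-level reinterpretation takes place before the name reaches the Windows API.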
