Remove a degree symbol from a string using Python

I am using Python to read a text file of data line by line. One of the lines contains a degree symbol, and I want to alter this part of the string. My script uses line = line.replace("TEMP [°C]", "TempC"). The code runs this line but does not change the string at all, nor does it throw an error. Clearly something about my replace call means the script does not see 'TEMP [°C]' as existing in my string.

To insert the degree sign in my script I had to change the encoding to UTF-8 in my IDE file settings. I have included the following text at the top of my script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

How do I replace 'TEMP [°C]' with 'TempC'?

I am using Windows 7 and Python 2.7 with Komodo IDE 5.2.

I tried running the suggested code in a Python shell in Komodo, and there the replacement worked:

# -*- coding: utf-8 -*-
line = "hello TEMP [°C]"
line = line.replace("TEMP [°C]", "TempC")
print(line)
hello TempC

This other suggested code, run in a Python shell in Komodo, returned this:

line = "TEMP [°C]"
line = line.replace(u"TEMP [°C]", "TempC")
Traceback (most recent call last):
File "<console>", line 0, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 6: ordinal not in range(128)

None of these suggestions worked when reading my text file, though.

Based on your symptoms, your Python str literals end up as their UTF-8 encodings, so when you type:

"TEMP [°C]"

you actually get:

'TEMP [\xc2\xb0C]'

Your file is in some other encoding (e.g. latin-1 or cp1252), and since you're reading it via plain open, you're getting back undecoded str. But in latin-1 and cp1252, the str is 'TEMP [\xb0C]' (note the lack of \xc2), so str comparison doesn't consider the two strings equivalent.
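The mismatch can be reproduced without any file. The following is a minimal sketch, assuming the file is cp1252 and the script is saved as UTF-8; the variable names and the sample reading "21.5" are illustrative only.

```python
# -*- coding: utf-8 -*-
# Sketch: the same degree sign becomes different bytes in different encodings.
degree = u"\u00b0"  # DEGREE SIGN

utf8_bytes = degree.encode("utf-8")      # two bytes: 0xC2 0xB0
cp1252_bytes = degree.encode("cp1252")   # one byte: 0xB0

# What a UTF-8 script's byte-string literal looks like (the search pattern)
needle = u"TEMP [\u00b0C]".encode("utf-8")
# What a line read undecoded from a cp1252 file looks like (hypothetical data)
file_line = u"TEMP [\u00b0C] 21.5".encode("cp1252")

# The pattern is not found, so replace() silently returns the line unchanged
replaced = file_line.replace(needle, b"TempC")
print(replaced == file_line)
```

Because the byte sequences differ, replace() finds nothing to substitute and raises no error, which matches the behaviour described in the question.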

The best fix is to replace your use of open with io.open, which is the Python 3 version of open and can seamlessly decode using a given encoding to produce canonical unicode representations. Similarly, use unicode literals instead of str literals in an encoding unknown to Python, so there is no disagreement on the correct way to represent a degree symbol (in unicode, there is one, and only one, representation):

import io

with io.open('myfile.txt', encoding='cp1252') as f:
    for line in f:
        line = line.replace(u"TEMP [°C]", u"TempC")

As you describe in your edits, your file is likely cp1252 (your editor says it's ANSI, which is just a dumb way to describe cp1252), thus the chosen encoding.
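To try the io.open approach end to end, here is a self-contained sketch that first writes a small cp1252 file and then reads it back with decoding. The helper name replace_temp_header, the temporary file, and the sample rows are assumptions made for illustration; only io.open with encoding='cp1252' comes from the answer itself.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def replace_temp_header(path, encoding="cp1252"):
    """Return the file's lines with 'TEMP [°C]' replaced by 'TempC'.

    Hypothetical helper: io.open decodes each line to unicode, so the
    unicode search pattern matches regardless of the script's encoding.
    """
    with io.open(path, encoding=encoding) as f:
        return [line.replace(u"TEMP [\u00b0C]", u"TempC") for line in f]


# Build a throwaway cp1252 file standing in for the real data file
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with io.open(path, "w", encoding="cp1252") as f:
        f.write(u"TIME TEMP [\u00b0C]\n12:00 21.5\n")
    lines = replace_temp_header(path)
finally:
    os.remove(path)

print(lines)
```

Writing the pattern with the \u00b0 escape instead of a literal ° also sidesteps any doubt about how the script file itself is saved.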

Note: If you're going to use unicode consistently throughout your program (a decent idea if you deal with non-ASCII data), you can make that the default:

from __future__ import unicode_literals
# All string literals are unicode literals unless prefixed with b, as on Python 3

from io import open  # open is now Python 3's open

# No need to qualify with `io.` for `open`, nor put `u` in front of Unicode text
with open('myfile.txt', encoding='cp1252') as f:
    for line in f:
        line = line.replace("TEMP [°C]", "TempC")

Really you should just move to Python 3, where this whole "unicode and str try to work together and often fail" thing was resolved by splitting the two types completely.
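As a rough illustration of that split, the sketch below (Python 3 only) shows that mixing bytes and text fails loudly instead of silently doing nothing. The sample data and variable names are assumptions for illustration.

```python
# Python 3 sketch: bytes and str are separate types and cannot be mixed.
raw = "TEMP [\u00b0C] 21.5".encode("cp1252")  # bytes, as read in binary mode

try:
    # Passing str arguments to bytes.replace raises TypeError on Python 3
    raw.replace("TEMP [\u00b0C]", "TempC")
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Decode first, then work purely with text
text = raw.decode("cp1252")
fixed = text.replace("TEMP [\u00b0C]", "TempC")

print(mixing_failed, fixed)  # True TempC 21.5
```

On Python 3 the built-in open also accepts an encoding argument directly, so text read from the file is already decoded.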

You should use the u prefix to make the pattern a unicode string literal:

line = line.replace(u"TEMP [°C]", "TempC")

This code is working fine for me (Python 2.7.14). Maybe you can point out whether you did something different, so we can take it from there.

# -*- coding: utf-8 -*-

line = "hello TEMP [°C]"
line = line.replace("TEMP [°C]", "TempC")

print(line)
# hello TempC

Note: For me no u prefix was necessary.
