
Using Python string.split() with a line of UTF-8 encoded text

Original post: 2014-03-14 16:57:01 9 1 python/ encoding/ utf-8/ split/ tokenize

I have a UTF-8 encoded text file, and I want to tokenize each line using split as a simple tokenizer. The code is as follows:

import codecs

# fileAddress holds the path to the UTF-8 encoded text file
with codecs.open(fileAddress, 'r', 'utf-8') as file:
    line = file.readline()
tokens = line.split()

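This does not split the UTF-8 string the way it does when I use it on an ASCII file. I want a UTF-8 encoded line such as "hi i am here" to become a list of tokens like ["hi", "i", "am", "here"], which is easy to get with ASCII using line.split().

Is there a simple way to solve this?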

1 Answer

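As Martijn Pieters pointed out, your code works fine as long as your file uses regular spaces as separators. The only difference from the result you expect is that the tokens will be of type unicode rather than str.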
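
A minimal sketch of that point, written for Python 3, where the tokens come back as str (under Python 2 the same calls would return unicode objects). The helper name tokenize_first_line and the temporary sample file are illustrative assumptions, not part of the original question:

```python
import io
import os
import tempfile


def tokenize_first_line(path):
    """Read the first line of a UTF-8 file and split it on whitespace."""
    with io.open(path, 'r', encoding='utf-8') as handle:
        line = handle.readline()
    # split() with no argument splits on runs of whitespace and
    # drops the trailing newline along the way
    return line.split()


# Create a small sample file so the example is self-contained
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(sample_path, 'w', encoding='utf-8') as handle:
    handle.write(u'hi i am here\n')

tokens = tokenize_first_line(sample_path)
os.remove(sample_path)
print(tokens)
```

With regular spaces in the file, this should give the four tokens the question asks for.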

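There are other Unicode characters used to represent whitespace (http://en.wikipedia.org/wiki/Whitespace_character#Unicode), and these can cause confusion. If that is the case in your file, even readline may run into problems...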
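
To illustrate the kind of confusion this can cause, here is a small sketch under Python 3. The sample line and the EXTRA_SEPARATORS set are made up for illustration; which characters your file actually contains is something you would need to check yourself:

```python
import re

# Sample line mixing an ideographic space (U+3000), a no-break space (U+00A0)
# and a zero-width space (U+200B) instead of regular spaces
line = u'hi\u3000i\u00a0am\u200bhere'

# split() with no argument splits on characters that str.isspace() accepts.
# U+200B is not one of them, so "am" and "here" stay glued together.
default_tokens = line.split()

# Hypothetical set of extra characters to treat as separators as well
EXTRA_SEPARATORS = u'\u200b\ufeff'
pattern = re.compile(u'[\\s' + EXTRA_SEPARATORS + u']+')
custom_tokens = [token for token in pattern.split(line) if token]

# Line splitting has a similar subtlety: splitlines() also breaks on
# Unicode line separators such as U+2028, not only on "\n"
pieces = u'first\u2028second'.splitlines()

print(default_tokens)
print(custom_tokens)
print(pieces)
```

If the default split leaves tokens glued together, printing repr(line) is a quick way to see which separator characters are really in the data.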

