
How to read a dataframe of encoded strings from csv in python

Suppose I read an html website and I get a list of names, such as: 'Amiel, Henri-Frédéric'.

In order to get the list of names I decode the html using the following code:

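import urllib

f = urllib.urlopen("http://xxx.htm")  # Python 2 API
html = f.read()
html = html.decode('utf8')  # bytes -> unicode
t.feed(html)  # t is presumably an HTML parser instance defined elsewhere (not shown)
t.close()
lista = t.data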

At this point, the variable lista contains a list of names like:

[u'Abatantuono, Diego', ... , u'Amiel, Henri-Frédéric']

Now I would like to:

  1. put these names inside a DataFrame;
  2. save the DataFrame in a csv file;
  3. read the csv in Python through a DataFrame

For simplicity, let's consider just the name above to complete steps 1 to 3. I would use the following code:

import pandas as pd

name = u'Amiel, Henri-Fr\xe9d\xe9ric'
name = name.encode('utf8')
array = [name]
df = pd.DataFrame({'Names': array})
df.to_csv('names')
uni = pd.read_csv('names')
uni  # trying to read the csv file in a DataFrame

At this point I get the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 67: invalid continuation byte      

If I replace the last line of the above code with:

print uni

I can read the DataFrame, but I don't think this is the right way to handle the issue.

I read many questions posted by other users on this topic, but I could not solve this one.
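A note on the error message, as a minimal sketch in Python 3 syntax: "invalid continuation byte" for 0xe9 suggests that somewhere a single-byte (Latin-1 style) é was handed to the UTF-8 decoder, since UTF-8 stores é as the two bytes 0xC3 0xA9. The snippet below only illustrates that difference on the name from the question; it does not pin down where in the pipeline above the mismatch occurs.

```python
# The name from the question, as text (unicode).
name = u'Amiel, Henri-Fr\xe9d\xe9ric'

utf8_bytes = name.encode('utf-8')      # each é becomes the two bytes 0xC3 0xA9
latin1_bytes = name.encode('latin-1')  # each é becomes the single byte 0xE9

# The UTF-8 bytes decode back cleanly.
assert utf8_bytes.decode('utf-8') == name

# The Latin-1 bytes are not valid UTF-8: 0xE9 announces a three-byte
# sequence, but the byte that follows ('d') is not a continuation byte.
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
    error_reason = None
except UnicodeDecodeError as exc:
    decode_failed = True
    error_reason = exc.reason

print(decode_failed, error_reason)
```

If the bytes and the declared encoding disagree anywhere between writing and displaying, this is the kind of failure to expect.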

Both the to_csv method and the read_csv function take an encoding argument. Use it, and work with unicode internally. If you don't, encoding and decoding by hand inside your program will come back to bite you.

import pandas as pd

name = u'Amiel, Henri-Fr\xe9d\xe9ric'
array = [name]
df = pd.DataFrame({'Names': array})
df.to_csv('names', encoding='utf-8')
uni = pd.read_csv('names', index_col=[0], encoding='utf-8')
print(uni)  # for me it works with or without print

out:

                   Names
0  Amiel, Henri-Frédéric
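For readers on Python 3, where every str is already unicode, the same round trip might look like the sketch below. It is an illustration under a few assumptions, not part of the original answer: the temporary file path is hypothetical, and index=False is used instead of index_col=[0] so that no index column is written at all.

```python
import os
import tempfile

import pandas as pd

# Keep the names as text (unicode) inside the program.
names = [u'Abatantuono, Diego', u'Amiel, Henri-Fr\xe9d\xe9ric']
df = pd.DataFrame({'Names': names})

# Hypothetical file location; a temporary directory keeps the sketch tidy.
path = os.path.join(tempfile.mkdtemp(), 'names.csv')

# Encode only at the boundary: text in memory, UTF-8 bytes on disk.
df.to_csv(path, index=False, encoding='utf-8')
restored = pd.read_csv(path, encoding='utf-8')

# The accented characters should survive the round trip unchanged.
round_trip_ok = restored['Names'].tolist() == names
print(restored)
```

The point of the design is that the encoding is named exactly twice, once when writing and once when reading, and nowhere in between.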

