python csv reader + special characters
I am writing a script to read a CSV file and write the data to a graph using pygraphml.
The issue is that the first column of the file contains data like the following, and I am not able to read it:
Master Muppet ™ joèl b Kýrie, eléison
This is my Python script:
import csv
import sys
from pygraphml import Graph
from pygraphml import GraphMLParser
#reload(sys)
#sys.setdefaultencoding("utf8")
data = []  # network data to write
g = Graph()  # graph for networks
# Open the file and retrieve the target rows
with open(r"C:\Users\csvlabuser\Downloads\test.csv","r") as fp:
    reader = csv.reader(fp)
    unread_count = 2
    completed_list = []
    try:
        for rows in reader:
            if "tweeter_id" == rows[2]:  # skip and check the header
                print("tweeter_id column found")
                continue
            #if rows[2] not in completed_list:
            n = g.add_node(rows[2].encode("utf8"))
            completed_list.append(rows[2])
            n['username'] = rows[0].encode("utf8")
            n['userid'] = rows[1]
            if rows[3] != "NULL":  # edges exist only when there is a retweet id
                g.add_edge_by_label(rows[2], rows[3])
            print unread_count
            unread_count +=1
    except:
        pass
    fp.close()
print unread_count
g.show()
# Write the graph into graphml file format
parser = GraphMLParser()
parser.write(g, "myGraph.graphml")
Kindly let me know where the issue is.
Thanks in advance.
The Python 2 csv module cannot handle unicode input or input containing NUL bytes (see the note at the top of the module page). Since you're using print as a statement rather than a function, I'm guessing you're using Python 2. To use csv with Unicode in Python 2, you must convert to UTF-8 encoding.
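As a rough illustration of the "convert to UTF-8" advice, here is a minimal sketch. It assumes test.csv is actually saved as UTF-8 (a file exported on Windows may use another encoding such as cp1252), and read_utf8_rows is a hypothetical helper name, not part of csv or pygraphml. On Python 2 it lets csv parse the raw bytes and decodes each cell afterwards; on Python 3 csv reads text directly.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def read_utf8_rows(path):
    """Yield each CSV row as a list of text (unicode) cells.

    Assumption: the file at `path` is UTF-8 encoded.
    """
    if PY2:
        # Python 2: csv only parses byte strings, so read bytes and
        # decode every cell after parsing.
        with open(path, "rb") as fp:
            for row in csv.reader(fp):
                yield [cell.decode("utf-8") for cell in row]
    else:
        # Python 3: csv parses text, so let open() do the decoding.
        with io.open(path, "r", encoding="utf-8", newline="") as fp:
            for row in csv.reader(fp):
                yield row
```

Note that the script in the question calls .encode("utf8") on values that csv has already returned as byte strings. In Python 2 that triggers an implicit ASCII decode first, which raises UnicodeDecodeError on non-ASCII bytes, and the bare except: pass would hide that error. Decoding the cells, as above, is the opposite direction.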
The csv module's Examples section contains definitions for wrappers (UTF8Recoder, UnicodeReader, UnicodeWriter) that allow you to parse inputs in arbitrary encodings. They recode the input to UTF-8 so csv can process it, then decode the parsed fields back to Python unicode objects (which represent the text as "pure" Unicode text, not a specific byte encoding).
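The recode-then-decode idea behind those wrappers can be sketched roughly as follows. This is not the code from the csv documentation: iter_unicode_rows is a hypothetical name, and the version check is added here only so the same sketch runs on both Python 2 and Python 3. The caller is assumed to know the file's real encoding.

```python
import codecs
import csv
import sys

PY2 = sys.version_info[0] == 2


def iter_unicode_rows(byte_stream, encoding="utf-8", **kwds):
    """Yield CSV rows as lists of text cells from a binary stream.

    `byte_stream` must be opened in binary mode; `encoding` is the
    encoding the data is actually stored in (an assumption the caller
    has to supply).
    """
    # Step 1: decode the raw bytes from the source encoding into text.
    text_lines = codecs.getreader(encoding)(byte_stream)
    if PY2:
        # Step 2 (Python 2 only): re-encode each line as UTF-8, because
        # the Python 2 csv module can only parse byte strings.
        utf8_lines = (line.encode("utf-8") for line in text_lines)
        for row in csv.reader(utf8_lines, **kwds):
            # Step 3: decode the parsed cells back to unicode objects.
            yield [cell.decode("utf-8") for cell in row]
    else:
        # Python 3: csv parses text directly, so no recoding is needed.
        for row in csv.reader(text_lines, **kwds):
            yield row
```

It could be called as iter_unicode_rows(open(path, "rb"), encoding="cp1252"), where cp1252 is only a guess at what a Windows-exported file might use.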