Python csv reader + special characters

I am writing a script to read a CSV file and write the data to a graph using pygraphml.

The issue is that the first column of the file contains data like the following, and I am not able to read it.

Master Muppet ™ joèl b Kýrie, eléison

This is my Python script:

import csv
import sys
from pygraphml import Graph
from pygraphml import GraphMLParser

#reload(sys)
#sys.setdefaultencoding("utf8")

data = []  # network data to write
g = Graph() # graph for networks

# Open the file and retrieve the target rows
with open(r"C:\Users\csvlabuser\Downloads\test.csv","r") as fp:
    reader = csv.reader(fp)
    unread_count = 2
    completed_list = []

    try:
        for rows in reader:
            if "tweeter_id" == rows[2]:  # skip and check the header
                print("tweeter_id column found")
                continue
            #if rows[2] not in completed_list:                    
            n = g.add_node(rows[2].encode("utf8"))
            completed_list.append(rows[2])
            n['username'] = rows[0].encode("utf8")
            n['userid'] = rows[1]
            if rows[3] != "NULL":   # edges exist only when there is retweets id
                g.add_edge_by_label(rows[2], rows[3])


            print unread_count
            unread_count +=1

    except:
        pass

fp.close()
print unread_count

g.show()
# Write the graph into graphml file format
parser = GraphMLParser()
parser.write(g, "myGraph.graphml")

Kindly let me know where the issue is.

Thanks in advance.

The Python 2 csv module cannot handle unicode input or input containing NUL bytes (see the note at the top of the module page). Since you're using print as a statement rather than a function, I'm guessing you're using Python 2. To use csv with Unicode in Python 2, you must convert to UTF-8 encoding.

The csv module's Examples section contains definitions for wrappers (UTF8Recoder, UnicodeReader, UnicodeWriter) that let you parse inputs in arbitrary encodings. They recode the input to UTF-8 so csv can process it, then decode the results back to Python unicode objects (which represent the text as "pure" Unicode text, not a specific byte encoding).
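As a rough illustration of the same idea, here is a minimal sketch that skips the wrapper classes. It assumes the CSV file is already UTF-8 encoded, and `read_unicode_rows` is a hypothetical helper name, not part of csv or pygraphml. On Python 2 it lets csv parse the raw bytes and decodes each cell afterwards; on Python 3 it decodes while opening the file, since csv works on text there. The sample data is made up for the demo.

```python
# -*- coding: utf-8 -*-
import csv
import io
import os
import sys
import tempfile


def read_unicode_rows(path):
    """Yield each CSV row as a list of text (unicode) strings.

    Hypothetical helper; assumes the file at `path` is UTF-8 encoded.
    """
    if sys.version_info[0] >= 3:
        # Python 3: csv works on text, so decode while opening the file.
        with io.open(path, "r", encoding="utf-8", newline="") as fp:
            for row in csv.reader(fp):
                yield row
    else:
        # Python 2: csv works on byte strings, so parse the raw UTF-8
        # bytes first and decode every cell afterwards.
        with open(path, "rb") as fp:
            for row in csv.reader(fp):
                yield [cell.decode("utf-8") for cell in row]


# Demo: write made-up sample data to a temporary UTF-8 file, then read it.
sample = u'username,userid\r\n"Kýrie, eléison",1\r\nMaster Muppet ™,2\r\n'
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(sample.encode("utf-8"))

rows = list(read_unicode_rows(path))
os.remove(path)

print(len(rows))  # number of rows read, header included
```

Parsing the bytes before decoding is safe for UTF-8 because its multi-byte sequences never contain the ASCII bytes used for commas, quotes, or line endings. For files in other encodings, the recoding wrappers from the csv documentation are the more general route.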
