How do I read a list surrounded in double quotes from a CSV file?
I am creating a ticket tracking project where a pandas dataframe holds the ticket information. I store this dataframe in a CSV file, and the dataframe is initialized from that file at the start of the program.
One of the columns holds a list. When you store the pandas dataframe in a CSV file with self.ticketDF.to_csv(self.ticketCSVFilePath), the list is surrounded by double quotes (it is written as its string form, which contains commas, so the CSV writer quotes it). When you then read the file back in, the value is interpreted as a string, not a list. In my example, you can see the double-quoted list in the Comments column.
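A minimal sketch reproducing the behavior described above, assuming a recent pandas and using an in-memory io.StringIO buffer in place of self.ticketCSVFilePath (the names df, buf, csv_text and round_trip are illustrative, not from the original code):

```python
import io

import pandas as pd

# A tiny dataframe shaped like the ticket data (values are illustrative)
df = pd.DataFrame(
    {"Ticket ID": ["PROT-18"], "Comments": [["comment1", "comment2", "comment3"]]}
).set_index("Ticket ID")

buf = io.StringIO()
df.to_csv(buf)  # the list is written via str(); it contains commas, so it is quoted
csv_text = buf.getvalue()
print(csv_text)

# Reading it back without a converter yields a plain string, not a list
round_trip = pd.read_csv(io.StringIO(csv_text), index_col="Ticket ID")
value = round_trip.loc["PROT-18", "Comments"]
print(type(value).__name__, value)  # str ['comment1', 'comment2', 'comment3']
```

Nothing in the CSV format records that the cell was a Python list, so read_csv has no way to know it should rebuild one.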
Here is my file, tickets.csv:
Ticket ID,Subject,Project,Description,Priority,Comments
PROT-18,testSubject,testProject,testDescription,testPriority,"['comment1', 'comment2', 'comment3']"
PROT-19,testSubject,testProject,testDescription,testPriority,"['comment4', 'comment5', 'comment6']"
I initialize the pandas dataframe using these two functions:
import re
from os import path

import pandas as pd

def initializeTicketDF(self):
    if path.exists(self.ticketCSVFilePath) and path.getsize(self.ticketCSVFilePath) > 0:
        self.ticketDF = pd.read_csv(self.ticketCSVFilePath)  # read the csv file into the dataframe
        self.ticketDF.set_index('Ticket ID', inplace=True)  # use the Ticket ID as the index
        self.columnToList("Comments")  # call the function that currently does the workaround

def columnToList(self, columnName):
    parsed = []  # one parsed list per row, in dataframe order
    for x in self.ticketDF[columnName]:  # x holds the column's string value for each row
        x = x.replace('[', '')  # remove the left and right brackets
        x = x.replace(']', '')
        x = re.findall('\'([^\']*)\'', x)  # get a list of all values between single quotes
        parsed.append(x)
    # assign the whole column at once instead of chained per-cell assignment
    self.ticketDF[columnName] = pd.Series(parsed, index=self.ticketDF.index, dtype=object)
To work around this issue, as shown above, I strip each bracket separately and then get a list of all values between single quotes with this line: x = re.findall('\'([^\']*)\'', x)
I then store each parsed list back into the dataframe.
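The bracket-stripping plus regex workaround described above can be isolated into a small function to see where it breaks. This is a sketch, not the asker's exact code: regex_parse is a hypothetical name and the apostrophe example is made up for illustration. Because str() switches to double quotes for any string that contains a single quote, the regex loses track of the items, while ast.literal_eval parses the same text correctly:

```python
import re
from ast import literal_eval


def regex_parse(cell):
    """Hypothetical helper: the bracket-stripping + regex workaround as a function."""
    cell = cell.replace('[', '').replace(']', '')
    return re.findall('\'([^\']*)\'', cell)


ok = str(['comment1', 'comment2'])
print(regex_parse(ok))  # ['comment1', 'comment2']

# A comment containing an apostrophe is written by str() with double quotes
tricky = str(["don't", "b"])
print(tricky)                # ["don't", 'b']
print(regex_parse(tricky))   # ['t", '] : the regex pairs up the wrong quotes
print(literal_eval(tricky))  # ["don't", 'b']
```

Brackets inside a comment (for example "see [1]") would also be silently removed by the replace() calls.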
I have also tried csv.DictReader/csv.DictWriter, and they behave the same way.
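The csv module behaves the same way because csv.writer converts non-string values with str() before writing them. A minimal sketch, using an in-memory buffer and made-up rows (rows, buf and read_back are illustrative names), that writes a list with csv.DictWriter, reads it back with csv.DictReader, and then parses the cell with ast.literal_eval:

```python
import csv
import io
from ast import literal_eval

rows = [{"Ticket ID": "PROT-18", "Comments": ["comment1", "comment2"]}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Ticket ID", "Comments"])
writer.writeheader()
writer.writerows(rows)  # the list is written via str(), so it is quoted as in pandas

buf.seek(0)
read_back = [dict(row) for row in csv.DictReader(buf)]
print(type(read_back[0]["Comments"]).__name__)  # str

# Parsing each cell with literal_eval restores the list
for row in read_back:
    row["Comments"] = literal_eval(row["Comments"])
print(read_back[0]["Comments"])  # ['comment1', 'comment2']
```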
Is there a way to read the list from the CSV without doing any string modifications? Or is there a regular expression I could use to clean up the list's string?
Any thoughts would be greatly appreciated. Thanks!
You can pass a converter for a column to pd.read_csv():
import pandas as pd
from ast import literal_eval
p = pd.read_csv(path, converters={'Comments':literal_eval})
p['Comments']
# 0 [comment1, comment2, comment3]
# 1 [comment4, comment5, comment6]
p['Comments'][0][1]
# 'comment2'
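literal_eval (from the standard-library ast module) will safely evaluate simple Python literals like your list. Unlike eval, it only accepts literal structures such as strings, numbers, lists, tuples, dicts, sets, booleans and None, so it cannot execute arbitrary code stored in the file.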
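A self-contained round trip using the converter approach above, assuming the same column names as tickets.csv and an io.StringIO buffer instead of a real file path (df, buf and loaded are illustrative names). Here index_col stands in for the separate set_index call from the question, and the last lines show that literal_eval refuses input that is not a plain literal:

```python
import io
from ast import literal_eval

import pandas as pd

# Same layout as tickets.csv in the question (only two columns kept for brevity)
df = pd.DataFrame(
    {
        "Ticket ID": ["PROT-18", "PROT-19"],
        "Comments": [["comment1", "comment2", "comment3"], ["comment4", "comment5", "comment6"]],
    }
).set_index("Ticket ID")

buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)  # rewind so read_csv starts at the header

# converters maps a column name to a function applied to each raw string cell
loaded = pd.read_csv(buf, index_col="Ticket ID", converters={"Comments": literal_eval})
print(loaded.loc["PROT-19", "Comments"][1])  # comment5

# literal_eval only accepts Python literals, so a function call in a cell is rejected
try:
    literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```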