How do I read a list surrounded in double quotes from a csv file?

I am creating a ticket tracking project where a pandas dataframe holds the ticket information. I then store this dataframe in a csv file. The dataframe is initialized at the start of the program.

One of the column values is a list. When you store the pandas dataframe in a csv file with this line of code: self.ticketDF.to_csv(self.ticketCSVFilePath), it surrounds the list in double quotes. When you then read it back in, it is interpreted as a string, not a list. In my example, you can see the list with double quotes under the Comments column.
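The round trip can be reproduced in a few lines. This is a minimal sketch, assuming a two-row frame shaped like the one in the question and an in-memory buffer (io.StringIO) in place of self.ticketCSVFilePath; the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for self.ticketDF: one column holds Python lists.
df = pd.DataFrame(
    {
        "Ticket ID": ["PROT-18", "PROT-19"],
        "Comments": [
            ["comment1", "comment2", "comment3"],
            ["comment4", "comment5", "comment6"],
        ],
    }
).set_index("Ticket ID")

# Write to an in-memory buffer rather than a real file.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)  # each list is written as its string form, wrapped in double quotes

# Read the same text back: the cell is now a plain str, not a list.
round_tripped = pd.read_csv(io.StringIO(csv_text), index_col="Ticket ID")
first_cell = round_tripped.loc["PROT-18", "Comments"]
print(type(first_cell).__name__, repr(first_cell))
```

CSV has no list type, so each list is written as its text representation, and the double quotes are added because that text contains commas.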

Here is my file, tickets.csv:

Ticket ID,Subject,Project,Description,Priority,Comments
PROT-18,testSubject,testProject,testDescription,testPriority,"['comment1', 'comment2', 'comment3']"
PROT-19,testSubject,testProject,testDescription,testPriority,"['comment4', 'comment5', 'comment6']"

I am initializing the pandas dataframe using these two functions:

import re
from os import path

import pandas as pd

def initializeTicketDF(self):
    if path.exists(self.ticketCSVFilePath) and path.getsize(self.ticketCSVFilePath) > 0:
        self.ticketDF = pd.read_csv(self.ticketCSVFilePath)  # read the csv file into the dataframe
        self.ticketDF.set_index('Ticket ID', inplace=True)   # use the Ticket ID as the index
        self.columnToList("Comments")                        # call my function that currently does the workaround

def columnToList(self, columnName):
    count = 0                                 # the current row in the dataframe
    for x in self.ticketDF[columnName]:       # x holds the column's value for each row
        x = x.replace('[', '')                # remove left and right brackets
        x = x.replace(']', '')
        x = re.findall('\'([^\']*)\'', x)     # get a list of all values between single quotes
        self.ticketDF[columnName][count] = x  # store the list back into the dataframe
        count += 1

To work around this issue, as shown above, I replace each bracket separately and then get a list of all values between single quotes with this line: x = re.findall('\'([^\']*)\'', x). I then store the list back into the dataframe row by row.

I have also tried using csv.DictReader/DictWriter and it does the same thing.

Is there a way to read the list in the csv without having to do any string modifications? Is there a regular expression I could use to clean up the list's string?

Any thoughts would be greatly appreciated. Thanks!

You can pass a converter for a column to pd.read_csv():

import pandas as pd
from ast import literal_eval

p = pd.read_csv('tickets.csv', converters={'Comments': literal_eval})

p['Comments']
# 0    [comment1, comment2, comment3]
# 1    [comment4, comment5, comment6]

p['Comments'][0][1]
# 'comment2'

literal_eval will safely evaluate simple expressions like your list: it only parses Python literals (strings, numbers, lists, tuples, dicts and so on) rather than running arbitrary code.
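To make "safely" concrete, the sketch below shows literal_eval parsing a list literal and rejecting a function call. It then applies the same idea to rows from csv.DictReader, which the question also mentioned. The sample data and the parse_comments helper are illustrative assumptions, not part of the original answer, and treating an empty cell as an empty list is only one possible choice.

```python
import csv
import io
from ast import literal_eval

# A list literal in string form is parsed into a real list.
parsed = literal_eval("['comment1', 'comment2']")

# Anything that is not a plain literal is rejected instead of executed.
try:
    literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True


def parse_comments(cell):
    """Hypothetical helper: treat an empty cell as an empty list."""
    return literal_eval(cell) if cell.strip() else []


# Sample data modelled on tickets.csv, kept in memory for the example.
sample = io.StringIO(
    "Ticket ID,Comments\n"
    "PROT-18,\"['comment1', 'comment2', 'comment3']\"\n"
    "PROT-19,\n"
)

rows = []
for row in csv.DictReader(sample):
    row["Comments"] = parse_comments(row["Comments"])
    rows.append(row)

print(parsed)
print(rejected)
print(rows[0]["Comments"][1])
print(rows[1]["Comments"])
```

A helper like this could probably also be passed as the converter in pd.read_csv, assuming pandas hands the converter the raw cell text. That is worth checking against your own file before relying on it.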
