How to read a list surrounded by double quotes from a csv file?

Question

I am creating a ticket tracking project where I have a pandas dataframe holding the ticket information. I then store this dataframe in a csv file. The dataframe is initialized from that file at the start of the program.

One of the column values is a list. When you store the pandas dataframe in a csv file with this line of code: self.ticketDF.to_csv(self.ticketCSVFilePath) , it surrounds the list in double quotes. When you then read it back in, it is interpreted as a string, not a list. In my example, you can see the list with double quotes under the Comments column.

Here is my file - tickets.csv :

Ticket ID,Subject,Project,Description,Priority,Comments
PROT-18,testSubject,testProject,testDescription,testPriority,"['comment1', 'comment2', 'comment3']"
PROT-19,testSubject,testProject,testDescription,testPriority,"['comment4', 'comment5', 'comment6']"

I am initializing the pandas dataframe using these two functions:

def initializeTicketDF(self):
   if path.exists(self.ticketCSVFilePath) and path.getsize(self.ticketCSVFilePath) > 0:
       self.ticketDF = pd.read_csv(self.ticketCSVFilePath)  #reading the csv file into the dataframe
       self.ticketDF.set_index('Ticket ID', inplace=True)   #I am setting the index to the Ticket ID
       self.columnToList("Comments")                        #Calling my function that currently does the 
                                                            #workaround

def columnToList(self, columnName):
   count = 0                                #this represents the current row in the dataframe
   for x in self.ticketDF[columnName]:      #x holds the column's value for every row
       x = x.replace('[', '')               #replace left and right brackets
       x = x.replace(']', '')
       x = re.findall('\'([^\']*)\'', x)    #get a list of all values between single quotes
       self.ticketDF[columnName][count] = x #store the list back into the dataframe
       count += 1

To work around this issue as shown above, I am replacing each bracket separately and then getting a list of all values between single quotes with this line: x = re.findall('\'([^\']*)\'', x) . I am then storing the list back into the dataframe row by row.

I have also tried using csv.DictReader/Writer and it does the same thing.

Is there a way to read the list in the csv without having to do any string modifications? Is there a regular expression I could use to clean up the list's string?

Any thoughts would be greatly appreciated. Thanks!

Answer 1

You can pass a converter for a column to pd.read_csv() :

import pandas as pd
from ast import literal_eval

p = pd.read_csv(path, converters={'Comments':literal_eval})

p['Comments']
# 0    [comment1, comment2, comment3]
# 1    [comment4, comment5, comment6]

p['Comments'][0][1]
# 'comment2'

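literal_eval will safely evaluate simple literal expressions like your list. It only accepts Python literals (strings, numbers, lists, tuples, dicts and so on), so unlike eval it will not run arbitrary code found in the file.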
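For reference, here is a self-contained sketch of the same idea applied to the sample rows from the question. It reads the data from an in-memory string instead of tickets.csv so it can be run as-is. The helper parse_list_cell and the variable names are hypothetical, not part of the original code; the helper assumes that an empty Comments cell should become an empty list, since literal_eval raises an error on an empty string.

```python
import io
from ast import literal_eval

import pandas as pd

# Sample rows copied from the question's tickets.csv
csv_text = '''Ticket ID,Subject,Project,Description,Priority,Comments
PROT-18,testSubject,testProject,testDescription,testPriority,"['comment1', 'comment2', 'comment3']"
PROT-19,testSubject,testProject,testDescription,testPriority,"['comment4', 'comment5', 'comment6']"
'''


def parse_list_cell(cell):
    """Hypothetical helper: parse a cell's text as a Python literal.

    Assumption: an empty cell should become an empty list.
    """
    return literal_eval(cell) if cell.strip() else []


# converters runs parse_list_cell on every raw string in the Comments column;
# index_col replaces the separate set_index call from the question.
ticket_df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Ticket ID",
    converters={"Comments": parse_list_cell},
)

first_comments = ticket_df.at["PROT-18", "Comments"]
print(type(first_comments).__name__, first_comments)
```

With a real file, the io.StringIO(csv_text) argument would be replaced by the csv path, and the columnToList workaround would no longer be needed.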
