How do I read a csv file in pandas when some columns contain lists?

Question

I have a csv file in which some of the columns look like this:

df = pd.DataFrame({'a':[['ID1','ID2','ID3'],['ID1','ID4'],[]],'b':[[8.6,1.3,2.5],[7.5,1.2],[]],'c':[[12,23,79],[42,10],[]]})

Out[1]:     a               b                c
        0   [ID1, ID2, ID3] [8.6, 1.3, 2.5] [12, 23, 79]
        1   [ID1, ID4]      [7.5, 1.2]      [42, 10]
        2   []              []              []

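The problem is that when I read it with pandas.read_csv, these columns are treated as strings. Is there an option I can pass to say that these columns contain lists? (Maybe something like dtype=something.)

PS: I can apply ast.literal_eval in a list comprehension afterwards, but that takes a while, so I would like to have the lists as soon as the csv has been read.

PS2: The original csv file has 600 000 rows (which is why literal_eval takes some time). Its columns contain: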

'ID of the project'  'postcode'    'city'       'len of the lists in the last 3 columns'  'ids of other projects'   'distance from initial project'  'jetlag from initial project'
 object                int          string       int                                       list of strings           list of floats                   list of ints

Answer 1

For this you can use the converters parameter of pd.read_csv (see the read_csv documentation).

With your example,

'ID of the project'  'postcode'    'city'       'len of the lists in the last 3 columns'  'ids of other projects'   'distance from initial project'  'jetlag from initial project'
 object                int          string       int                                       list of strings           list of floats                   list of ints

it can be done like this:

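import pandas as pd
import ast

# ast.literal_eval turns the text "[1, 2]" into the Python list [1, 2]
generic = ast.literal_eval
# Map each list-valued column to the converter applied to its cells
conv = {'ids of other projects': generic,
        'distance from initial project': generic,
        'jetlag from initial project': generic}

df = pd.read_csv('your_file.csv', converters=conv)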

You have to specify which columns the converters apply to, but that should not be a problem in your case.
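To check the idea without the real file, here is a minimal sketch that writes the small example frame from the question to an in-memory csv and reads it back with converters. The helper name parse_list and its handling of empty cells are assumptions made for illustration, not part of the original answer.

```python
import ast
import io

import pandas as pd

# Small frame from the question, written to an in-memory csv
# (stands in for the real 600 000-row file).
df = pd.DataFrame({
    'a': [['ID1', 'ID2', 'ID3'], ['ID1', 'ID4'], []],
    'b': [[8.6, 1.3, 2.5], [7.5, 1.2], []],
    'c': [[12, 23, 79], [42, 10], []],
})
csv_text = df.to_csv(index=False)


def parse_list(cell):
    # Hypothetical helper: parse the cell text as a Python literal.
    # Assumption: a completely empty cell should become an empty list.
    return ast.literal_eval(cell) if cell else []


# Apply the same converter to every list-valued column while reading.
loaded = pd.read_csv(io.StringIO(csv_text),
                     converters={col: parse_list for col in ['a', 'b', 'c']})

print(type(loaded['b'][0]))
```

Each cell in the converted columns comes back as a real Python list rather than a string.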

The converter functions are applied during the csv import. If the file is too large, you can always read the csv in chunks.
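Reading in chunks could look roughly like the sketch below. The in-memory csv text, the chunk size of 2, and the reduced set of columns are assumptions chosen to keep the example small; for a real file you would pass its path and a much larger chunksize.

```python
import ast
import io

import pandas as pd

# Tiny stand-in for the real file (assumed layout, two of the columns only).
csv_text = (
    'postcode,distance from initial project\n'
    '75001,"[8.6, 1.3, 2.5]"\n'
    '69002,"[7.5, 1.2]"\n'
    '13001,[]\n'
)

conv = {'distance from initial project': ast.literal_eval}

# With chunksize set, read_csv returns an iterator of DataFrames;
# the converters are applied to each chunk as it is parsed.
reader = pd.read_csv(io.StringIO(csv_text), converters=conv, chunksize=2)
parts = [chunk for chunk in reader]

# Stitch the chunks back together once each has been processed.
df = pd.concat(parts, ignore_index=True)
```

Chunking keeps peak memory lower and lets you process or filter each piece before concatenating, but the total time spent in literal_eval stays about the same.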

Solution 1
0 2016-06-23 15:05:46