Slicing a pandas DataFrame based on a CSV of lists

Question

I have a text file in the following format.

[1,2]
[3]
[4,5,6,7,10]

And I have a pandas DataFrame like the following.

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7],
                   'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7", "p1,p2,p3,p3", "p1,p2", "p1", "p2,p3,p4"]})

Output:

   id         path
0   1  p1,p2,p3,p4
1   2     p1,p2,p1
2   3  p1,p5,p5,p7
3   4  p1,p2,p3,p3
4   5        p1,p2
5   6           p1
6   7     p2,p3,p4

I want to slice the DataFrame based on the text file. What is wrong with the following? It produces empty DataFrames.

for line in lines:
    print(line)
    print(df[df['id'].isin(line)])

But it works fine with the following.

for line in lines:
    print(df[df['id'].isin([1, 2])])

Answer 1

line is a string. [1,2] is a list. To convert the string to a list, you could use ast.literal_eval:

import ast
line = ast.literal_eval(line)

import ast
for line in lines:
    print(line)
    line = ast.literal_eval(line)  # parse the string "[1,2]" into the list [1, 2]
    print(df.loc[df['id'].isin(line)])
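
For a self-contained version of the loop above, here is a minimal Python 3 sketch. It assumes the contents of the text file can be stood in for by an in-memory string (`raw_text` is a hypothetical placeholder for the real file), and the helper name `rows_for_each_line` is illustrative, not something from the original post.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7", "p1,p2,p3,p3",
             "p1,p2", "p1", "p2,p3,p4"],
})

# Hypothetical stand-in for the contents of the text file.
raw_text = "[1,2]\n[3]\n[4,5,6,7,10]\n"


def rows_for_each_line(frame, lines):
    """Yield (ids, matching rows) for every non-empty line of list text."""
    for line in lines:
        line = line.strip()  # drop the trailing newline and stray spaces
        if not line:
            continue  # skip blank lines
        ids = ast.literal_eval(line)  # "[1,2]" -> [1, 2]
        yield ids, frame.loc[frame['id'].isin(ids)]


results = list(rows_for_each_line(df, raw_text.splitlines()))
for ids, subset in results:
    print(ids)
    print(subset)
```

With a real file, `raw_text.splitlines()` could be replaced by the open file object, since iterating over it also yields one line at a time. Ids that do not occur in the DataFrame (such as 10 here) simply match nothing.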

P.S. Although df[boolean_mask] works, I think df.loc[boolean_mask] is better because it does not require the reader to know the type of the values in boolean_mask to understand how the df is being sub-selected (by row or by column). df.loc is more explicit, and a tad faster.
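
To illustrate the two spellings side by side, here is a small sketch on a made-up four-row DataFrame (the data is an assumption chosen for brevity, not the one from the question). It only checks that both forms select the same rows; it does not measure speed.

```python
import pandas as pd

# Small illustrative frame; not the data from the question.
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'path': ["p1,p2", "p1", "p2,p3", "p4"]})

mask = df['id'].isin([1, 2])  # boolean Series aligned with df's index

by_brackets = df[mask]  # selects rows, but only because mask is boolean
by_loc = df.loc[mask]   # row selection stated explicitly

print(by_loc)
same_rows = by_brackets.equals(by_loc)
```

Had `mask` been a list of column labels instead of booleans, `df[mask]` would have selected columns, which is the ambiguity that `df.loc` avoids.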

Answer 1 (3, accepted, 2014-04-07 10:17:30)
