Slice pandas DataFrame based on csv of lists
I have a text file in the following format.
[1,2]
[3]
[4,5,6,7,10]
And I have a pandas DataFrame like the following.
df = pd.DataFrame({'id' : [1,2,3,4,5,6,7],
'path' : ["p1,p2,p3,p4","p1,p2,p1","p1,p5,p5,p7","p1,p2,p3,p3","p1,p2","p1","p2,p3,p4"]})
Output:
   id         path
0   1  p1,p2,p3,p4
1   2     p1,p2,p1
2   3  p1,p5,p5,p7
3   4  p1,p2,p3,p3
4   5        p1,p2
5   6           p1
6   7     p2,p3,p4
I want to slice the DataFrame based on the text file. What is wrong with the following? It produces empty DataFrames.
for line in lines:
    print(line)
    print(df[df['id'].isin(line)])
But it works fine with the following.
for line in lines:
    print(df[df['id'].isin([1,2])])
line is a string (the text "[1,2]"), whereas [1,2] is a list. isin needs the actual list of values to compare against. To convert the string to a list, you could use ast.literal_eval:
import ast
line = ast.literal_eval(line)
import ast
for line in lines:
    print(line)
    line = ast.literal_eval(line)
    print(df.loc[df['id'].isin(line)])
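For anyone who wants to run this end to end, here is a minimal self-contained sketch in Python 3 syntax. It assumes the text file holds one bracketed list per line; io.StringIO stands in for the real file, and the names ids and slices are introduced only for this illustration.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6, 7],
    'path': ["p1,p2,p3,p4", "p1,p2,p1", "p1,p5,p5,p7",
             "p1,p2,p3,p3", "p1,p2", "p1", "p2,p3,p4"],
})

# Stand-in for the real text file (assumption: one list literal per line).
lines = io.StringIO("[1,2]\n[3]\n[4,5,6,7,10]\n")

slices = []
for line in lines:
    # Turn the text "[1,2]" into the Python list [1, 2].
    ids = ast.literal_eval(line.strip())
    # Keep only the rows whose id appears in that list.
    slices.append(df.loc[df['id'].isin(ids)])

for part in slices:
    print(part)
```

Ids that are in the file but not in the DataFrame (10 here) are simply ignored by isin.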
PS. Although df[boolean_mask] works, I think df.loc[boolean_mask] is better because it does not require the reader to know the type of values in boolean_mask to understand which way the df is being sub-selected (by row or by column). df.loc is more explicit, and a tad faster.
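To make the explicitness point concrete, here is a small sketch; the DataFrame and variable names are made up for the example. With plain brackets, a boolean mask selects rows while a list of labels selects columns, so the reader has to know what is inside the brackets. With .loc, the first position always addresses rows.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'path': ["p1,p2", "p1", "p2,p3"]})

boolean_mask = df['id'].isin([1, 2])   # boolean Series, one value per row
column_labels = ['id']                 # list of column names

rows_via_brackets = df[boolean_mask]   # brackets + boolean mask -> rows
cols_via_brackets = df[column_labels]  # same brackets + labels -> columns

rows_via_loc = df.loc[boolean_mask]    # .loc: first position is the row selector

print(rows_via_brackets.shape, cols_via_brackets.shape, rows_via_loc.shape)
```

The sketch only illustrates the readability argument; it does not measure the speed difference mentioned above.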