How can I filter a list by a DataFrame in Python?

For example, I have a list L = ['a', 'b', 'c'] and a DataFrame df:

Name Value
   a     0
   a     1
   b     2
   d     3

The result should be ['a', 'b'].

a = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()
print (a)
['a', 'b']
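
For anyone who wants to run this as-is, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the table in the question; the intermediate name `mask` is only for illustration. Note that the result follows the order in which names first appear in df, not the order of L.

```python
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({"Name": ["a", "a", "b", "d"], "Value": [0, 1, 2, 3]})
L = ["a", "b", "c"]

# Boolean mask: True for rows whose Name is one of the values in L
mask = df["Name"].isin(L)

# Keep the matching names, drop duplicates, convert to a plain list
a = df.loc[mask, "Name"].unique().tolist()
print(a)  # ['a', 'b']
```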

Or:

a = np.intersect1d(L, df['Name']).tolist()
print (a)
['a', 'b']
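
One thing worth knowing about this variant: np.intersect1d returns the sorted, unique common values, so the order of L is not preserved. A small sketch, using a reordered list (my own choice, not from the question) to make that visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["a", "a", "b", "d"], "Value": [0, 1, 2, 3]})
L = ["b", "a", "c"]  # deliberately not in sorted order

# intersect1d de-duplicates both inputs and returns the common values sorted
result = np.intersect1d(L, df["Name"]).tolist()
print(result)  # ['a', 'b']
```

If the order of L matters, one of the list-comprehension approaches is probably the better fit.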

Timings:

df = pd.concat([df]*1000).reset_index(drop=True)

L = ['a', 'b', 'c']

#jezrael 1
In [163]: %timeit (df.loc[df['Name'].isin(L), 'Name'].unique().tolist())
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 774 µs per loop

#jezrael 2    
In [164]: %timeit (np.intersect1d(L, df['Name']).tolist())
1000 loops, best of 3: 1.81 ms per loop

#divakar
In [165]: %timeit ([i for i in L if i in df.Name.tolist()])
1000 loops, best of 3: 393 µs per loop

#john galt 1
In [166]: %timeit (df.query('Name in @L').Name.unique().tolist())
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.36 ms per loop

#john galt 2    
In [167]: %timeit ([x for x in df.Name.unique() if x in L])
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 182 µs per loop
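
The %timeit magic only works inside IPython. Below is a rough sketch of how the same comparison could be reproduced in a plain script. The function names and the `mean_seconds` helper are made up for this example, and the numbers will vary with machine and library versions, so treat them as indicative only.

```python
import time

import numpy as np
import pandas as pd


def isin_unique(df, L):
    return df.loc[df["Name"].isin(L), "Name"].unique().tolist()


def intersect(df, L):
    return np.intersect1d(L, df["Name"]).tolist()


def listcomp_over_list(df, L):
    return [i for i in L if i in df.Name.tolist()]


def query_unique(df, L):
    # @L refers to the local variable L of this function
    return df.query("Name in @L").Name.unique().tolist()


def listcomp_over_unique(df, L):
    return [x for x in df.Name.unique() if x in L]


def mean_seconds(fn, df, L, number=100):
    """Average wall-clock time of fn(df, L) over `number` calls."""
    start = time.perf_counter()
    for _ in range(number):
        fn(df, L)
    return (time.perf_counter() - start) / number


# Same setup as above: the 4-row sample repeated 1000 times
small = pd.DataFrame({"Name": ["a", "a", "b", "d"], "Value": [0, 1, 2, 3]})
df = pd.concat([small] * 1000).reset_index(drop=True)
L = ["a", "b", "c"]

candidates = [isin_unique, intersect, listcomp_over_list,
              query_unique, listcomp_over_unique]
results = {}
for fn in candidates:
    results[fn.__name__] = fn(df, L)
    print(f"{fn.__name__}: {mean_seconds(fn, df, L) * 1e6:.0f} µs per call")
```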

Here's one way -

[i for i in l if i in df.Name.tolist()]

Sample run -

In [303]: df
Out[303]: 
  Name  Value
0    a      0
1    a      1
2    b      2
3    d      3

In [304]: l = ['a', 'b', 'c']

In [305]: [i for i in l if i in df.Name.tolist()]
Out[305]: ['a', 'b']
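
A possible refinement, offered as a sketch rather than as part of the answer above: `df.Name.tolist()` is re-evaluated for every element of l, and each `in` test scans the whole list. Building a set once (the name `names` is my own) avoids both, and the result still follows the order of l.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["a", "a", "b", "d"], "Value": [0, 1, 2, 3]})
l = ["b", "a", "c"]  # reordered to show that the order of l is kept

# Build the lookup set once; membership tests are then O(1) on average
names = set(df["Name"])

filtered = [i for i in l if i in names]
print(filtered)  # ['b', 'a']
```

This should matter mostly when l is long; for a three-element list the difference is likely negligible.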

Another way, using query:

In [1470]: df.query('Name in @L').Name.unique().tolist()
Out[1470]: ['a', 'b']

Or,

In [1472]: [x for x in df.Name.unique() if x in L]
Out[1472]: ['a', 'b']

