How can I filter a list by a DataFrame in Python?
For example, I have a list L = ['a', 'b', 'c'] and a DataFrame df:
Name Value
a 0
a 1
b 2
d 3
The result should be ['a', 'b'].
a = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()
print (a)
['a', 'b']
Or:
a = np.intersect1d(L, df['Name']).tolist()
print (a)
['a', 'b']
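For reference, the two approaches above can be run end to end as follows. This is a minimal sketch that rebuilds the DataFrame from the question; the variable names mask, by_isin and by_intersect are only illustrative. One difference to keep in mind: unique() returns values in order of first appearance in df, while np.intersect1d returns a sorted result.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
L = ['a', 'b', 'c']
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})

# Boolean mask: True for rows whose Name is one of the values in L.
mask = df['Name'].isin(L)

# Keep the matching names and drop duplicates
# (order of first appearance in df).
by_isin = df.loc[mask, 'Name'].unique().tolist()

# Set intersection via NumPy; the result is sorted and unique.
by_intersect = np.intersect1d(L, df['Name']).tolist()

print(by_isin)       # ['a', 'b']
print(by_intersect)  # ['a', 'b']
```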
Timings:
df = pd.concat([df]*1000).reset_index(drop=True)
L = ['a', 'b', 'c']
#jezrael 1
In [163]: %timeit (df.loc[df['Name'].isin(L), 'Name'].unique().tolist())
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 774 µs per loop
#jezrael 2
In [164]: %timeit (np.intersect1d(L, df['Name']).tolist())
1000 loops, best of 3: 1.81 ms per loop
#divakar
In [165]: %timeit ([i for i in L if i in df.Name.tolist()])
1000 loops, best of 3: 393 µs per loop
#john galt 1
In [166]: %timeit (df.query('Name in @L').Name.unique().tolist())
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.36 ms per loop
#john galt 2
In [167]: %timeit ([x for x in df.Name.unique() if x in L])
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 182 µs per loop
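The %timeit lines above only work in IPython. To repeat a comparison like this in a plain script, something along these lines should do. It is a rough sketch using the standard timeit module; the helper names via_isin and via_unique are made up for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit
from functools import partial

import pandas as pd

L = ['a', 'b', 'c']
base = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
# Same enlargement as in the timings above: 4 rows repeated 1000 times.
df = pd.concat([base] * 1000).reset_index(drop=True)


def via_isin(frame, values):
    """Filter rows with isin, then de-duplicate the names."""
    return frame.loc[frame['Name'].isin(values), 'Name'].unique().tolist()


def via_unique(frame, values):
    """De-duplicate the names first, then test membership in the list."""
    return [x for x in frame['Name'].unique() if x in values]


for label, func in [('isin + unique', via_isin),
                    ('unique + comprehension', via_unique)]:
    # Best of 3 repeats, 100 calls each, reported per call.
    best = min(timeit.repeat(partial(func, df, L), number=100, repeat=3))
    print(f'{label}: {best / 100 * 1e6:.0f} us per loop')
```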
Here's one approach -
[i for i in l if i in df.Name.tolist()]
Sample run -
In [303]: df
Out[303]:
Name Value
0 a 0
1 a 1
2 b 2
3 d 3
In [304]: l = ['a', 'b', 'c']
In [305]: [i for i in l if i in df.Name.tolist()]
Out[305]: ['a', 'b']
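Note that the comprehension above calls df.Name.tolist() once per element of l. A possible variant, not part of the original answer, builds a set from the column once and tests membership against it. This assumes the values are hashable; the function name filter_by_column is hypothetical.

```python
import pandas as pd


def filter_by_column(values, column):
    """Return the items of `values` that occur in `column`.

    Keeps the order (and any duplicates) of `values`.
    """
    # Build the lookup set once instead of rebuilding a list per element.
    names = set(column)
    return [v for v in values if v in names]


l = ['a', 'b', 'c']
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})

result = filter_by_column(l, df['Name'])
print(result)  # ['a', 'b']
```

Because it iterates over the list rather than the DataFrame, the result follows the order of l, which may or may not be what you want.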
Another way, using query:
In [1470]: df.query('Name in @L').Name.unique().tolist()
Out[1470]: ['a', 'b']
Or,
In [1472]: [x for x in df.Name.unique() if x in L]
Out[1472]: ['a', 'b']