How to find all those Y elements that precede the corresponding X elements in time? - Pandas, Python
I am using Pandas to find all the Y elements that precede the corresponding X elements in time, that is, rows whose Y value shows up in column X at a later time.
import pandas as pd
df = {'time': [1, 2, 3, 4, 5, 6, 7, 8], 'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'], 'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y']}
df = pd.DataFrame.from_dict(df)
   time   X   Y
0     1   x   r
1     2   w  xa
2     3   r   a
3     4   a   x
4     5   k   w
5     6   y   u
6     7   u   k
7     8  xa   y
What I would like to achieve is:
   time  X   Y
0     1  x   r
1     2  w  xa
2     3  r   a
5     6  y   u
Any ideas?
You can make two dictionaries that keep track of the time at which each value appears, one for column X and one for column Y. Then use pd.Series.map to build a boolean mask and filter the frame with boolean indexing.
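# Map each value to the time at which it appears in X and in Y.
idx = dict(zip(df['X'], df['time']))
idx2 = dict(zip(df['Y'], df['time']))
# Keep rows whose Y value shows up in X at a later time.
mask = df['Y'].map(lambda k: idx[k] > idx2[k])
df[mask]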
   time  X   Y
0     1  x   r
1     2  w  xa
2     3  r   a
5     6  y   u
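The same comparison can also be written without the Python-level lambda. The following is only a sketch of a variation, not the code from the answer above: it assumes every value in X is unique, because the lookup is keyed on X, and it treats a Y value that never appears in X as "no match" instead of raising a KeyError. The names x_time and y_seen_in_x_at are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'],
    'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y'],
})

# Lookup Series: value in X -> time at which it appears (assumes X is unique).
x_time = df.set_index('X')['time']

# For each row, look up when its Y value shows up in X; NaN if it never does.
y_seen_in_x_at = df['Y'].map(x_time)

# Keep rows whose Y value appears in X at a strictly later time.
# Comparisons against NaN are False, so unmatched Y values are dropped.
mask = y_seen_in_x_at > df['time']
result = df[mask]
print(result)
```

On the sample data this selects the same rows as the dictionary version (index 0, 1, 2 and 5).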
Using df.apply over axis 1 is not recommended; it should be your last resort, because it calls a Python function once per row. Here is a timeit analysis that supports the statement.
In [74]: %%timeit
...: df[df.apply(lambda row: row['Y'] in df.loc[row.time:,'X'].values, axis=1)]
...:
...:
2.26 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [80]: %%timeit
...: idx = dict(zip(df['X'],df['time']))
...: idx2 = dict(zip(df['Y'],df['time']))
...: mask = df['Y'].map(lambda k: idx[k]>idx2[k])
...: x = df[mask]
...:
...:
498 µs ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
About 4.5 times faster (2.26 ms vs. 498 µs per loop).
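Outside IPython, the comparison could be reproduced with the standard timeit module. This is a minimal sketch: the function names with_apply and with_dicts are made up here to wrap the two snippets being timed, and the absolute numbers will vary with the machine and the pandas version, so no output is shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'],
    'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y'],
})


def with_apply(frame):
    # Row-wise version: one Python call and one label slice per row.
    return frame[frame.apply(
        lambda row: row['Y'] in frame.loc[row.time:, 'X'].values, axis=1)]


def with_dicts(frame):
    # Dictionary version: two dict lookups per row, no per-row slicing.
    idx = dict(zip(frame['X'], frame['time']))
    idx2 = dict(zip(frame['Y'], frame['time']))
    return frame[frame['Y'].map(lambda k: idx[k] > idx2[k])]


# Both versions should select the same rows before their speed is compared.
assert with_apply(df).equals(with_dicts(df))

runs = 200
for fn in (with_apply, with_dicts):
    seconds = timeit.timeit(lambda: fn(df), number=runs)
    print(f'{fn.__name__}: {seconds / runs * 1e6:.0f} µs per call')
```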
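Try this:
# For each row, check whether its Y value appears in X in any later row.
# Note: df.loc[row.time:] slices by index label, so this relies on the default index being time - 1.
result = df[df.apply(lambda row: row['Y'] in df.loc[row.time:, 'X'].values, axis=1)]
print(result)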
   time  X   Y
0     1  x   r
1     2  w  xa
2     3  r   a
5     6  y   u
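The df.loc[row.time:] slice works on the sample data because the default index happens to equal time - 1. If the index cannot be trusted to line up with time, the row-wise check could compare the time column directly. This is a sketch of that variation, not the answer's code; the helper y_appears_later_in_x and the relabelled frame shuffled are hypothetical names used only to show that the filter no longer depends on the index.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'],
    'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y'],
})

# Give the rows arbitrary index labels so they no longer match time - 1.
shuffled = df.copy()
shuffled.index = [10, 3, 7, 1, 9, 4, 8, 2]


def y_appears_later_in_x(row, frame):
    # X values recorded at a strictly later time than this row.
    later_x = frame.loc[frame['time'] > row['time'], 'X']
    return row['Y'] in later_x.values


keep = shuffled.apply(lambda row: y_appears_later_in_x(row, shuffled), axis=1)
result = shuffled[keep]
print(result)
```

This still runs a Python function for every row, so it is no faster than the original apply version; it only removes the dependence on the index labels.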