Python: return a subset of a dataframe based on a list, where the records are ordered the same way as the list

Question

I have a dataframe with more than a thousand records, and I would like to return a slice of it in which the values of a column appear in the same order as in a given list.

For example:

lst = [0,1,0,0,0,1]

Input

              date  season  hot_or_cold
    0   2012-01-01  Winter            0
    1   2012-01-02  Winter            1
    2   2012-01-03  Winter            0
    3   2012-01-04  Winter            0
    4   2012-01-05  Winter            0
    5   2012-01-06  Winter            1
    6   2012-01-07  Winter            1
    7   2012-01-08  Winter            1
    8   2012-01-09  Winter            0
    9   2012-01-10  Winter            1
    10  2012-01-11  Winter            0

    # 1 - hot
    # 0 - cold

Output

              date  season  hot_or_cold
    0   2012-01-01  Winter            0
    1   2012-01-02  Winter            1
    2   2012-01-03  Winter            0
    3   2012-01-04  Winter            0
    4   2012-01-05  Winter            0
    5   2012-01-06  Winter            1

Thank you in advance.

Answer 1

Define the following two functions:

Find a match between s (a Series, the longer one) and lst (a list, the shorter one):
```
def fndMatch(s, lst):
    len1 = s.size
    len2 = len(lst)
    # Try every start position where a window of len2 values still fits
    for i1 in range(len1 - len2 + 1):
        i2 = i1 + len2
        if s.iloc[i1:i2].eq(lst).all():
            return (i1, i2)
    return (None, None)
```
When a match is found, the result is the pair of slice borders of the first match; otherwise it is a pair of None values.

Get the fragment of df whose hot_or_cold column matches lst:

    def getFragment():
        i1, i2 = fndMatch(df.hot_or_cold, lst)
        if i1 is None:
            return None
        else:
            return df.iloc[i1:i2]

When you call it (getFragment()), the result is:

         date  season  hot_or_cold
0  2012-01-01  Winter            0
1  2012-01-02  Winter            1
2  2012-01-03  Winter            0
3  2012-01-04  Winter            0
4  2012-01-05  Winter            0
5  2012-01-06  Winter            1
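
fndMatch stops at the first match and slices the Series once per start position. If every occurrence is needed, one possible alternative (not part of the answer above) is to compare all windows at once with NumPy. This is a minimal sketch, assuming NumPy 1.20 or later for sliding_window_view; find_all_matches is a hypothetical helper name, and df is rebuilt here from the sample data in the question.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def find_all_matches(values, pattern):
    """Return the start positions of every window of `values` equal to `pattern`."""
    values = np.asarray(values)
    pattern = np.asarray(pattern)
    # No window can match an empty pattern or one longer than the data
    if len(pattern) == 0 or len(pattern) > len(values):
        return np.array([], dtype=int)
    # One row per window: shape (len(values) - len(pattern) + 1, len(pattern))
    windows = sliding_window_view(values, len(pattern))
    return np.flatnonzero((windows == pattern).all(axis=1))


# Sample data from the question (assumed layout)
df = pd.DataFrame({
    "date": pd.date_range("2012-01-01", periods=11).strftime("%Y-%m-%d"),
    "season": "Winter",
    "hot_or_cold": [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0],
})
lst = [0, 1, 0, 0, 0, 1]

starts = find_all_matches(df["hot_or_cold"].to_numpy(), lst)
# One dataframe slice per match, in order of appearance
fragments = [df.iloc[s:s + len(lst)] for s in starts]
```

With the sample data there is a single match starting at position 0, so fragments holds one slice covering rows 0 to 5. Overlapping occurrences are reported separately.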

Answer 2

The basic question is how to find a pattern in a dataframe. I found an approach to that problem elsewhere and have implemented the same idea:

import pandas as pd
import numpy as np

arr = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
df = pd.DataFrame(data=arr, columns=['binary'])
pattern = [0, 1, 0, 0, 0, 1]

# 1.0 where the window ending at a row equals the pattern (NaN for the first rows)
matched = df.rolling(len(pattern)).apply(lambda x: all(np.equal(x, pattern)))
matched = matched.sum(axis=1).astype(bool)   # sum across columns acts as a boolean OR

# Positions where a match ends, expanded back to the rows of each match
idx_matched = np.where(matched)[0]
subset = [range(match - len(pattern) + 1, match + 1) for match in idx_matched]

# pd.concat raises ValueError if there is no match at all
result = pd.concat([df.iloc[subs, :] for subs in subset], axis=0)

result
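
The rolling window flags the row where a match ends, so the rows of the match are the len(pattern) rows up to and including that one. As a sketch only, the same idea can be wrapped in a function that works on a single column and returns an empty frame when nothing matches. rows_matching_pattern is a hypothetical name, not something from the answer above, and the sample data is repeated so the block runs on its own.

```python
import numpy as np
import pandas as pd


def rows_matching_pattern(df, column, pattern):
    """Return the rows of every window in df[column] that equals pattern."""
    n = len(pattern)
    target = np.asarray(pattern, dtype=float)
    # raw=True hands each window to the function as a NumPy array
    hits = (
        df[column]
        .rolling(n)
        .apply(lambda w: float(np.array_equal(w, target)), raw=True)
    )
    # hits is NaN for the first n - 1 rows and 1.0 where a window ends a match
    ends = np.flatnonzero(hits.eq(1.0).to_numpy())
    if len(ends) == 0:
        return df.iloc[0:0]  # empty frame with the same columns
    return pd.concat([df.iloc[e - n + 1:e + 1] for e in ends])


arr = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
df = pd.DataFrame(data=arr, columns=["binary"])
pattern = [0, 1, 0, 0, 0, 1]

result = rows_matching_pattern(df, "binary", pattern)
```

For this data the only match ends at position 9, so result holds rows 4 to 9. If matches overlap, the shared rows appear once per match in the concatenated result.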

Answer 3

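Another way, using the accumulate function:

from itertools import accumulate
import pandas as pd

def accum(x):
    # Running accumulation; with lists as items, "+" concatenates them
    return list(accumulate(x))

lst = [0,1,0,0,0,1]
# Wrap each value in a one-element list so accum builds the growing prefixes
f = lambda x : accum([[i] for i in x])
b = df.groupby(['season'])['hot_or_cold'].apply(f)
# Label each row by whether the values seen so far in its season end with lst
df['col_accum2'] = [
    ('small list' if len(item) < len(lst)
     else 'Match' if item[-len(lst):] == lst
     else 'NotMatch')
    for subitem in b for item in subitem
]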
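
To make the accumulate step concrete, here is a small sketch on a plain list, without pandas. The values list is made up for illustration; the labels follow the same three cases as the column built above.

```python
from itertools import accumulate

values = [0, 1, 0, 0, 0, 1, 1]  # illustrative data, not from the question
lst = [0, 1, 0, 0, 0, 1]

# Each value is wrapped in a list, so accumulate's default "+" concatenates:
# prefixes[k] is the list of all values up to and including position k
prefixes = list(accumulate([v] for v in values))

# A prefix "matches" when its last len(lst) items equal lst
labels = [
    "small list" if len(p) < len(lst)
    else "Match" if p[-len(lst):] == lst
    else "NotMatch"
    for p in prefixes
]
```

The label marks the row where a match ends rather than returning the matching rows, so the slice still has to be taken from that position backwards. Because every prefix is stored in full, memory use grows quadratically with the number of rows.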
