How to extract the last 3 index numbers before a specific category
I have the following dataset, and I want a list containing the last three indices before each 'YES' label. My dataset:
i category
0 NO
1 NO
2 NO
3 NO
4 NO
5 YES
6 YES
7 YES
8 NO
9 NO
10 NO
11 YES
12 YES
I expect the outcome to be:
list = [2, 3, 4, 8, 9, 10]
Please note that YES usually occurs in consecutive runs of samples (2-6 samples). I want the last three indices before the first YES in each run.
PS: The dataset is stored in a CSV file, which I imported using pandas.
Probably not the most Pythonic way, but I couldn't think of a way to do this without a for loop and some slicing. It feels a bit hacky:
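# index labels where a run of 'YES' starts: the row is 'YES' and differs from the previous row
a = df[df.category.ne(df.category.shift()) & (df.category == 'YES')].index
indices = []
for x in a:
    # up to three rows immediately before the start of the run
    indices.append(df.iloc[max(0, x - 3):x])
new_df = pd.concat(indices)  # if you wanted this as a df.
list(new_df.index)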
[2, 3, 4, 8, 9, 10]
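For what it's worth, the loop can probably be avoided with a bit of NumPy broadcasting. The following is only a sketch under some assumptions: the function name `indices_before_yes_runs` and its parameters `n` and `label` are made up for illustration, the DataFrame is rebuilt by hand from the question's table, and the returned numbers are row positions, which match the index labels only for a default RangeIndex. If a run starts fewer than `n` rows after the previous one, the result may contain duplicates or 'YES' rows.

```python
import numpy as np
import pandas as pd

def indices_before_yes_runs(df, n=3, label="YES"):
    """Return the positions of up to n rows before the start of each run of `label`."""
    is_yes = df["category"].eq(label).to_numpy()
    # Shift the mask by one row so prev[i] says whether row i-1 is `label`.
    prev = np.roll(is_yes, 1)
    prev[:1] = False  # the first row has no predecessor
    # A run starts where the row is `label` and the previous row is not.
    starts = np.flatnonzero(is_yes & ~prev)
    # One row per run, holding the n positions before its start.
    grid = starts[:, None] - np.arange(n, 0, -1)
    # Drop positions that would fall before the first row.
    return grid[grid >= 0].tolist()

# The question's data, rebuilt by hand (13 rows, default RangeIndex).
df = pd.DataFrame({"category": ["NO"] * 5 + ["YES"] * 3 + ["NO"] * 3 + ["YES"] * 2})
result = indices_before_yes_runs(df)
print(result)  # [2, 3, 4, 8, 9, 10]
```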
Let's assume, as you stated in your comment, that there are always at least 3 items before every YES. A possible solution is:
import pandas as pd
flatten = lambda l: [item for sublist in l for item in sublist]
# sample data in which every YES is a single, isolated row
df = pd.DataFrame({"category": ['NO', 'NO', 'NO', 'NO', 'NO',
                                'YES', 'NO', 'NO', 'NO', 'NO',
                                'NO', 'YES', 'NO']})
# take only the indices where YES occurs
idx = df[df["category"] == "YES"].index
# for every i in idx, take the previous 3 indices
lst = [list(range(i - 3, i)) for i in idx]
# flatten lst into a single list
lst = flatten(lst)
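Note that the sample above has only isolated YES rows. With consecutive YES rows, as in the question, selecting every YES index would produce overlapping ranges, so one option is to keep only the first YES of each run. A minimal sketch, assuming a default RangeIndex and at least 3 rows before each run (the names `is_yes`, `prev_yes` and `run_starts` are mine, not from the question):

```python
import pandas as pd

# The question's data, rebuilt by hand.
df = pd.DataFrame({"category": ["NO"] * 5 + ["YES"] * 3 + ["NO"] * 3 + ["YES"] * 2})

is_yes = df["category"] == "YES"
# True where the previous row is YES; the first row has no predecessor.
prev_yes = is_yes.shift(1, fill_value=False).astype(bool)
# Keep a YES row only if the row before it is not YES, i.e. a run starts here.
run_starts = df[is_yes & ~prev_yes].index
# Take the 3 indices before each run start, already flattened.
lst = [j for i in run_starts for j in range(i - 3, i)]
print(lst)  # [2, 3, 4, 8, 9, 10]
```

The nested comprehension also makes the separate `flatten` helper unnecessary.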
Here's some code that's easy to read and does what you want. It iterates over the indices of the list and pulls out what you need.
The second for loop simply flattens the nested list of results.
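li = ['1', '2', '3', '4', 'YES', '6', '7', '8', '9', '0', 'YES']
result = []
for x in range(len(li)):
    # compare strings with ==, not `is` (which tests object identity)
    if li[x] == 'YES':
        result.append(li[x-3:x])
# flatten the nested list
final = []
for x in result:
    for y in x:
        final.append(y)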
final = ['2', '3', '4', '8', '9', '0']
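The snippet above collects the values that precede each 'YES'. If what you need is the index numbers, and 'YES' comes in consecutive runs as in the question, a variation along these lines might work. It is only a sketch: the `categories` list is typed in by hand from the question's table, and `starts_run` is a name I made up. The `max(0, ...)` guards against a negative slice start, which Python would interpret as counting from the end of the list.

```python
# The question's category column, typed in by hand.
categories = ["NO", "NO", "NO", "NO", "NO", "YES", "YES", "YES",
              "NO", "NO", "NO", "YES", "YES"]

final = []
for i, value in enumerate(categories):
    # A run starts at a YES whose predecessor is missing or not YES.
    starts_run = value == "YES" and (i == 0 or categories[i - 1] != "YES")
    if starts_run:
        # Collect up to three positions before the run, never going below 0.
        final.extend(range(max(0, i - 3), i))
print(final)  # [2, 3, 4, 8, 9, 10]
```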