Problems while creating a two-column-based index in a new pandas column?

Given the following dataframe:

col_1   col_2
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   1
False   2
True    2
False   2
False   2
True    2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
False   2
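To experiment with the answers, the sample frame can be rebuilt as follows. This is a minimal sketch that assumes the listing above is complete: 30 rows, col_2 equal to 1 for the first 14 rows and 2 for the remaining 16, and True in col_1 only at the 0-based positions 15 and 18.

```python
import pandas as pd

# col_2: 1 for the first 14 rows, 2 for the remaining 16 (30 rows in total)
col_2 = [1] * 14 + [2] * 16
# col_1: True only at positions 15 and 18 (0-based), as in the listing
col_1 = [i in (15, 18) for i in range(30)]

df = pd.DataFrame({'col_1': col_1, 'col_2': col_2})
print(df)
```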

How can I create a new index that helps identify when a True value is present in col_1? That is, when a True value appears in the first column, I would like to fill the new column backward with a number, starting from 1, and then increase that number for the rows that follow. For example, this is the expected output for the above dataframe:

   col_1  col_2 new_id
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   1   1
    False   2   1
    True    2   1   --------- ^ (fill with 1 and increase the counter)
    False   2   2
    False   2   2
    True    2   2   --------- ^ (fill with 2 and increase the counter)
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    False   2   3
    True    2   3   --------- ^ (fill with 3 and increase the counter)

The problem is that I do not know how to create the id, although I know that pandas provides a bfill method that may help achieve this. So far I have tried iterating with a simple for loop:

count = 0
for index, row in df.iterrows():
    if row['col_1'] == False:
        print(count+1)
    else:
        print(row['col_2'] + 1)

However, I do not know how to increase the counter to the next number. I also tried to create a function and then apply it to the dataframe:

def create_id(col_1, col_2):
    counter = 0
    if col_1 == True and col_2.bool() == True:
        return counter + 1
    else:
        pass

Nevertheless, I lose control over filling the column backward.
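The bfill idea from the question can be made to work, as a sketch under the assumption that a True row closes its group: label each True row with its ordinal, backfill the False rows from the next True row below them, and give the rows after the last True a fresh id. The sample frame is rebuilt inside the block so it runs on its own.

```python
import pandas as pd

# Rebuild the sample frame from the question (True at rows 15 and 18)
df = pd.DataFrame({
    'col_1': [i in (15, 18) for i in range(30)],
    'col_2': [1] * 14 + [2] * 16,
})

flags = df['col_1']
# On each True row keep its ordinal (1 for the first True, 2 for the second, ...);
# every False row becomes NaN
marked = flags.cumsum().where(flags)
# Backfill: each False row takes the id of the next True row below it
ids = marked.bfill()
# Rows after the last True have nothing to backfill from, so they get the next id
ids = ids.fillna(flags.sum() + 1)
df['new_id'] = ids.astype(int)
print(df['new_id'].tolist())
```

This gives the same column as the cumsum/shift one-liner; it is shown only because it follows the backfill framing of the question.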

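Just do it with cumsum. The cumulative sum of col_1 counts how many True values have appeared so far; shifting it down one row makes each True row keep the id of the group it closes, and adding 1 starts the numbering at 1: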

df['new_id']=(df.col_1.cumsum().shift().fillna(0)+1).astype(int)
df
Out[210]: 
    col_1  col_2  new_id
0   False      1       1
1   False      1       1
2   False      1       1
3   False      1       1
4   False      1       1
5   False      1       1
6   False      1       1
7   False      1       1
8   False      1       1
9   False      1       1
10  False      1       1
11  False      1       1
12  False      1       1
13  False      1       1
14  False      2       1
15   True      2       1
16  False      2       2
17  False      2       2
18   True      2       2
19  False      2       3
20  False      2       3
21  False      2       3
22  False      2       3
23  False      2       3
24  False      2       3
25  False      2       3
26  False      2       3
27  False      2       3
28  False      2       3
29  False      2       3
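To see why the one-liner works, the sketch below prints each intermediate step on a short, made-up column (True at positions 1 and 3); the column values are illustrative only.

```python
import pandas as pd

# Short illustrative column: True at positions 1 and 3
col_1 = pd.Series([False, True, False, True, False, False])

running = col_1.cumsum()   # True values up to and including each row
before = running.shift()   # True values strictly before each row (first row is NaN)
new_id = (before.fillna(0) + 1).astype(int)  # no True above yet -> id 1

print(pd.DataFrame({'col_1': col_1, 'cumsum': running,
                    'shifted': before, 'new_id': new_id}))
```

The shift is what keeps each True row in the group it ends: without it, the True row would already carry the incremented count.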

If you want to build the new_id column with a loop and then append it to your dataframe:

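new_id = []
counter = 1
for index, row in df.iterrows():
    # every row gets the current id, including the True row that closes the group
    new_id.append(counter)
    if row['col_1']:
        # a True value ends the current group, so the next row starts a new id
        counter += 1
df['new_id'] = new_id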
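iterrows builds a Series for every row, which gets slow on large frames. The same counter logic can run over the column's values directly; in this sketch running_group_id is a hypothetical helper name, not a pandas function, and the sample frame is rebuilt so the block runs on its own.

```python
import pandas as pd

def running_group_id(flags):
    """Hypothetical helper: one id per row; a True flag closes the current group."""
    ids = []
    counter = 1
    for flag in flags:
        ids.append(counter)   # the True row itself keeps the current id
        if flag:
            counter += 1      # the row after a True starts the next group
    return ids

# Rebuild the sample frame from the question (True at rows 15 and 18)
df = pd.DataFrame({
    'col_1': [i in (15, 18) for i in range(30)],
    'col_2': [1] * 14 + [2] * 16,
})
df['new_id'] = running_group_id(df['col_1'])
print(df)
```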

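Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.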

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM