简体   繁体   中英

Pandas dataframe going up another column based on value in second column and counting

I have the following dataframe

在此处输入图像描述

We want to count the no of times reason was ='end' after they encountered a keyword in another column. for eg the list of keywords are hire and career. So for case id '1' the 'end' was in the reason column after hire in text column. So hire gets a count of 1. In the second case, case id '2' the end was encoutered after both 'hire' and 'career', but career was the last one so the end was due to career and not hire. We need to take last keyword in the 'text' column as the probable reason for end.Hence 'career' nees a count of 1.We need to do this for each 'id'

Sample out put below

在此处输入图像描述

If you forward fill the text column you can then dropna on the entire dataframe and the result will be the reason/text combos you want. You take the value counts of the remaining text columns and turn that into the df you want.

import pandas as pd
df = pd.DataFrame({'id':[1,1,1,1,1,2,2,2,2,2],
                  'reason':[np.nan,np.nan,np.nan,'end',np.nan,np.nan,np.nan,np.nan,'end',np.nan],
                  'text':[np.nan,'hire',np.nan,np.nan,'career',np.nan,'hire','career',np.nan,np.nan]})

output = (
  df.assign(text=df['text'].ffill())
    .dropna()
    .text
    .value_counts()
    .to_frame('counts')
    .T
)

print(output)

Output

        hire  career
counts     1       1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM