
Pandas: split or group a DataFrame at each occurrence of a value (True) in a column

I have a df like this:

import pandas as pd
df = pd.DataFrame({'words':['hi', 'this', 'is', 'a', 'sentence', 'this', 'is', 'another', 'sentence'], 'indicator':[1,0,0,0,0,1,0,0,0]})

which gives me:

    words  indicator
0        hi          1
1      this          0
2        is          0
3         a          0
4  sentence          0
5      this          1
6        is          0
7   another          0
8  sentence          0

Now I want to merge the values of column 'words' into one string per group, where each group starts at a row with '1' in indicator and runs until the row before the next '1'. Something like this would be the ideal result:

                      words  indicator  counter
0     hi this is a sentence          1        5
1  this is another sentence          1        4

It's not that easy to explain, which is why I rely on this example. I tried groupby and split, but couldn't get to a solution. My last resort would be a loop with df.iterrows(), but I want to avoid that for now since the actual df is quite large.

Thanks in advance for any help!

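You can take the cumulative sum of your indicator column: since each 1 marks the start of a sentence, the running total labels every row with the number of the sentence it belongs to. Then groupby that value to join the words of each sentence on a space and count the number of words in each sentence.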

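# Running total of the indicator: 1,1,1,1,1,2,2,2,2 (one label per sentence).
# Note that this overwrites the original 0/1 indicator column.
df["indicator"] = df["indicator"].cumsum()
df = df.groupby(
    "indicator", as_index=False
).agg(
    words=("words", " ".join),        # join each group's words with a space
    counter=("indicator", "size")     # number of rows (words) in each group
)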
#    indicator                     words  counter
# 0          1     hi this is a sentence        5
# 1          2  this is another sentence        4
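
If you would rather keep the original df untouched and get exactly the columns from the question (indicator staying 1 in every row), here is a minimal sketch of the same cumsum idea. The names sentence_id and result are just ones I picked for illustration; they are not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "words": ["hi", "this", "is", "a", "sentence", "this", "is", "another", "sentence"],
    "indicator": [1, 0, 0, 0, 0, 1, 0, 0, 0],
})

# Group labels kept in a separate Series, so df itself is not modified.
# "sentence_id" is an illustrative name, not something from the question.
sentence_id = df["indicator"].cumsum().rename("sentence_id")

result = (
    df.groupby(sentence_id)
    .agg(
        words=("words", " ".join),         # join each group's words with a space
        indicator=("indicator", "first"),  # value on the row that starts the group
        counter=("words", "size"),         # number of rows (words) in the group
    )
    .reset_index(drop=True)                # drop the sentence_id labels from the index
)
print(result)
```

One thing to be aware of: if the column does not start with a 1, the rows before the first 1 have a cumulative sum of 0, so they end up together in a group of their own.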
