I have a df like this:
df = pd.DataFrame({'words':['hi', 'this', 'is', 'a', 'sentence', 'this', 'is', 'another', 'sentence'], 'indicator':[1,0,0,0,0,1,0,0,0]})
which gives me:
      words  indicator
0        hi          1
1      this          0
2        is          0
3         a          0
4  sentence          0
5      this          1
6        is          0
7   another          0
8  sentence          0
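Now I want to merge the values of column 'words' into one string per sentence, starting at each row where 'indicator' is 1 and continuing up to (but not including) the next row where it is 1. I'd also like a count of the merged words. Something like this would be the ideal result: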
                      words  indicator  counter
0     hi this is a sentence          1        5
1  this is another sentence          1        4
It's not that easy to explain, which is why I'm relying on this example. I tried groupby and split but couldn't get to a solution. My last resort would be some kind of df.iterrows() loop, but I'd rather avoid that for now since the actual df is quite large.
Thanks in advance for any help!
You can take the cumulative sum of your indicator column, which gives every row of the same sentence the same number, and then group by that number to join the words with spaces and count the words in each sentence.
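A minimal sketch of just the intermediate step on the sample data from the question: the running total of the indicator stays constant within a sentence and goes up by one at each 1, so it works as a group key. The name `labels` is only for illustration, and the original 'indicator' column is left untouched here.

```python
import pandas as pd

df = pd.DataFrame({
    'words': ['hi', 'this', 'is', 'a', 'sentence', 'this', 'is', 'another', 'sentence'],
    'indicator': [1, 0, 0, 0, 0, 1, 0, 0, 0],
})

# Running count of the 1s seen so far: one label per sentence.
labels = df['indicator'].cumsum()
print(labels.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 2]
```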
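import pandas as pd

# Turn the indicator into a sentence ID: it goes up by one at every 1.
df["indicator"] = df["indicator"].cumsum()

# One row per sentence: join its words with spaces and count them.
df = df.groupby("indicator", as_index=False).agg(
    words=("words", " ".join),
    counter=("words", "size"),
)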
#    indicator                     words  counter
# 0          1     hi this is a sentence        5
# 1          2  this is another sentence        4
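If you want the exact layout from the question, with 'indicator' still showing 1 and no group-number column, one possible variant (a sketch, not the only way) is to group by the cumulative sum as a separate Series instead of overwriting the column, take the first indicator of each group, and drop the group index afterwards. The names `sentence_id` and `result` are illustrative. Note that any rows before the first 1 would get label 0 and end up merged into their own leading row.

```python
import pandas as pd

df = pd.DataFrame({
    'words': ['hi', 'this', 'is', 'a', 'sentence', 'this', 'is', 'another', 'sentence'],
    'indicator': [1, 0, 0, 0, 0, 1, 0, 0, 0],
})

# Sentence labels computed on the fly; df['indicator'] itself is not modified.
sentence_id = df['indicator'].cumsum().rename('sentence_id')

result = (
    df.groupby(sentence_id)
    .agg(
        words=('words', ' '.join),         # join the words of each sentence
        indicator=('indicator', 'first'),  # the 1 that opened the sentence
        counter=('words', 'size'),         # number of words in the sentence
    )
    .reset_index(drop=True)                # drop the sentence label index
)
print(result)
```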