I have this DataFrame:
import pandas as pd
df = pd.DataFrame({"a":[None, None, "hello1","hello2", None,"hello4","hello5","hello6", None, "hello8", None,"hello10",None ] , "b": ["we", "are the world", "we", "love", "the", "world", "so", "much", "and", "dance", "every", "day", "yeah"]})
a b
0 None we
1 None are the world
2 hello1 we
3 hello2 love
4 None the
5 hello4 world
6 hello5 so
7 hello6 much
8 None and
9 hello8 dance
10 None every
11 hello10 day
12 None yeah
The desired output is:
a b new_text
0 Intro we we are the world
2 hello1 we we
3 hello2 love love the
5 hello4 world world
6 hello5 so so
7 hello6 much much and
9 hello8 dance dance every
11 hello10 day day yeah
I have a function that does it, but it uses a while loop over the DataFrame, which is probably not the best solution.
def connect_rows_on_condition(df, new_col_name, text, condition):
    if df[condition][0] == None:
        df[condition][0] = "Intro"
    df[new_col_name] = ""
    index = 1
    last_non_none = 0
    while index < len(df):
        if df[condition][index] != None:
            last_non_none = index
            df[new_col_name][last_non_none] = df[text][index]
        elif df[condition][index] == None :
            df[new_col_name][last_non_none] = df[text][last_non_none] + " " + df[text][index]
        index += 1
    output_df = df[df[condition].isna() == False]
    return output_df
The main logic is: if column "a" is None, append the text in "b" to the nearest previous row where "a" is not None. Is there a solution that is not based on loops?
First, create a Series which describes the groups:
grouping = df.a.notnull().cumsum()
Then, for column a we can use the first element and for column b we want to concatenate all elements:
df.groupby(grouping).agg({'a': 'first', 'b': ' '.join})
This gives:
a b
a
0 None we are the world
1 hello1 we
2 hello2 love the
3 hello4 world
4 hello5 so
5 hello6 much and
6 hello8 dance every
7 hello10 day yeah
You can replace None with "Intro" yourself as a special case if needed, since that text doesn't occur in the input.
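If the exact layout from the question is wanted (original index labels, a separate new_text column, and the "Intro" label), one possible sketch is to broadcast the joined text with transform and then keep only the first row of each group. This is a variation on the groupby idea above, not the only way to do it; the name result is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [None, None, "hello1", "hello2", None, "hello4", "hello5",
          "hello6", None, "hello8", None, "hello10", None],
    "b": ["we", "are the world", "we", "love", "the", "world", "so",
          "much", "and", "dance", "every", "day", "yeah"],
})

# Each non-null value in "a" starts a new group.
grouping = df["a"].notna().cumsum()

# Join the text of each group and broadcast it back to all rows of the group.
df["new_text"] = df.groupby(grouping)["b"].transform(" ".join)

# Keep only the first row of each group; this preserves the original index.
result = df[~grouping.duplicated()].copy()

# Special case: a leading group without a value in "a" is labelled "Intro".
result["a"] = result["a"].fillna("Intro")

print(result)
```

Because the first row of every group is kept, column b still holds that row's own text, while new_text holds the joined text of the whole group.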
You can also group on a marker word, in case you don't have a null value to group on. So as not to duplicate John's solution, I'll leave this here for those who may be interested in how to do this in the non-null situation:
In [356]: Truth = pd.to_numeric(df.a.str.contains('None') == False).cumsum()
...:
In [357]: df.groupby(Truth)['b'].agg(list)
Out[357]:
a
0 [we, are the world]
1 [we]
2 [love, the]
3 [world]
4 [so]
5 [much, and]
6 [dance, every]
7 [day, yeah]
Name: b, dtype: object
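For the case where the placeholder really is the literal string "None" rather than a missing value, a minimal sketch could compare against the marker directly. The small frame below is made-up illustration data, not the question's DataFrame, and the names starts and groups are hypothetical:

```python
import pandas as pd

# Hypothetical data: the placeholder is the string "None", not a null value.
df = pd.DataFrame({
    "a": ["None", "hello1", "None", "hello2"],
    "b": ["we", "are", "the", "world"],
})

# True on every row that starts a new group (anything but the marker word).
starts = df["a"].ne("None")

# The running count of group starts is the group id.
groups = starts.cumsum()

# Join the text per group; use agg(list) instead to collect lists.
joined = df.groupby(groups)["b"].agg(" ".join)

print(joined.tolist())
# ['we', 'are the', 'world']
```

Comparing with ne matches the whole cell, whereas str.contains would also match cells that merely contain the marker as a substring.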