I want to write a script that takes the string values from a column, splits them at the commas, and creates a new column for each of the resulting substrings (filled with NaN for now). Because the df is grouped by Column1, I want to do this for every group.
My input data frame looks like this:
df1:
      Column1    Column2
    0 L17        a,b,c,d,e
    1 L7         a,b,c
    2 L6         a,b,f
    3 L6         h,d,e
What I finally want to have is:
      Column1    Column2      a    b    c    d    e    f    h
    0 L17        a,b,c,d,e    nan  nan  nan  nan  nan  nan  nan
    1 L7         a,b,c        nan  nan  nan  nan  nan  nan  nan
    2 L6         a,b,f        nan  nan  nan  nan  nan  nan  nan
My code currently looks like this:
    def NewCols(x):
        for item, frame in group['Column2'].iteritems():
            Genes = frame.split(',')
            for value in Genes:
                string = value
                x[string] = np.nan
                return x

    df1.groupby('Column1').apply(NewCols)
My thought behind this was that the code loops through Column2 of every group, splits each value (frame) at the commas, and builds a list for that row. Up to that point the code works fine. Then I added
    for value in Genes:
        string = value
        x[string] = np.nan
        return x
with the intention of adding a new column for every value contained in the list Genes. However, my output looks like this:
      Column1    Column2      d
    0 L17        a,b,c,d,e    nan
    1 L7         a,b,c        nan
    2 L6         a,b,f        nan
    3 L6         h,d,e        nan
and I am pretty much struck dumb. Can someone explain why only one column gets appended (which is not even named after the first value in the first list of the first group) and suggest how I could improve my code?
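I think you just return too early in your function, before the two loops have finished: the return sits inside the inner loop, so the function exits after adding a single column. If you move the return back by two indentation levels, like this: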
    def NewCols(x):
        # Iterate over the group's own Column2 values (x is the group's frame)
        for item, frame in x['Column2'].items():  # .iteritems() on pandas < 2.0
            Genes = frame.split(',')
            for value in Genes:
                x[value] = np.nan
        # Return only after both loops have finished
        return x

    df1.groupby('Column1').apply(NewCols)
It should work fine!
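For reference, here is a minimal, self-contained sketch of the corrected loop, run against the sample data from the question. The names new_cols and result are my own, and instead of groupby(...).apply it iterates over the groups and concatenates them, an assumption made only to sidestep version-dependent handling of the grouping column in apply. It is one way to do it, not the only one.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Column1": ["L17", "L7", "L6", "L6"],
    "Column2": ["a,b,c,d,e", "a,b,c", "a,b,f", "h,d,e"],
})

def new_cols(x):
    """Add one NaN column per comma-separated value found in x['Column2']."""
    x = x.copy()  # work on a copy so the original frame is not modified
    for _, cell in x["Column2"].items():
        for value in cell.split(","):
            x[value] = np.nan
    # Return only after both loops have finished
    return x

# Process each group separately, then stitch the pieces back together.
# Columns missing from a group are filled with NaN by concat.
result = pd.concat(
    [new_cols(group) for _, group in df1.groupby("Column1")]
).sort_index()

print(result)
```

Every group contributes its own set of columns, and the concatenation takes the union of them, so the final frame keeps all four original rows.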
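    # Collect every distinct value that appears in Column2
    cols = sorted(list(set(df1['Column2'].apply(lambda x: x.split(',')).sum())))
    # Merge the rows of each group into one comma-joined string per Column1
    df = df1.groupby('Column1').agg(lambda x: ','.join(x)).reset_index()
    # Append one NaN-filled column per distinct value
    pd.concat([df,pd.DataFrame({c:np.nan for c in cols}, index=df.index)], axis=1)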
      Column1    Column2        a    b    c    d    e    f    h
    0 L17        a,b,c,d,e      NaN  NaN  NaN  NaN  NaN  NaN  NaN
    1 L6         a,b,f,h,d,e    NaN  NaN  NaN  NaN  NaN  NaN  NaN
    2 L7         a,b,c          NaN  NaN  NaN  NaN  NaN  NaN  NaN
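If the original rows should stay as they are instead of being merged per group, a loop-free variant might look like the sketch below. It assumes the sample data from the question and uses Series.str.split, Series.explode and DataFrame.reindex; the variable names are my own.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Column1": ["L17", "L7", "L6", "L6"],
    "Column2": ["a,b,c,d,e", "a,b,c", "a,b,f", "h,d,e"],
})

# Split every cell at the commas, flatten to one value per row,
# and keep the distinct values in sorted order
cols = sorted(df1["Column2"].str.split(",").explode().unique())

# reindex adds the columns that do not exist yet and fills them with NaN
result = df1.reindex(columns=list(df1.columns) + cols)

print(result)
```

Since the new columns are the same for every row, no grouping is needed here; the grouping only matters if each group should be collapsed into a single row as in the code above.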