How to append new columns to a pandas groupby object from a list of values
I want to write a script that takes the values from a column, splits each one into strings at the commas, and creates a new column for each of the resulting strings (filled with NaN for now). Because the df is grouped by Column1, I want to do this for every group.
My input data frame looks like this:
df1:
Column1 Column2
0 L17 a,b,c,d,e
1 L7 a,b,c
2 L6 a,b,f
3 L6 h,d,e
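For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: it assumes both columns hold plain strings and that the index is the default 0..3 shown in the printout.

```python
import pandas as pd

# Sample data copied from the printout above; both columns are plain strings.
df1 = pd.DataFrame({
    "Column1": ["L17", "L7", "L6", "L6"],
    "Column2": ["a,b,c,d,e", "a,b,c", "a,b,f", "h,d,e"],
})

print(df1)
```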
What I finally want to have is:
Column1 Column2 a b c d e f h
0 L17 a,b,c,d,e nan nan nan nan nan nan nan
1 L7 a,b,c nan nan nan nan nan nan nan
2 L6 a,b,f nan nan nan nan nan nan nan
My code currently looks like this:
def NewCols(x):
    for item, frame in group['Column2'].iteritems():
        Genes = frame.split(',')
        for value in Genes:
            string = value
            x[string] = np.nan
            return x

df1.groupby('Column1').apply(NewCols)
My thought behind this was that the code loops through Column2 of every grouped object, splitting the value contained in frame at each comma and creating a list for that group. So far the code works fine. Then I added
for value in Genes:
    string = value
    x[string] = np.nan
    return x
with the intention of adding a new column for every value contained in the list Genes. However, my output looks like this:
Column1 Column2 d
0 L17 a,b,c,d,e nan
1 L7 a,b,c nan
2 L6 a,b,f nan
3 L6 h,d,e nan
and I am pretty much stumped. Can someone explain why only one column gets appended (which is not even named after the first value in the first list of the first group) and suggest how I could improve my code?
I think you just return too early in your function, before the two loops have finished. If you move the return back two indentation levels, like this:
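import numpy as np

def NewCols(x):
    # x is the sub-frame of one group; .items() replaces .iteritems(),
    # which was removed in pandas 2.0
    for item, frame in x['Column2'].items():
        Genes = frame.split(',')
        for value in Genes:
            x[value] = np.nan
    # return only after both loops have finished
    return x

df1.groupby('Column1').apply(NewCols)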
It should work fine!
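To see the effect of the return placement in isolation, here is a minimal sketch on the question's sample data. The names new_cols_early and new_cols_fixed are made up for illustration, and the groups are iterated explicitly instead of going through apply, so the sketch does not depend on how a particular pandas version passes the grouping column to the function.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Column1": ["L17", "L7", "L6", "L6"],
    "Column2": ["a,b,c,d,e", "a,b,c", "a,b,f", "h,d,e"],
})

def new_cols_early(x):
    """Hypothetical variant with the return inside the inner loop."""
    x = x.copy()
    for _, frame in x["Column2"].items():
        for value in frame.split(","):
            x[value] = np.nan
            return x  # exits after the very first value of the first row
    return x

def new_cols_fixed(x):
    """Hypothetical variant with the return after both loops."""
    x = x.copy()
    for _, frame in x["Column2"].items():
        for value in frame.split(","):
            x[value] = np.nan
    return x

# Process each group separately, then stack the pieces back together.
groups = [g for _, g in df1.groupby("Column1")]
early = pd.concat([new_cols_early(g) for g in groups]).sort_index()
fixed = pd.concat([new_cols_fixed(g) for g in groups]).sort_index()

print(early.columns.tolist())
print(fixed.columns.tolist())
```

With the early return, each group only ever gains one column; with the return moved out of the loops, every value seen in any group ends up as a column, and concat fills the gaps between groups with NaN.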
cols = sorted(list(set(df1['Column2'].apply(lambda x: x.split(',')).sum())))
df = df1.groupby('Column1').agg(lambda x: ','.join(x)).reset_index()
pd.concat([df,pd.DataFrame({c:np.nan for c in cols}, index=df.index)], axis=1)
Column1 Column2 a b c d e f h
0 L17 a,b,c,d,e NaN NaN NaN NaN NaN NaN NaN
1 L6 a,b,f,h,d,e NaN NaN NaN NaN NaN NaN NaN
2 L7 a,b,c NaN NaN NaN NaN NaN NaN NaN
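The same result can probably be reached without summing lists, which gets slow on long columns. The sketch below is an alternative, not the answer's method: it assumes the values in Column2 are comma-separated strings with no surrounding spaces, collects the distinct values with str.split and explode, and lets reindex create the missing columns filled with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Column1": ["L17", "L7", "L6", "L6"],
    "Column2": ["a,b,c,d,e", "a,b,c", "a,b,f", "h,d,e"],
})

# Distinct values across the whole column, in sorted order.
tokens = sorted(df1["Column2"].str.split(",").explode().unique())

# One row per group, with the Column2 strings of that group joined by commas.
grouped = df1.groupby("Column1")["Column2"].agg(",".join).reset_index()

# reindex adds every column that does not exist yet, filled with NaN.
result = grouped.reindex(columns=[*grouped.columns, *tokens])

print(result)
```

If one row per original record is wanted instead of one row per group, the reindex step can be applied to df1 directly.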