Apply custom function to pandas groupby object
import pandas as pd

df1 = pd.DataFrame({'Chromosome': ['1A','1A','1A','1A','1A'],
                    'Marker': ['M1','M2','M3','M4','M5'],
                    'Position': [0,1.2,3.5,6,7.3]})
df2 = pd.DataFrame({'Chromosome': ['1A','1A','1A','1A','1A','1B','1B','1B'],
                    'Marker': ['M1','M2','M3','M4','M5','mk1','mk2','mk3'],
                    'Position': [0,1.2,3.5,6,7.3,0,2.3,3.2]})
#Expected result for df1
#'1A 5 M1 1.2 M2 2.3 M3 2.5 M4 1.3 M5'
#Expected result for df2
#'1A 5 M1 1.2 M2 2.3 M3 2.5 M4 1.3 M5'
#'1B 3 mk1 2.3 mk2 0.9 mk3'
#My function for computing intermarker distance
def position_interval(df):
    df.loc[:,'diffPos'] = round(df['Position'].diff(),1).shift(-1)
    a = []
    i = 0
    while i < df.shape[0]:#omit the last index
        info = df['Marker'][i]+' '+str(round(df['diffPos'][i],1))
        #print(info)
        a.append(info)
        i +=1
    #print(a)
    a.insert(0,str(len(df['Marker'])))
    a.insert(0,df['Chromosome'][0])
    new_info = ' '.join(a).replace(' nan','')#removing the last ' nan'
    #print(new_info)
    return new_info
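The first statement of the function pairs each marker with the distance to the next marker. A minimal sketch of that step in isolation, using the df1 positions (the variable names `positions`, `gap_to_next` and `gaps` are only illustrative):

```python
import pandas as pd

# Positions of the five markers on chromosome 1A (same values as df1).
positions = pd.Series([0, 1.2, 3.5, 6, 7.3])

# diff() gives the distance from the previous marker (NaN for the first row);
# shift(-1) moves each distance up one row, so every row holds the distance
# to the NEXT marker and the last row becomes NaN.
gap_to_next = positions.diff().round(1).shift(-1)

gaps = gap_to_next.tolist()
print(gaps)
```

The trailing NaN is what the function later strips with `.replace(' nan','')`.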
Applying the function to df1 works perfectly:
position_interval(df1)
But I'm not sure how to apply it to each group of a groupby object:
position_interval(df2)
As the function needs the 'Chromosome' column, pass the as_index=False
argument to groupby:
df2.groupby('Chromosome', as_index=False).apply(position_interval)
This raises an exception, because the original function looks up label 0 and no row in the "1B" group has that label (its rows are labelled 5, 6 and 7).
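To illustrate why the lookup fails, here is a small sketch that pulls the "1B" group out of df2 and compares a label-based lookup with a position-based one. The names `group_1b`, `index_labels`, `label_lookup_failed` and `first_marker` are introduced here for illustration only:

```python
import pandas as pd

df2 = pd.DataFrame({
    'Chromosome': ['1A', '1A', '1A', '1A', '1A', '1B', '1B', '1B'],
    'Marker': ['M1', 'M2', 'M3', 'M4', 'M5', 'mk1', 'mk2', 'mk3'],
    'Position': [0, 1.2, 3.5, 6, 7.3, 0, 2.3, 3.2],
})

# Each group keeps the row labels it had in the original frame.
group_1b = df2.groupby('Chromosome').get_group('1B')
index_labels = list(group_1b.index)

# Label-based lookup: asks for the row LABELLED 0, which this group lacks.
try:
    group_1b['Marker'][0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup: asks for the FIRST row, whatever its label is.
first_marker = group_1b['Marker'].iloc[0]

print(index_labels, label_lookup_failed, first_marker)
```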
Replacing the label-based Series indexing in the function with iloc resolves this problem:
def position_interval(df):
    df.loc[:,'diffPos'] = round(df['Position'].diff(),1).shift(-1)
    a = []
    i = 0
    while i < df.shape[0]:#omit the last index
        # use the i-th row by position for both the marker and its distance
        info = df['Marker'].iloc[i]+' '+str(round(df['diffPos'].iloc[i],1))
        #print(info)
        a.append(info)
        i +=1
    #print(a)
    a.insert(0,str(len(df['Marker'])))
    a.insert(0,df['Chromosome'].iloc[0])
    new_info = ' '.join(a).replace(' nan','')#removing the last ' nan'
    #print(new_info)
    return new_info
The strings produced for the two groups are then:
1A 5 M1 1.2 M2 2.3 M3 2.5 M4 1.3 M5
1B 3 mk1 2.3 mk2 0.9 mk3
It is also possible to iterate over the groupby object:
for i, sub_df in df2.groupby('Chromosome',as_index=False):
    print(position_interval(sub_df))
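For comparison, the same per-group string can be built without the index-based while loop and without adding a column to the group. This is only a sketch under the assumption that each group has at least one row; `format_intervals` and `results` are hypothetical names, not part of the original answer:

```python
import pandas as pd

df2 = pd.DataFrame({
    'Chromosome': ['1A', '1A', '1A', '1A', '1A', '1B', '1B', '1B'],
    'Marker': ['M1', 'M2', 'M3', 'M4', 'M5', 'mk1', 'mk2', 'mk3'],
    'Position': [0, 1.2, 3.5, 6, 7.3, 0, 2.3, 3.2],
})

def format_intervals(group):
    """Hypothetical helper: '<chromosome> <n markers> <marker> <gap> <marker> ...'."""
    # Distances between consecutive markers; drop the leading NaN from diff().
    gaps = group['Position'].diff().round(1).iloc[1:]
    markers = group['Marker'].tolist()
    parts = [str(group['Chromosome'].iloc[0]), str(len(group))]
    # zip stops at the shorter sequence, so the last marker is added afterwards.
    for marker, gap in zip(markers, gaps):
        parts += [marker, str(gap)]
    parts.append(markers[-1])
    return ' '.join(parts)

# Iterate over the groups and collect one string per chromosome.
results = {chrom: format_intervals(sub_df)
           for chrom, sub_df in df2.groupby('Chromosome')}

for line in results.values():
    print(line)
```

Because the helper only reads from the group, it does not modify the sub-frames handed out by groupby.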