How to groupby certain column in a dataframe in pandas?
I have the following dataframe containing different genes, drugs, IDs and citations. I essentially need rows with the same gene and the same drug to be merged, while keeping both citations for that drug when more than one occurs. For example (pharmacogenomic data):
Gene Drug ID Cite
1 MAD1L1 Lithium[17] 34718328 [17]
2 OAS1 Lithium[17] 34718328 [17]
3 OAS1 Lithium[7] 27401222 [7]
MAD1L1 has lithium and citation 17, but OAS1 has lithium and citations 17 and 7. I would like to combine the table into something similar to the one below:
Gene Drug ID Cite
1 MAD1L1 Lithium[17] 34718328 [17]
2 OAS1 Lithium[17][7] 34718328 [17]
OAS1 has lithium, but both citations are next to each other, and MAD1L1 is unchanged as it does not share the same citations for lithium as OAS1.
Here is one way to do it:
# concatenate all citations of each gene into a temporary column
df['cite2'] = df.groupby('Gene')['Cite'].transform('sum')
# group by gene, and take the first row for each gene
df2 = df.groupby('Gene').first()
# split the citation off the drug name and append cite2 (created above)
df2['Drug'] = df2['Drug'].str.split('[', expand=True)[0] + df2['cite2']
# drop the temporary cite2 column
df2.drop(columns='cite2', inplace=True)
df2.reset_index()
Gene Drug ID Cite
0 MAD1L1 Lithium[17] 34718328 [17]
1 OAS1 Lithium[17][7] 34718328 [17]
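Note that the code above groups on 'Gene' alone, so it implicitly assumes each gene is paired with a single drug. A minimal sketch of the same idea that groups on the gene and the drug name together might look like the following. The sample data is rebuilt by hand from the question (the dtypes are an assumption), and `drug_name` is a hypothetical helper column introduced only for illustration:

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed dtypes)
df = pd.DataFrame({
    "Gene": ["MAD1L1", "OAS1", "OAS1"],
    "Drug": ["Lithium[17]", "Lithium[17]", "Lithium[7]"],
    "ID": [34718328, 34718328, 27401222],
    "Cite": ["[17]", "[17]", "[7]"],
})

# Hypothetical helper column: the drug name without its citation
df["drug_name"] = df["Drug"].str.split("[", n=1).str[0]

# Concatenate the citations within each (gene, drug name) pair
cites = df.groupby(["Gene", "drug_name"])["Cite"].transform(lambda s: "".join(s))
df["Drug"] = df["drug_name"] + cites

# Keep the first row of each pair and discard the helper column
out = (
    df.drop_duplicates(["Gene", "drug_name"])
      .drop(columns="drug_name")
      .reset_index(drop=True)
)
print(out)
```

With this variant, a gene that appears with two different drugs would keep one row per drug instead of being collapsed into a single row.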
Remove the citation from "Drug", then use groupby.agg, aggregating each column either with 'first' or by joining the strings. Then add back the citations:
out = (df
   .assign(Drug=df['Drug'].str.extract(r'(^[^\[\]]+)', expand=False))
   .groupby(['Gene', 'Drug'], as_index=False)
   .agg({'ID': 'first', 'Cite': ''.join})
   .assign(Drug=lambda d: d['Drug']+d['Cite'])
)
Output:
Gene Drug ID Cite
0 MAD1L1 Lithium[17] 34718328 [17]
1 OAS1 Lithium[17][7] 34718328 [17][7]
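In this output the "Cite" column holds the joined citations, whereas the table requested in the question keeps only the first citation there. A possible variation, sketched here with named aggregation, joins the citations into a temporary column and keeps the first citation in "Cite". The sample data is rebuilt by hand from the question, and `all_cites` is a hypothetical column name used only for illustration:

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed dtypes)
df = pd.DataFrame({
    "Gene": ["MAD1L1", "OAS1", "OAS1"],
    "Drug": ["Lithium[17]", "Lithium[17]", "Lithium[7]"],
    "ID": [34718328, 34718328, 27401222],
    "Cite": ["[17]", "[17]", "[7]"],
})

out = (
    df.assign(Drug=df["Drug"].str.extract(r"^([^\[\]]+)", expand=False))
      .groupby(["Gene", "Drug"], as_index=False)
      .agg(
          ID=("ID", "first"),
          Cite=("Cite", "first"),        # keep only the first citation here
          all_cites=("Cite", "".join),   # hypothetical temporary column
      )
      .assign(Drug=lambda d: d["Drug"] + d["all_cites"])
      .drop(columns="all_cites")
)
print(out)
```

If the same citation can appear more than once for a gene and drug pair, it would be repeated in the joined string, so deduplicating before the join may be worth considering.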