Pandas groupby for an Excel spreadsheet

Question

I have a spreadsheet like the one below (about 1,800 rows). It is generated by a Python script that pulls information from an Access database:

ID  Chemical            Association  Term 
1   1,1-Dichloroethene  exactMatch   1,1-Dichloroethylene
1   1,1-Dichloroethene  exactMatch   Vinylidene Chloride
2   1,2 Epoxyethane     exactMatch   Ethylene oxide  
2   1,2 Epoxyethane     exactMatch   Ethylene oxide (1,2 Epoxyethane)

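I think pandas could be used to change the layout of this spreadsheet. I want to create a table like this, with one row per chemical and each additional term in its own column: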

ID  Chemical            Association  Term                   (new column)
1   1,1-Dichloroethene  exactMatch   1,1-Dichloroethylene   Vinylidene Chloride   
2   1,2 Epoxyethane     exactMatch   Ethylene oxide (1...   Ethylene oxide

So far I have written the following with pandas, but I am not sure what to do next:

import pandas as pd

# read_excel already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
df = pd.read_excel('Chemicals_exactMatch.xlsx', sheet_name='Sheet1')
grp = df.groupby(['ID','Chemical','Association'])

I think the following statements need to be worked in somewhere, but I am not sure how:

df.apply(lambda grouped: grouped['Term'].str.cat(sep="|"))
df.str.split(pat="|")

Answer 1

Try this:

df.set_index(['ID',
              'Chemical',
              'Association',
              df.groupby(['ID','Chemical','Association']).cumcount()])['Term']\
  .unstack().reset_index()

Output:

   ID            Chemical Association                     0                                 1
0   1  1,1-Dichloroethene  exactMatch  1,1-Dichloroethylene               Vinylidene Chloride
1   2     1,2 Epoxyethane  exactMatch        Ethylene oxide  Ethylene oxide (1,2 Epoxyethane)
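
To see why this works, here is a minimal sketch that rebuilds the four sample rows from the question in memory (instead of reading the Excel file) and runs the same steps one at a time. The names `keys`, `position`, and `wide`, and the `Term_1`, `Term_2` column labels, are my own choices for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question; a real run would use pd.read_excel instead.
df = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'Chemical': ['1,1-Dichloroethene', '1,1-Dichloroethene',
                 '1,2 Epoxyethane', '1,2 Epoxyethane'],
    'Association': ['exactMatch'] * 4,
    'Term': ['1,1-Dichloroethylene', 'Vinylidene Chloride',
             'Ethylene oxide', 'Ethylene oxide (1,2 Epoxyethane)'],
})

keys = ['ID', 'Chemical', 'Association']

# cumcount() numbers the rows inside each group, starting from 0.
position = df.groupby(keys).cumcount()

# Use that position as an extra index level, then unstack it into columns.
wide = (df.set_index(keys + [position])['Term']
          .unstack()
          .reset_index())

# Optional: rename the numbered columns (0, 1, ...) to Term_1, Term_2, ...
wide.columns = [c if isinstance(c, str) else f'Term_{c + 1}' for c in wide.columns]
print(wide)
```

Groups with fewer terms than the largest group get NaN in the unused columns, so the number of term columns is set by the chemical with the most synonyms.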

Answer 2

I managed to write the following, which works:

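import pandas as pd

# `spreadsheet` is the path to the input workbook
df = pd.read_excel(spreadsheet, sheet_name='Sheet1')
(df.groupby(['ID','Chemical','Association'])
   .apply(lambda grouped: grouped['Term'].str.cat(sep="!"))  # join each group's terms with "!"
   .str.split(pat="!", expand=True)                          # split back into one column per term
   .sort_values('Chemical')
   .to_excel('Chemicals_exactMatch.xlsx'))                   # to_excel returns None, so nothing is assigned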
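
One caveat with joining and re-splitting on "!": it would break if a term ever contained that character. A possible variant, sketched here on the sample rows from the question, collects each group's terms into a list instead, so no delimiter is involved. The names `keys`, `terms`, and `wide_terms` are illustrative, and this is an alternative to the approach above, not a description of it.

```python
import pandas as pd

# Sample rows copied from the question; a real run would use pd.read_excel instead.
df = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'Chemical': ['1,1-Dichloroethene', '1,1-Dichloroethene',
                 '1,2 Epoxyethane', '1,2 Epoxyethane'],
    'Association': ['exactMatch'] * 4,
    'Term': ['1,1-Dichloroethylene', 'Vinylidene Chloride',
             'Ethylene oxide', 'Ethylene oxide (1,2 Epoxyethane)'],
})

keys = ['ID', 'Chemical', 'Association']

# Collect each group's terms into a Python list instead of a delimited string.
terms = df.groupby(keys)['Term'].agg(list)

# Expand the lists into columns, keeping the group keys as the index,
# then move the keys back into ordinary columns.
wide_terms = pd.DataFrame(terms.tolist(), index=terms.index).reset_index()
print(wide_terms)
```

From there, `wide_terms.to_excel(...)` with `index=False` should write the result without the extra row-number column.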

Solution 1
1 2019-03-20 16:37:35

Solution 2
1 2019-03-26 11:16:05
