
How to combine rows with different values in columns in pandas dataframe

I am doing the following:

import pandas as pd

something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9], [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something)
print(test)
test = test.drop_duplicates()
test.columns = ['id', 'state', 'level']
test = test.sort_values(by=['id'], ascending=True)
test_unique = test["id"].unique()
print(test[test["id"] == 1])

The output is the following: 
   0  1   2
0  1  p   2
1  3  t   5
2  6  u  10
3  1  p   2
4  4  l   9
5  1  t   2
6  3  t   5
7  6  c  10
8  1  p   2
9  4  l   9

# this is the result after dropping duplicates and selecting id == 1
   id state  level
0   1     p      2
5   1     t      2

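What I want to do is combine these two rows that share the same id and produce the output 1 p-t 2. The column names should stay the same: id, state and level. How can I achieve this?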

You can use groupby.agg:

print(df)

    id  state  level
0   1   p      2
5   1   t      2

df.groupby("id", as_index=False).agg(
                      {'state': '-'.join, "id": "first", "level": "first"})

    state   id  level
0   p-t     1   2
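As a rough sketch, the same idea can be applied to the question's full data. It assumes that level is constant within each id, so taking the first value loses nothing. The grouping key is left out of the aggregation mapping here because as_index=False already returns id as a column, which also gives the column order id, state, level.

```python
import pandas as pd

something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9],
             [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something, columns=["id", "state", "level"]).drop_duplicates()

# Groups come back sorted by id; rows inside a group keep their original order.
# Assumption: "level" is the same for every row of a given id.
combined = test.groupby("id", as_index=False).agg({"state": "-".join, "level": "first"})
print(combined)
```

If level can differ within an id, grouping by both id and level is the safer choice.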

You can group and then aggregate:

import pandas as pd

something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9], [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something)
print(test)
test = test.drop_duplicates()
test.columns = ['id', 'state', 'level']
test = test.sort_values(by=['id'], ascending=True)
test_unique = test["id"].unique()


df_aslist = test.groupby(['id', 'level']).aggregate(lambda x: list(x)).reset_index()

df_aslist['state'] = df_aslist['state'].apply(lambda x: '-'.join(x))
print(df_aslist)

which returns

   id  level state
0   1      2   p-t
1   3      5     t
2   4      9     l
3   6     10   u-c

Or for a specific id only:

print(df_aslist[df_aslist['id'] == 1])

which prints

   id  level state
0   1      2   p-t
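The intermediate list can probably be skipped by selecting the state column and joining inside the aggregation. The sketch below is one way to do that. The helper name join_unique is hypothetical, not part of pandas. It also drops repeated states within a group, so the earlier drop_duplicates call is not needed in this variant.

```python
import pandas as pd

something = [[1, "p", 2], [3, "t", 5], [6, "u", 10], [1, "p", 2], [4, "l", 9],
             [1, "t", 2], [3, "t", 5], [6, "c", 10], [1, "p", 2], [4, "l", 9]]
test = pd.DataFrame(something, columns=["id", "state", "level"])

def join_unique(states):
    # Hypothetical helper: dict.fromkeys keeps first-seen order while dropping repeats.
    return "-".join(dict.fromkeys(states))

# Group on both keys, aggregate only the "state" column, then restore the keys as columns.
df_joined = test.groupby(["id", "level"])["state"].agg(join_unique).reset_index()
print(df_joined)
print(df_joined[df_joined["id"] == 1])
```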

Pandas beginner here. I would transpose, merge the values into a new column, and then transpose back:

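import pandas as pd

def merge_duplicates(x):
    # x holds the two values of one field, one from each row
    a, b = x

    if a == b:
        return a
    else:
        a, b = str(a), str(b)
        return '-'.join((a, b))

# the two rows with id == 1 from the question
df = pd.DataFrame({"id": [1, 1], "state": ["p", "t"], "level": [2, 2]})

df = df.T  # fields become rows, the two records become columns 0 and 1

df["combined"] = [merge_duplicates(row) for row in df[[0, 1]].values]

df = df.T  # transpose back; the merged record is the row labelled "combined"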
