简体   繁体   English

如何合并/组合熊猫中的列?

[英]How to merge/combine columns in pandas?

I have a (example-) dataframe with 4 columns:我有一个(示例-)数据框,有 4 列:

data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],  
    'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
    'D': [np.nan, np.nan, np.nan, np.nan, 62, 70]}
df = pd.DataFrame(data, columns = ['A', 'B', 'C', 'D'])

    A   B       C       D
0   a   42.0    NaN     NaN
1   b   52.0    NaN     NaN
2   c   NaN     31.0    NaN
3   d   NaN     2.0     NaN
4   e   NaN     NaN     62.0
5   f   NaN     NaN     70.0

I would now like to merge/combine columns B, C, and D to a new column E like in this example:我现在想将 B、C 和 D 列合并/组合到一个新的 E 列,如本例所示:

data2 = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'E': [42, 52, 31, 2, 62, 70]}
df2 = pd.DataFrame(data2, columns = ['A', 'E'])

    A   E
0   a   42
1   b   52
2   c   31
3   d   2
4   e   62
5   f   70

I found a quite similar question here but this adds the merged colums B, C, and D at the end of column A:我在这里发现了一个非常相似的问题但这在 A 列的末尾添加了合并的列 B、C 和 D:

0      a
1      b
2      c
3      d
4      e
5      f
6     42
7     52
8     31
9      2
10    62
11    70
dtype: object

Thanks for help.感谢您的帮助。

Option 1选项 1
Using assign and drop使用assigndrop

In [644]: cols = ['B', 'C', 'D']

In [645]: df.assign(E=df[cols].sum(1)).drop(cols, 1)
Out[645]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0

Option 2选项 2
Using assignment and drop使用赋值和drop

In [648]: df['E'] = df[cols].sum(1)

In [649]: df = df.drop(cols, 1)

In [650]: df
Out[650]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0

Option 3 Lately, I like the 3rd option.选项 3最近,我喜欢第三个选项。
Using groupby使用groupby

In [660]: df.groupby(np.where(df.columns == 'A', 'A', 'E'), axis=1).first() #or sum max min
Out[660]:
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d   2.0
4  e  62.0
5  f  70.0

In [661]: df.columns == 'A'
Out[661]: array([ True, False, False, False], dtype=bool)

In [662]: np.where(df.columns == 'A', 'A', 'E')
Out[662]:
array(['A', 'E', 'E', 'E'],
      dtype='|S1')

The question as written asks for merge/combine as opposed to sum, so posting this to help folks who find this answer looking for help on coalescing with combine_first, which can be a bit tricky.所写的问题要求合并/合并,而不是求和,因此发布此问题是为了帮助找到此答案的人寻求与 combine_first 合并的帮助,这可能有点棘手。

df2 = pd.concat([df["A"], 
             df["B"].combine_first(df["C"]).combine_first(df["D"])], 
            axis=1)
df2.rename(columns={"B":"E"}, inplace=True)
   A     E
0  a  42.0
1  b  52.0
2  c  31.0
3  d  2.0 
4  e  62.0
5  f  70.0

What's so tricky about that?这有什么好纠结的? in this case there's no problem - but let's say you were pulling the B, C and D values from different dataframes, in which the a,b,c,d,e,f labels were present, but not necessarily in the same order.在这种情况下没有问题 - 但假设您从不同的数据帧中提取 B、C 和 D 值,其中存在 a、b、c、d、e、f 标签,但顺序不一定相同。 combine_first() aligns on the index, so you'd need to tack a set_index() on to each of your df references. combine_first() 在索引上对齐,因此您需要在每个 df 引用上添加 set_index() 。

df2 = pd.concat([df.set_index("A", drop=False)["A"], 
             df.set_index("A")["B"]\
             .combine_first(df.set_index("A")["C"])\
             .combine_first(df.set_index("A")["D"]).astype(int)], 
            axis=1).reset_index(drop=True)
df2.rename(columns={"B":"E"}, inplace=True)

   A   E
0  a  42
1  b  52
2  c  31
3  d  2 
4  e  62
5  f  70

Use difference for columns names without A and then get sum or max :对没有A列名使用difference ,然后得到summax

cols = df.columns.difference(['A'])
df['E'] = df[cols].sum(axis=1).astype(int)
# df['E'] = df[cols].max(axis=1).astype(int)
df = df.drop(cols, axis=1)
print (df)
   A   E
0  a  42
1  b  52
2  c  31
3  d   2
4  e  62
5  f  70

If multiple values per rows:如果每行有多个值:

data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],  
    'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
    'D': [10, np.nan, np.nan, np.nan, 62, 70]}
df = pd.DataFrame(data, columns = ['A', 'B', 'C', 'D'])

print (df)
   A     B     C     D
0  a  42.0   NaN  10.0
1  b  52.0   NaN   NaN
2  c   NaN  31.0   NaN
3  d   NaN   2.0   NaN
4  e   NaN   NaN  62.0
5  f   NaN   NaN  70.0

cols = df.columns.difference(['A'])
df['E'] = df[cols].apply(lambda x: ', '.join(x.dropna().astype(int).astype(str)), 1)
df = df.drop(cols, axis=1)
print (df)
   A       E
0  a  42, 10
1  b      52
2  c      31
3  d       2
4  e      62
5  f      70

You can also use ffill with iloc :您还可以将ffilliloc ffill使用:

df['E'] = df.iloc[:, 1:].ffill(1).iloc[:, -1].astype(int)
df = df.iloc[:, [0, -1]]

print(df)

   A   E
0  a  42
1  b  52
2  c  31
3  d   2
4  e  62
5  f  70

Zero's third option using groupby requires a numpy import and only handles one column outside the set of columns to collapse, while jpp's answer using ffill requires you know how columns are ordered. Zero 使用groupby的第三个选项需要一个 numpy 导入,并且只处理要折叠的列组之外的一列,而 jpp 使用ffill的答案要求您知道列是如何排序的。 Here's a solution that has no extra dependencies, takes an arbitrary input dataframe, and only collapses columns if all rows in those columns are single-valued:这是一个没有额外依赖项的解决方案,采用任意输入数据框,并且仅在这些列中的所有行都是单值时才折叠列:

import pandas as pd

data = [{'A':'a', 'B':42, 'messy':'z'},
    {'A':'b', 'B':52, 'messy':'y'},
    {'A':'c', 'C':31},
    {'A':'d', 'C':2, 'messy':'w'},
    {'A':'e', 'D':62, 'messy':'v'},
    {'A':'f', 'D':70, 'messy':['z']}]
df = pd.DataFrame(data)

cols = ['B', 'C', 'D']
new_col = 'E'
if df[cols].apply(lambda x: len(x.notna().value_counts()) == 1, axis=1).all():
    df[new_col] = df[cols].ffill(axis=1).dropna(axis=1)

df2 = df.drop(columns=cols)

print(df, '\n\n', df2)

Output:输出:

   A     B messy     C     D
0  a  42.0     z   NaN   NaN
1  b  52.0     y   NaN   NaN
2  c   NaN   NaN  31.0   NaN
3  d   NaN     w   2.0   NaN
4  e   NaN     v   NaN  62.0
5  f   NaN   [z]   NaN  70.0

   A messy     E
0  a     z  42.0
1  b     y  52.0
2  c   NaN  31.0
3  d     w   2.0
4  e     v  62.0
5  f   [z]  70.0

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM