How to concatenate multiple column values into a single column in a Pandas dataframe

Question

This question is the same as this one posted earlier, except that I want to concatenate three columns instead of two:

Here is the code combining two columns:

import pandas as pd

df = pd.DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df['combined']=df.apply(lambda x:'%s_%s' % (x['foo'],x['bar']),axis=1)

df
    bar foo new combined
0   1   a   apple   a_1
1   2   b   banana  b_2
2   3   c   pear    c_3

I want to combine three columns with this command, but it is not working. Any idea?

df['combined']=df.apply(lambda x:'%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)

Answer 1

Another solution using DataFrame.apply(), with slightly less typing and more scalable when you want to join more columns:

cols = ['foo', 'bar', 'new']
df['combined'] = df[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)
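For reference, here is a self-contained sketch of this approach run on the question's sample data. The printed order assumes a pandas version that keeps dict insertion order for columns (foo, bar, new).

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3], 'new': ['apple', 'banana', 'pear']})

cols = ['foo', 'bar', 'new']
# axis=1 passes each row to the lambda; every value is cast to str before joining with '_'
df['combined'] = df[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)

print(df['combined'].tolist())  # ['a_1_apple', 'b_2_banana', 'c_3_pear']
```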

Answer 2

You can simply do:

In [17]: df['combined'] = df['bar'].astype(str) + '_' + df['foo'] + '_' + df['new']

In [18]: df
Out[18]: 
   bar foo     new    combined
0    1   a   apple   1_a_apple
1    2   b  banana  2_b_banana
2    3   c    pear    3_c_pear

Answer 3

If you have even more columns you want to combine, the Series method str.cat might be handy:

df["combined"] = df["foo"].str.cat(df[["bar", "new"]].astype(str), sep="_")

Basically, you select the first column (if it is not already of type str, you need to append .astype(str)), to which you append the other columns (separated by an optional separator character).
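A small sketch of how str.cat treats missing values, assuming the question's frame with one foo value replaced by None purely for illustration. With the default na_rep=None, a missing value in any input column makes that row's result missing; passing na_rep substitutes a string instead.

```python
import pandas as pd

# Assumed sample data: the question's frame with one missing value in 'foo'
df = pd.DataFrame({'foo': ['a', 'b', None], 'bar': [1, 2, 3], 'new': ['apple', 'banana', 'pear']})

others = df[['bar', 'new']].astype(str)  # str.cat expects string columns

# Default na_rep=None: the row containing a missing value yields a missing result
plain = df['foo'].str.cat(others, sep='_')

# na_rep replaces missing values with a placeholder string before joining
filled = df['foo'].str.cat(others, sep='_', na_rep='-')

print(plain.tolist())
print(filled.tolist())  # ['a_1_apple', 'b_2_banana', '-_3_pear']
```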

Answer 4

Just wanted to make a timing comparison of both solutions (for a 30K-row DF):

In [1]: df = pd.DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

In [2]: big = pd.concat([df] * 10**4, ignore_index=True)

In [3]: big.shape
Out[3]: (30000, 3)

In [4]: %timeit big.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
1 loop, best of 3: 881 ms per loop

In [5]: %timeit big['bar'].astype(str)+'_'+big['foo']+'_'+big['new']
10 loops, best of 3: 44.2 ms per loop

A few more options:

In [6]: %timeit big.iloc[:, :-1].astype(str).add('_').sum(axis=1).str.cat(big.new)
10 loops, best of 3: 72.2 ms per loop

In [11]: %timeit big.astype(str).add('_').sum(axis=1).str[:-1]
10 loops, best of 3: 82.3 ms per loop
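%timeit only works in IPython/Jupyter. A rough plain-Python equivalent of the first comparison, using the standard timeit module, might look like the sketch below. The helper names with_apply and with_vectorized_add are hypothetical, and the printed timings will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3], 'new': ['apple', 'banana', 'pear']})
big = pd.concat([df] * 10**4, ignore_index=True)  # 30,000 rows


def with_apply(frame):
    # Row-wise %-formatting: one Python-level call per row
    return frame.apply(lambda x: '%s_%s_%s' % (x['bar'], x['foo'], x['new']), axis=1)


def with_vectorized_add(frame):
    # Column-wise concatenation of whole Series
    return frame['bar'].astype(str) + '_' + frame['foo'] + '_' + frame['new']


# Both approaches should produce the same strings (compared as plain lists to ignore dtype)
assert with_apply(big).tolist() == with_vectorized_add(big).tolist()

for fn in (with_apply, with_vectorized_add):
    # Best of 3 repeats, one call per repeat
    best = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f'{fn.__name__}: {best * 1000:.1f} ms')
```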

Answer 5

The answer given by @allen is reasonably generic but can be slow for larger DataFrames:

functools.reduce does a lot better:

from functools import reduce

import pandas as pd

# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'


def reduce_join(df, columns):
    assert len(columns) > 1
    slist = [df[x].astype(str) for x in columns]
    return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])


def apply_join(df, columns):
    assert len(columns) > 1
    return df[columns].apply(lambda row:'_'.join(row.values.astype(str)), axis=1)

# ensure outputs are equal
df1 = reduce_join(df, list('1234'))
df2 = apply_join(df, list('1234'))
assert df1.equals(df2)

# profile
%timeit df1 = reduce_join(df, list('1234'))  # 733 ms
%timeit df2 = apply_join(df, list('1234'))   # 8.84 s

Answer 6

Possibly the fastest solution is to operate in plain Python:

from pandas import Series

Series(
    map(
        '_'.join,
        df.values.tolist()
        # when non-string columns are present:
        # df.values.astype(str).tolist()
    ),
    index=df.index
)
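A self-contained sketch of the map/join approach on the question's mixed-type frame. Because bar is numeric, the astype(str) variant is needed; the map result is wrapped in list() here so the Series constructor receives a concrete list.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3], 'new': ['apple', 'banana', 'pear']})

# astype(str) on the NumPy array turns every cell into a string, so '_'.join accepts each row list
combined = pd.Series(list(map('_'.join, df.values.astype(str).tolist())), index=df.index)

print(combined.tolist())  # ['a_1_apple', 'b_2_banana', 'c_3_pear']
```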

Comparison against @MaxU's answer (using the big DataFrame, which has both numeric and string columns):

%timeit big['bar'].astype(str) + '_' + big['foo'] + '_' + big['new']
# 29.4 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


%timeit Series(map('_'.join, big.values.astype(str).tolist()), index=big.index)
# 27.4 ms ± 2.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Comparison against @derchambers' answer (using their df DataFrame, where all columns are strings):

from functools import reduce

def reduce_join(df, columns):
    slist = [df[x] for x in columns]
    return reduce(lambda x, y: x + '_' + y, slist[1:], slist[0])

def list_map(df, columns):
    return Series(
        map(
            '_'.join,
            df[columns].values.tolist()
        ),
        index=df.index
    )

%timeit df1 = reduce_join(df, list('1234'))
# 602 ms ± 39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df2 = list_map(df, list('1234'))
# 351 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Answer 7

I think you're missing a %s:

df['combined']=df.apply(lambda x:'%s_%s_%s' % (x['bar'],x['foo'],x['new']),axis=1)
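The underlying problem is plain Python %-formatting, independent of pandas: the number of %s placeholders must match the number of values in the tuple. A minimal illustration using one row's values as a hypothetical tuple:

```python
row = (1, 'a', 'apple')

try:
    # Two placeholders but three values: the extra value has nowhere to go
    broken = '%s_%s' % row
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # not all arguments converted during string formatting

# One %s per value fixes it
fixed = '%s_%s_%s' % row
print(fixed)  # 1_a_apple
```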

Answer 8

@derchambers I found one more solution:

import pandas as pd

# make data
df = pd.DataFrame(index=range(1_000_000))
df['1'] = 'CO'
df['2'] = 'BOB'
df['3'] = '01'
df['4'] = 'BILL'

def eval_join(df, columns):

    sum_elements = [f"df['{col}']" for col in columns]
    to_eval = "+ '_' + ".join(sum_elements)

    return eval(to_eval)


#profile
%timeit df3 = eval_join(df, list('1234')) # 504 ms

Answer 9

First convert the columns to str. Then use the .T.agg('_'.join) function to concatenate them. More info can be found here.

# Initialize columns
cols_concat = ['first_name', 'second_name']

# Convert them to type str
df[cols_concat] = df[cols_concat].astype('str')

# Then concatenate them as follows
df['new_col'] = df[cols_concat].T.agg('_'.join)
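A self-contained sketch of this approach. The first_name/second_name/age frame is made-up sample data, and here astype(str) is applied to the selected columns on the fly rather than written back into df, so the original columns keep their dtypes.

```python
import pandas as pd

# Hypothetical sample data; 'age' is not part of the concatenation
df = pd.DataFrame({'first_name': ['Ada', 'Alan'], 'second_name': ['Lovelace', 'Turing'], 'age': [36, 41]})

cols_concat = ['first_name', 'second_name']
# .T turns each original row into a column; agg then calls '_'.join on each of those columns
df['new_col'] = df[cols_concat].astype(str).T.agg('_'.join)

print(df['new_col'].tolist())  # ['Ada_Lovelace', 'Alan_Turing']
```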

Answer 10

df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df['combined'] = df['foo'].astype(str)+'_'+df['bar'].astype(str)

If you concatenate with a string such as '_', first convert the columns you want to combine to strings; after that you can concatenate them.

Answer 11

df['New_column_name'] = df['Column1'].map(str) + 'X' + df['Steps']

X is any separator you want between the two merged columns (for example, a space).
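A minimal sketch with made-up data (the Column1 and Steps names come from the answer; the values are hypothetical), using a space as the separator X:

```python
import pandas as pd

# Hypothetical frame: 'Column1' is numeric, 'Steps' already holds strings
df = pd.DataFrame({'Column1': [10, 20], 'Steps': ['walk', 'run']})

sep = ' '  # the separator X from the answer above
# map(str) converts each numeric value so that + performs string concatenation
df['New_column_name'] = df['Column1'].map(str) + sep + df['Steps']

print(df['New_column_name'].tolist())  # ['10 walk', '20 run']
```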

Answer 12

If you have a list of columns you want to concatenate, and maybe you'd like to use some separator, here's what you can do:

def concat_columns(df, cols_to_concat, new_col_name, sep=" "):
    df[new_col_name] = df[cols_to_concat[0]]
    for col in cols_to_concat[1:]:
        df[new_col_name] = df[new_col_name].astype(str) + sep + df[col].astype(str)

This should be faster than apply, and it takes an arbitrary number of columns to concatenate.
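Note that concat_columns modifies the DataFrame in place and returns None, so its return value should not be assigned back to df. A usage sketch on the question's sample data:

```python
import pandas as pd


def concat_columns(df, cols_to_concat, new_col_name, sep=" "):
    # Start from the first column, then append each remaining column with sep
    df[new_col_name] = df[cols_to_concat[0]]
    for col in cols_to_concat[1:]:
        df[new_col_name] = df[new_col_name].astype(str) + sep + df[col].astype(str)


df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3], 'new': ['apple', 'banana', 'pear']})
result = concat_columns(df, ['foo', 'bar', 'new'], 'combined', sep='_')

print(result)  # None: the column is added to df in place
print(df['combined'].tolist())  # ['a_1_apple', 'b_2_banana', 'c_3_pear']
```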

Answer 13

You could create a function that makes the implementation neater (especially if you use this functionality multiple times throughout an implementation):

def concat_cols(df, cols_to_concat, new_col_name, separator):  
    df[new_col_name] = ''
    for i, col in enumerate(cols_to_concat):
        df[new_col_name] += ('' if i == 0 else separator) + df[col].astype(str)
    return df

Sample usage:

test = pd.DataFrame(data=[[1,2,3], [4,5,6], [7,8,9]], columns=['a', 'b', 'c'])
test = concat_cols(test, ['a', 'b', 'c'], 'concat_col', '_')

Answer 14

Following up on @Allen's response: if you need to chain such an operation with other DataFrame transformations, use assign:

df.assign(
    combined=lambda x: x[cols].apply(
        lambda row: "_".join(row.values.astype(str)), axis=1
    )
)
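A self-contained sketch of such a chain on the question's sample data; the .query('bar > 1') step is only a placeholder for whatever transformation follows. Since assign returns a new DataFrame, the original df is not modified.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c'], 'bar': [1, 2, 3], 'new': ['apple', 'banana', 'pear']})
cols = ['foo', 'bar', 'new']

result = (
    df
    # assign returns a new DataFrame, so further steps can be chained after it
    .assign(combined=lambda x: x[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1))
    .query('bar > 1')  # placeholder follow-up transformation
)

print(result['combined'].tolist())  # ['b_2_banana', 'c_3_pear']
print('combined' in df.columns)  # False: the original df is unchanged
```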

Answer 15

Considering that one is combining three columns, one would need three format specifiers, '%s_%s_%s', not just two ('%s_%s'). The following will do the work:

df['combined'] = df.apply(lambda x: '%s_%s_%s' % (x['foo'], x['bar'], x['new']), axis=1)

[Out]:
  foo  bar     new    combined
0   a    1   apple   a_1_apple
1   b    2  banana  b_2_banana
2   c    3    pear    c_3_pear

Alternatively, if one wants to create a separate list to store the columns that one wants to combine, the following will do the work:

columns = ['foo', 'bar', 'new']

df['combined'] = df.apply(lambda x: '_'.join([str(x[i]) for i in columns]), axis=1)

[Out]:
  foo  bar     new    combined
0   a    1   apple   a_1_apple
1   b    2  banana  b_2_banana
2   c    3    pear    c_3_pear

This last one is more convenient, as one can simply change or add the column names in the list; it requires fewer changes.

15 answers

Answer 1: 142, 2018-09-11 06:53:44
Answer 2: 85, 2016-09-02 11:43:44
Answer 3: 23, 2018-05-24 08:39:07
Answer 4: 17, 2016-09-02 13:24:17
Answer 5: 8, 2020-04-17 20:32:29
Answer 6: 8, 2020-06-01 15:42:46
Answer 7: 7, 2016-09-02 11:43:28
Answer 8: 3, 2020-04-22 12:44:26
Answer 9: 3, 2022-02-28 08:32:59
Answer 10: 2, 2018-04-18 10:10:09
Answer 11: 2, 2018-10-12 13:06:49
Answer 12: 2, 2020-11-27 14:53:34
Answer 13: 2, 2021-12-02 13:03:07
Answer 14: 0, 2022-09-02 16:25:21
Answer 15: 0, 2022-09-20 09:41:53