Merging string DataFrame columns into a single column in pandas

Question

I have columns in a DataFrame (imported from a CSV) containing text like this:

"New york", "Atlanta", "Mumbai"
"Beijing", "Paris", "Budapest"
"Brussels", "Oslo", "Singapore"

I want to collapse/merge all the columns into a single column, like this:

New york Atlanta Mumbai
Beijing Paris Budapest
Brussels Oslo Singapore

How can I do this in pandas?

Answer 1

Suppose you have a DataFrame like so:

>>> df
          0        1          2
0  New york  Atlanta     Mumbai
1   Beijing    Paris   Budapest
2  Brussels     Oslo  Singapore

Then a simple use of the pd.DataFrame.apply method will work nicely:

>>> df.apply(" ".join, axis=1)
0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
dtype: object

Note that I have to pass axis=1 so that the function is applied across the columns of each row, rather than down the rows of each column. With axis=0 you get this instead:

>>> df.apply(" ".join, axis=0)
0    New york Beijing Brussels
1           Atlanta Paris Oslo
2    Mumbai Budapest Singapore
dtype: object
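
As a hedged aside: " ".join raises a TypeError when a row contains a non-string value such as NaN, which is common after a CSV import. The sketch below assumes that missing cells should simply be skipped; the helper name join_row is made up for illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame([
    ["New york", "Atlanta", "Mumbai"],
    ["Beijing", "Paris", None],        # a missing cell, as a CSV import might produce
    ["Brussels", "Oslo", "Singapore"],
])

def join_row(row, sep=" "):
    """Join the non-missing values of one row, converting each to str."""
    return sep.join(str(v) for v in row if pd.notna(v))

# axis=1 passes each row to join_row as a Series
result = df.apply(join_row, axis=1)
print(result.tolist())
```

Whether skipping missing cells is the right behaviour depends on the data; replacing them with a placeholder string first is an equally reasonable choice.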

Answer 2

A faster (but uglier) version uses .str.cat:

# .ix was removed in pandas 1.0; select the remaining columns by position with .iloc
df[0].str.cat(df.iloc[:, 1:], sep=' ')

0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
Name: 0, dtype: object

On a larger (10k x 5) DataFrame (timings as originally measured, using the .ix indexer that was later removed in pandas 1.0):

%timeit df.apply(" ".join, axis=1)
10 loops, best of 3: 112 ms per loop

%timeit df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')
100 loops, best of 3: 4.48 ms per loop
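
For readers on a current pandas release, where .ix is no longer available, here is a minimal self-contained sketch of the same idea. It assumes a pandas version in which Series.str.cat accepts a DataFrame for its others argument; the variable names first, rest and combined are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([
    ["New york", "Atlanta", "Mumbai"],
    ["Beijing", "Paris", "Budapest"],
    ["Brussels", "Oslo", "Singapore"],
])

# The first column is the caller; the remaining columns are passed as a DataFrame.
# iloc selects by position, so this does not depend on the column labels.
first = df.iloc[:, 0]
rest = df.iloc[:, 1:]
combined = first.str.cat(rest, sep=" ")

# Sanity check against the apply-based version from Answer 1.
assert combined.tolist() == df.apply(" ".join, axis=1).tolist()
print(combined.tolist())
```

The timings quoted above were measured with the original .ix expression, so they may not carry over exactly to this variant.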

Answer 3

Here are a couple more ways:

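import pandas as pd

def pir(df):
    # Insert a separator column between the three original columns
    # (hard-coded for three columns), then concatenate each row by summing.
    df = df.copy()
    df.insert(2, 's', ' ', allow_duplicates=True)
    df.insert(1, 's', ' ', allow_duplicates=True)
    return df.sum(axis=1)

def pir2(df):
    # Build one tuple per row via a MultiIndex, then join each tuple.
    df = df.copy()
    return pd.MultiIndex.from_arrays(df.values.T).to_series().str.join(' ').reset_index(drop=True)

def pir3(df):
    # Add the columns together as NumPy object arrays, one column at a time.
    a = df.values[:, 0].copy()
    for j in range(1, df.shape[1]):
        a += ' ' + df.values[:, j]
    return pd.Series(a)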

Timing

pir3 seems fastest over a small df.

pir3 is still fastest over a larger df of 30,000 rows.
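
The measurements behind these statements are not reproduced here. As a rough way to re-run such a comparison, the sketch below times the apply-based approach against a pir3-style NumPy loop with the standard timeit module. The function names join_apply and join_numpy and the 3,000-row test size are assumptions made for illustration, and the numbers will vary with the machine and the pandas version.

```python
import timeit
import pandas as pd

def join_apply(df):
    # Row-wise join, as in Answer 1.
    return df.apply(" ".join, axis=1)

def join_numpy(df):
    # Same idea as pir3: column-wise string addition on object arrays.
    values = df.to_numpy(dtype=object)
    out = values[:, 0].copy()
    for j in range(1, values.shape[1]):
        out = out + " " + values[:, j]
    return pd.Series(out, index=df.index)

small = pd.DataFrame([
    ["New york", "Atlanta", "Mumbai"],
    ["Beijing", "Paris", "Budapest"],
    ["Brussels", "Oslo", "Singapore"],
])
large = pd.concat([small] * 1000, ignore_index=True)  # 3,000 rows (assumed size)

# Both approaches must agree before their speed is compared.
assert join_apply(large).tolist() == join_numpy(large).tolist()

# Best of 3 repeats, 3 calls per repeat, reported as seconds per call.
timings = {
    f.__name__: min(timeit.repeat(lambda: f(large), number=3, repeat=3)) / 3
    for f in (join_apply, join_numpy)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```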

Answer 4

If you prefer something more explicit...

Starting with a DataFrame df that looks like this:

>>> df
          A         B          C
0  New york   Beijing   Brussels
1   Atlanta     Paris       Oslo
2    Mumbai  Budapest  Singapore

You can create a new column like this:

df['result'] = df['A'] + ' ' + df['B'] + ' ' + df['C']

In this case the result is stored in the 'result' column of the original DataFrame:

          A         B          C                     result
0  New york   Beijing   Brussels  New york Beijing Brussels
1   Atlanta     Paris       Oslo         Atlanta Paris Oslo
2    Mumbai  Budapest  Singapore  Mumbai Budapest Singapore
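
If the number of columns is not fixed, the same explicit addition can be folded over a list of column names. This is only a sketch under that assumption; the names columns and sep are illustrative.

```python
from functools import reduce
import pandas as pd

df = pd.DataFrame({
    "A": ["New york", "Atlanta", "Mumbai"],
    "B": ["Beijing", "Paris", "Budapest"],
    "C": ["Brussels", "Oslo", "Singapore"],
})

columns = ["A", "B", "C"]  # columns to merge, in order
sep = " "

# Equivalent to df["A"] + sep + df["B"] + sep + df["C"], for any number of columns.
df["result"] = reduce(lambda left, right: left + sep + right,
                      (df[col] for col in columns))
print(df["result"].tolist())
```

Like the hand-written version, this propagates missing values: a row with NaN in any of the listed columns ends up as NaN in the result.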

Answer 5

For the sake of completeness:

In [160]: df1.add([' '] * (df1.columns.size - 1) + ['']).sum(axis=1)
Out[160]:
0    New york Atlanta Mumbai
1     Beijing Paris Budapest
2    Brussels Oslo Singapore
dtype: object

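Explanation: the list below appends a space to every column except the last, so summing across each row yields the joined string.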

In [162]: [' '] * (df.columns.size - 1) + ['']
Out[162]: [' ', ' ', '']

Timing against a 300K-row DataFrame:

In [68]: df = pd.concat([df] * 10**5, ignore_index=True)

In [69]: df.shape
Out[69]: (300000, 3)

In [76]: %timeit df.apply(" ".join, axis=1)
1 loop, best of 3: 5.8 s per loop

In [77]: %timeit df[0].str.cat(df.ix[:, 1:].T.values, sep=' ')
10 loops, best of 3: 138 ms per loop

In [79]: %timeit pir(df)
1 loop, best of 3: 499 ms per loop

In [80]: %timeit pir2(df)
10 loops, best of 3: 174 ms per loop

In [81]: %timeit pir3(df)
10 loops, best of 3: 115 ms per loop

In [159]: %timeit df.add([' '] * (df.columns.size - 1) + ['']).sum(axis=1)
1 loop, best of 3: 478 ms per loop

Conclusion: the current winner is @piRSquared's pir3().

5 answers

Answer 1
4 2016-07-24 08:17:11

Answer 2
4 2016-07-24 08:39:45

Answer 3
4 2016-07-24 09:48:49

Answer 4
2 2016-07-24 11:10:46

Answer 5
1 2016-07-24 10:46:38