如何在Pandas的同一数据框中合并/合并列？

Question

I have a data frame similar to this: 我有一个与此相似的数据框：

       0    1   2   3           4   5
0   1001    1   176 REMAINING   US  SOUTH
1   1002    1   176 REMAINING   US  SOUTH

What I would like to do is to combine columns 3,4, and 5 to create on column that has all of the data in columns 3,4, and 5. 我想做的是将第3,4和5列合并以创建包含第3,4和5列中所有数据的列。

Desired output: 所需的输出：

       0    1   2   3           
0   1001    1   176 REMAINING US SOUTH
1   1002    1   176 REMAINING US SOUTH

I've already tried 我已经尝试过了

hbadef['6'] = hbadef[['3', '4', '5']].apply(lambda x: ''.join(x), axis=1)

and that didn't work out. 那没有解决。

Here is the stacktrace when I implement 这是我实现时的堆栈跟踪

 hbadef['3'] = hbadef['3'] + ' ' +  hbadef['4'] + ' ' + hbadef['5']

Stacktrace: 堆栈跟踪：

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '3'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-62-2da6c35d6e89> in <module>()
----> 1 hbadef['3'] = hbadef['3'] + ' ' +  hbadef['4'] + ' ' + hbadef['5']
      2 # hbadef.drop(['4', '5'], axis=1)
      3 # hbadef.columns = ['MKTcode', 'Region']
      4 
      5 # pd.concat(

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140 
   2141     def _getitem_column(self, key):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147 
   2148         # duplicate columns & possible reduce dimensionality

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3841 
   3842             if not isna(item):
-> 3843                 loc = self.items.get_loc(item)
   3844             else:
   3845                 indexer = np.arange(len(self.items))[isna(self.items)]

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528 
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: '3'

I've tried removing the NaN values, but I get a similar result. 我尝试删除NaN值，但得到类似的结果。 I am perplexed as to why such a simple function is not working properly. 我对为什么这样一个简单的功能无法正常工作感到困惑。

I'll be accepting an answer so that we can sorta "close" this question. 我将接受一个答案，以便我们可以“关闭”该问题。 Both of the answers are acceptable and solve the problem, the problem that I'm running into is likely an application error that I will have to solve independently from this question. 这两个答案都是可以接受的，可以解决问题，我遇到的问题很可能是应用程序错误，我必须独立于该问题解决。

Answer 1

You can simply add 您可以简单地添加

hbadef['3'] += ' ' +  hbadef['4'] + ' ' + hbadef['5']

then drop the unneeded columns 然后删除不需要的列

hbadef.drop(['4', '5'], axis=1, inplace=True)
>>> hbadef
    0   1   2   3
0   1001    1   176 REMAINING US SOUTH
1   1002    1   176 REMAINING US SOUTH

Note: If your columns are integer, then use instead 注意：如果您的栏是整数，请改用

hbadef.loc[:, 3] += ' ' + hbadef.loc[:, 4] + ' ' + hbadef.loc[:, 5]
hbadef.drop([4, 5], axis=1, inplace=True)

Answer 2

Use concat + agg 使用concat + agg

pd.concat(
    [df.iloc[:, :3], df.iloc[:, 3:].agg(' '.join, axis=1)], 
    axis=1, 
    ignore_index=True
)

      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH

如何在Pandas的同一数据框中合并/合并列？

问题描述

2 个解决方案

解决方案1
1 2018-05-02 16:13:28

解决方案2
1 已采纳 2018-05-02 16:14:06

如何在Pandas的同一数据框中合并/合并列？

问题描述

2 个解决方案

解决方案1 1 2018-05-02 16:13:28

解决方案2 1 已采纳 2018-05-02 16:14:06

解决方案1
1 2018-05-02 16:13:28

解决方案2
1 已采纳 2018-05-02 16:14:06