How to combine/merge columns within the same DataFrame in pandas?
I have a data frame similar to this:
      0  1    2          3   4      5
0  1001  1  176  REMAINING  US  SOUTH
1  1002  1  176  REMAINING  US  SOUTH
What I would like to do is combine columns 3, 4, and 5 to create one column that holds all of the data from columns 3, 4, and 5.
Desired output:
      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
I've already tried
hbadef['6'] = hbadef[['3', '4', '5']].apply(lambda x: ''.join(x), axis=1)
and that didn't work out.
Here is the stack trace I get when I run
hbadef['3'] = hbadef['3'] + ' ' + hbadef['4'] + ' ' + hbadef['5']
Stack trace:
TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2524 try:
-> 2525 return self._engine.get_loc(key)
2526 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: '3'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-62-2da6c35d6e89> in <module>()
----> 1 hbadef['3'] = hbadef['3'] + ' ' + hbadef['4'] + ' ' + hbadef['5']
2 # hbadef.drop(['4', '5'], axis=1)
3 # hbadef.columns = ['MKTcode', 'Region']
4
5 # pd.concat(
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2137 return self._getitem_multilevel(key)
2138 else:
-> 2139 return self._getitem_column(key)
2140
2141 def _getitem_column(self, key):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2144 # get column
2145 if self.columns.is_unique:
-> 2146 return self._get_item_cache(key)
2147
2148 # duplicate columns & possible reduce dimensionality
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1840 res = cache.get(item)
1841 if res is None:
-> 1842 values = self._data.get(item)
1843 res = self._box_item_values(item, values)
1844 cache[item] = res
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3841
3842 if not isna(item):
-> 3843 loc = self.items.get_loc(item)
3844 else:
3845 indexer = np.arange(len(self.items))[isna(self.items)]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2525 return self._engine.get_loc(key)
2526 except KeyError:
-> 2527 return self._engine.get_loc(self._maybe_cast_indexer(key))
2528
2529 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: '3'
I've tried removing the NaN values, but I get a similar result. I am perplexed as to why such a simple operation is not working properly.
I'll be accepting an answer so that we can sorta "close" this question. Both of the answers are acceptable and solve the problem; the problem that I'm running into is likely an application error that I will have to solve independently of this question.
You can simply add the columns together
hbadef['3'] += ' ' + hbadef['4'] + ' ' + hbadef['5']
then drop the unneeded columns
hbadef.drop(['4', '5'], axis=1, inplace=True)
>>> hbadef
      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
Note: If your column labels are integers, use this instead
hbadef.loc[:, 3] += ' ' + hbadef.loc[:, 4] + ' ' + hbadef.loc[:, 5]
hbadef.drop([4, 5], axis=1, inplace=True)
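The `KeyError: '3'` in the question's traceback, together with the `Int64HashTable` frames, suggests that the column labels are integers rather than strings. Below is a minimal sketch of that case. The frame is a hypothetical reconstruction of the data in the question (built from a list of lists, which gives integer labels 0 to 5), not the asker's actual data.

```python
import pandas as pd

# Hypothetical reconstruction of the frame from the question; building it
# from a list of lists gives integer column labels 0..5.
hbadef = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", "US", "SOUTH"],
    ]
)

# The labels are integers, so the string '3' is not a valid column key.
has_str_label = "3" in hbadef.columns
has_int_label = 3 in hbadef.columns
print(has_str_label, has_int_label)

# Combine using the integer labels, then drop the columns that were merged in.
hbadef[3] = hbadef[3] + " " + hbadef[4] + " " + hbadef[5]
hbadef = hbadef.drop(columns=[4, 5])
print(hbadef)
```

Checking `hbadef.columns` first is a cheap way to tell which form of the label to use before indexing.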
Use concat + agg
pd.concat(
[df.iloc[:, :3], df.iloc[:, 3:].agg(' '.join, axis=1)],
axis=1,
ignore_index=True
)
0 1 2 3
0 1001 1 176 REMAINING US SOUTH
1 1002 1 176 REMAINING US SOUTH
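For reference, here is a self-contained sketch of the concat + agg approach. The frame `df` is a hypothetical reconstruction of the data in the question, and the intermediate name `tail` is introduced only for illustration.

```python
import pandas as pd

# Hypothetical reconstruction of the frame from the question.
df = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", "US", "SOUTH"],
    ]
)

# Join the string columns (position 3 onward) row by row with a space.
tail = df.iloc[:, 3:].agg(" ".join, axis=1)

# Put the first three columns and the joined column side by side;
# ignore_index=True relabels the resulting columns 0..3.
result = pd.concat([df.iloc[:, :3], tail], axis=1, ignore_index=True)
print(result)
```

Because this selects columns by position with `iloc`, it does not depend on whether the labels are strings or integers. Note that `" ".join` raises a `TypeError` if any of the joined values is not a string (for example NaN), so such values would need to be handled first.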