How to combine/merge Columns within the same Dataframe in Pandas?
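I have a dataframe similar to this:
      0  1    2          3   4      5
0  1001  1  176  REMAINING  US  SOUTH
1  1002  1  176  REMAINING  US  SOUTH
What I want to do is merge columns 3, 4, and 5 into a single column that contains all the data from columns 3, 4, and 5.
Desired output:
      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
I have already tried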
hbadef['6'] = hbadef[['3', '4', '5']].apply(lambda x: ''.join(x), axis=1)
That did not work.
Here is the stack trace I get when I run
hbadef['3'] = hbadef['3'] + ' ' + hbadef['4'] + ' ' + hbadef['5']
Stack trace:
TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2524 try:
-> 2525 return self._engine.get_loc(key)
2526 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: '3'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-62-2da6c35d6e89> in <module>()
----> 1 hbadef['3'] = hbadef['3'] + ' ' + hbadef['4'] + ' ' + hbadef['5']
2 # hbadef.drop(['4', '5'], axis=1)
3 # hbadef.columns = ['MKTcode', 'Region']
4
5 # pd.concat(
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2137 return self._getitem_multilevel(key)
2138 else:
-> 2139 return self._getitem_column(key)
2140
2141 def _getitem_column(self, key):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2144 # get column
2145 if self.columns.is_unique:
-> 2146 return self._get_item_cache(key)
2147
2148 # duplicate columns & possible reduce dimensionality
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1840 res = cache.get(item)
1841 if res is None:
-> 1842 values = self._data.get(item)
1843 res = self._box_item_values(item, values)
1844 cache[item] = res
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3841
3842 if not isna(item):
-> 3843 loc = self.items.get_loc(item)
3844 else:
3845 indexer = np.arange(len(self.items))[isna(self.items)]
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2525 return self._engine.get_loc(key)
2526 except KeyError:
-> 2527 return self._engine.get_loc(self._maybe_cast_indexer(key))
2528
2529 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
KeyError: '3'
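I tried dropping the NaN values, but got a similar result. I am confused as to why such a simple operation does not work.
I am going to accept an answer so that we can "close" this question. Both answers are acceptable and solve the problem; the issue I am running into is most likely an error in my application that I have to resolve independently of this question.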
You can simply add the columns together
hbadef['3'] += ' ' + hbadef['4'] + ' ' + hbadef['5']
and then drop the columns you no longer need
hbadef.drop(['4', '5'], axis=1, inplace=True)
>>> hbadef
      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
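For reference, a minimal self-contained sketch of the steps above. The sample data is reconstructed from the question, and it assumes the column labels are the strings '0' through '5', which the question does not confirm.

```python
import pandas as pd

# Hypothetical sample data modelled on the question.
# Assumption: the column labels are strings, not integers.
hbadef = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", "US", "SOUTH"],
    ],
    columns=["0", "1", "2", "3", "4", "5"],
)

# Element-wise string concatenation, with a space between the parts.
hbadef["3"] += " " + hbadef["4"] + " " + hbadef["5"]

# Drop the columns that were folded into column '3'.
hbadef = hbadef.drop(columns=["4", "5"])

print(hbadef)
```

Reassigning the result of drop does the same job as inplace=True here and keeps the call chainable.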
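Note: if your column labels are integers (which the KeyError: '3' in the question's traceback suggests), use this instead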
hbadef.loc[:, 3] += ' ' + hbadef.loc[:, 4] + ' ' + hbadef.loc[:, 5]
hbadef.drop([4, 5], axis=1, inplace=True)
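To illustrate why the label type matters, here is a small sketch. It assumes the dataframe was built without explicit column names, so pandas assigned the integer labels 0 through 5; the sample data and the flag name string_label_found are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; no column names are given,
# so the labels default to the integers 0..5.
hbadef = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", "US", "SOUTH"],
    ]
)

# Inspect the labels before indexing: integers here, not strings.
print(hbadef.columns)

# Looking up the string '3' fails because only the integer 3 exists.
try:
    hbadef["3"]
    string_label_found = True
except KeyError:
    string_label_found = False

# Using the integer labels works.
hbadef[3] = hbadef[3] + " " + hbadef[4] + " " + hbadef[5]
hbadef = hbadef.drop(columns=[4, 5])

print(hbadef)
```

Checking hbadef.columns first is a quick way to tell which form of the label to use.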
Using concat + agg
pd.concat(
[df.iloc[:, :3], df.iloc[:, 3:].agg(' '.join, axis=1)],
axis=1,
ignore_index=True
)
      0  1    2                   3
0  1001  1  176  REMAINING US SOUTH
1  1002  1  176  REMAINING US SOUTH
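A runnable version of this approach might look like the sketch below. The dataframe df is hypothetical sample data modelled on the question, and the result name merged is chosen only for the example.

```python
import pandas as pd

# Hypothetical sample data modelled on the question.
df = pd.DataFrame(
    [
        [1001, 1, 176, "REMAINING", "US", "SOUTH"],
        [1002, 1, 176, "REMAINING", "US", "SOUTH"],
    ]
)

# Positional slicing: keep the first three columns as they are,
# and join everything from the fourth column onward row by row.
merged = pd.concat(
    [df.iloc[:, :3], df.iloc[:, 3:].agg(" ".join, axis=1)],
    axis=1,
    ignore_index=True,  # renumber the resulting columns 0..3
)

print(merged)
```

Because iloc selects by position, this works whether the column labels are strings or integers. Note that str.join only accepts strings, so a non-string value such as NaN in the joined columns would raise a TypeError.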