Concatenating columns of a DataFrame in Python?

Question

I have a dataframe generated with the following code:

# importing pandas as pd 
import pandas as pd 

# Create the dataframe 
df = pd.DataFrame({'Category':['A', 'B', 'C', 'D'], 
                   'Event':['Music Theater', 'Poetry Music', 'Theatre Comedy', 'Comedy Theatre'], 
                   'Cost':[10000, 5000, 15000, 2000]}) 

# Print the dataframe 
print(df)

I want to generate a list that combines all three columns, with spaces replaced by "_" and any trailing spaces removed:

[A_Music_Theater_10000, B_Poetry_Music_5000, C_Theatre_Comedy_15000, D_Comedy_Theatre_2000]

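I want to do this in the most optimized way, because run time is an issue for me, so I would like to avoid for loops. Can anyone tell me the most efficient way to achieve this?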

Answer 1

The most general solution is to convert all values to strings, combine them with join, and finally apply replace:

df['new'] = df.astype(str).apply('_'.join, axis=1).str.replace(' ', '_')

If you only need to select some of the columns:

cols = ['Category','Event','Cost']
df['new'] = df[cols].astype(str).apply('_'.join, axis=1).str.replace(' ', '_')

Or process each column separately, applying replace where needed and converting numeric columns to strings:

df['new'] = (df['Category'] + '_' + 
             df['Event'].str.replace(' ', '_') + '_' + 
             df['Cost'].astype(str))
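
Since run time is the concern in the question, the row-wise join and the per-column concatenation can be compared with a rough timing sketch like the one below. The 10,000-row frame and the helper names `row_wise_join` and `per_column` are assumptions made for illustration, not part of the original answer. `apply` with `axis=1` calls a Python function once per row, so the per-column version tends to be faster, but it is worth measuring on your own data.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                     'Event': ['Music Theater', 'Poetry Music',
                               'Theatre Comedy', 'Comedy Theatre'],
                     'Cost': [10000, 5000, 15000, 2000]})

# Hypothetical larger frame for timing: the 4 sample rows repeated 2,500 times.
big = pd.concat([base] * 2500, ignore_index=True)


def row_wise_join(frame):
    # Convert everything to strings, join each row with '_', then replace spaces.
    return frame.astype(str).apply('_'.join, axis=1).str.replace(' ', '_')


def per_column(frame):
    # Concatenate the columns directly, converting only what needs converting.
    return (frame['Category'] + '_' +
            frame['Event'].str.replace(' ', '_') + '_' +
            frame['Cost'].astype(str))


# Both approaches should produce the same strings.
assert row_wise_join(big).tolist() == per_column(big).tolist()

for fn in (row_wise_join, per_column):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

The absolute timings depend on the machine and the pandas version, so only the relative difference is meaningful.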

Or, after converting to strings, add _ to every value and sum across the row; in that case rstrip is needed afterwards to remove the trailing _:

df['new'] = df.astype(str).add('_').sum(axis=1).str.replace(' ', '_').str.rstrip('_')

print(df) 
  Category           Event   Cost                     new
0        A   Music Theater  10000   A_Music_Theater_10000
1        B    Poetry Music   5000     B_Poetry_Music_5000
2        C  Theatre Comedy  15000  C_Theatre_Comedy_15000
3        D  Comedy Theatre   2000   D_Comedy_Theatre_2000
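
The question asks for a list rather than a new column. A minimal sketch of that last step, assuming the per-column approach from above and using `Series.tolist()` for the conversion (the variable names `combined` and `result` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'C', 'D'],
                   'Event': ['Music Theater', 'Poetry Music',
                             'Theatre Comedy', 'Comedy Theatre'],
                   'Cost': [10000, 5000, 15000, 2000]})

# Build the combined strings without adding a column to df.
combined = (df['Category'] + '_' +
            df['Event'].str.replace(' ', '_') + '_' +
            df['Cost'].astype(str))

# Convert the resulting Series to a plain Python list.
result = combined.tolist()
print(result)
# ['A_Music_Theater_10000', 'B_Poetry_Music_5000', 'C_Theatre_Comedy_15000', 'D_Comedy_Theatre_2000']
```

If the column `new` has already been created, `df['new'].tolist()` gives the same list.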

Solution 1
1 Accepted 2019-10-16 13:41:34
