How to loop through selected columns in a DataFrame df?

Question

I have a DataFrame, df, with 3775 rows × 8 columns.

My df.columns is:

Index(['FY', 'Month', 'Sales Area', 'BSP Agent', 'City Booking Office',
       'General Sales Agent', 'Web Sales', 'Total'],
      dtype='object')

For the columns df[['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']], I want to do the following: remove the spaces from the values, then convert the object columns to numeric.

I used a for loop, but it is very confusing:

for e in df_c:
    df_c[e] = df_c[e].replace(' ', '', regex=True)
    df_c[e] = pd.to_numeric(df_c[e], errors='coerce').fillna(0, downcast='infer')
    break

Is there a better way to achieve my task?

Answer 1

You can avoid the for loop by doing:

# Only the columns that hold numbers; converting 'FY', 'Month' or
# 'Sales Area' with errors='coerce' would turn any text in them into 0.
c = ['BSP Agent', 'City Booking Office', 'General Sales Agent',
     'Web Sales', 'Total']

df[c] = df[c].apply(lambda x: x.str.replace(' ', ''))
df[c] = df[c].apply(pd.to_numeric, errors='coerce').fillna(0, downcast='infer')

Answer 2

If the DataFrame is created from a file and the spaces are thousands separators, the best solution is to pass the thousands parameter to read_csv, so the columns are parsed as numeric while the file is read:

df = pd.read_csv(file, thousands=' ')
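As a minimal sketch of the thousands approach, the snippet below reads a small in-memory CSV instead of a real file. The CSV content and the name df_demo are made up for illustration; only the column names come from the question.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = """FY,Month,BSP Agent,Total
2019,Jan,6 964,7 194
2019,Feb,4 231,3 921
"""

# thousands=' ' tells the parser to treat a space inside a number
# as a thousands separator, so '6 964' is read as the integer 6964.
df_demo = pd.read_csv(io.StringIO(csv_text), thousands=' ')

print(df_demo.dtypes)
print(df_demo)
```

Text columns such as Month are left as strings, so no separate cleaning step is needed afterwards.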

Otherwise, select the columns with a list, then use DataFrame.replace to strip the spaces, DataFrame.apply with pd.to_numeric to convert to numeric, and DataFrame.fillna to replace missing values:

cols = ['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']

df[cols] = (df[cols].replace(' ', '', regex=True)
                    .apply(lambda x: pd.to_numeric(x, errors = 'coerce'))
                    .fillna(0, downcast='infer'))

Sample:

import numpy as np
import pandas as pd

np.random.seed(123)

c =['FY', 'Month', 'Sales Area', 'BSP Agent', 
     'City Booking Office','General Sales Agent', 'Web Sales', 'Total']
cols = ['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']

df = (pd.DataFrame(np.random.rand(5, len(cols)) * 10000, columns=cols)
         .astype(int)
         .applymap(lambda x: '{:,}'.format(x).replace(',', ' '))
         .reindex(c, axis=1, fill_value='data'))
print (df)
     FY Month Sales Area BSP Agent City Booking Office General Sales Agent  \
0  data  data       data     6 964               2 861               2 268   
1  data  data       data     4 231               9 807               6 848   
2  data  data       data     3 431               7 290               4 385   
3  data  data       data     7 379               1 824               1 754   
4  data  data       data     6 344               8 494               7 244   

  Web Sales  Total  
0     5 513  7 194  
1     4 809  3 921  
2       596  3 980  
3     5 315  5 318  
4     6 110  7 224

cols = ['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']

df[cols] = (df[cols].replace(' ', '', regex=True)
                    .apply(lambda x: pd.to_numeric(x, errors = 'coerce'))
                    .fillna(0, downcast='infer'))
print (df)
     FY Month Sales Area  BSP Agent  City Booking Office  General Sales Agent  \
0  data  data       data       6964                 2861                 2268   
1  data  data       data       4231                 9807                 6848   
2  data  data       data       3431                 7290                 4385   
3  data  data       data       7379                 1824                 1754   
4  data  data       data       6344                 8494                 7244   

   Web Sales  Total  
0       5513   7194  
1       4809   3921  
2        596   3980  
3       5315   5318  
4       6110   7224
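One caveat about the chain above: the downcast argument of fillna is deprecated in recent pandas releases (2.2 onward, as far as I know). A possible variant, assuming the cleaned data are whole numbers, replaces it with an explicit astype(int). The frame below is hypothetical, and 'n/a' stands in for a value that cannot be parsed, to show what errors='coerce' does with it.

```python
import pandas as pd

# Hypothetical two-column frame; 'n/a' is an unparseable entry.
df_demo = pd.DataFrame({
    'Web Sales': ['5 513', 'n/a', '596'],
    'Total': ['7 194', '3 921', '3 980'],
})
cols = ['Web Sales', 'Total']

cleaned = (df_demo[cols]
           .replace(' ', '', regex=True)            # strip the space separators
           .apply(pd.to_numeric, errors='coerce')   # 'n/a' becomes NaN
           .fillna(0)                               # NaN becomes 0
           .astype(int))                            # assumes whole-number data

df_demo[cols] = cleaned
print(df_demo)
```

If the columns may hold fractional values, dropping the final astype(int) keeps them as floats instead of truncating them.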

2 answers

Answer 1
2 2020-03-17 10:58:05

Answer 2
2 Accepted 2020-03-17 10:58:35
