How to iterate through selected columns in a dataframe, df?
I have a DataFrame, df, with 3775 rows × 8 columns.
My df.columns is:
Index(['FY', 'Month', 'Sales Area', 'BSP Agent', 'City Booking Office',
       'General Sales Agent', 'Web Sales', 'Total'],
      dtype='object')
For the columns df[['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']], I want to do the following: remove the spaces from the values and then convert the objects to numeric.
I used a for loop, but it is very confusing:
for e in df_c:
    df_c[e] = df_c[e].replace(' ', '', regex=True)
    df_c[e] = pd.to_numeric(df_c[e], errors='coerce').fillna(0, downcast='infer')
    break
Is there a better way to achieve my task?
You can avoid the for loop by doing:
# Only the value columns: if 'FY', 'Month' and 'Sales Area' were included,
# any non-numeric text in them would be coerced to NaN and filled with 0
c = ['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']
df[c] = df[c].apply(lambda x: x.str.replace(' ', ''))
df[c] = df[c].apply(pd.to_numeric, errors='coerce').fillna(0, downcast='infer')
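As a small self-contained illustration of the same two steps, here is a sketch on a made-up frame. The frame name demo, the list num_cols and the values (including the unparseable 'n/a') are hypothetical and only serve the demo. It passes regex=False so that the space is treated as a literal string.

```python
import pandas as pd

# Hypothetical demo data: numbers stored as strings with space separators
demo = pd.DataFrame({
    "Web Sales": ["5 513", "4 809", "596"],
    "Total": ["7 194", "n/a", "3 980"],
})
num_cols = ["Web Sales", "Total"]

# Step 1: strip the spaces column by column (each x is a Series of strings)
stripped = demo[num_cols].apply(lambda x: x.str.replace(" ", "", regex=False))

# Step 2: convert to numbers; values that cannot be parsed become NaN
converted = stripped.apply(pd.to_numeric, errors="coerce")

print(converted)
```

Note that errors='coerce' silently turns 'n/a' into NaN, which is why the snippets in this thread follow it with fillna(0).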
If you create the DataFrame from a file and the spaces are thousands separators, the best solution is to use the thousands parameter of read_csv; the columns are then parsed as numeric right away:
df = pd.read_csv(file, thousands=' ')
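A minimal sketch of this, reading from an in-memory string instead of a real file. The content of csv_text and the ';' field delimiter are assumptions made for the demo; the field delimiter has to differ from the thousands separator.

```python
import io

import pandas as pd

# Hypothetical file content: ';' separates fields, ' ' groups thousands
csv_text = "Month;Web Sales;Total\nJan;5 513;7 194\nFeb;596;3 980\n"

# thousands=' ' tells the parser to drop the spaces inside numeric fields
df_demo = pd.read_csv(io.StringIO(csv_text), sep=";", thousands=" ")

print(df_demo)
print(df_demo.dtypes)
```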
Use DataFrame.replace and DataFrame.fillna on all selected columns, chosen by a list, and use DataFrame.apply for the conversion to numeric:
cols = ['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']
df[cols] = (df[cols].replace(' ', '', regex=True)
                    .apply(lambda x: pd.to_numeric(x, errors='coerce'))
                    .fillna(0, downcast='infer'))
Sample:
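import numpy as np
import pandas as pd

np.random.seed(123)
c = ['FY', 'Month', 'Sales Area', 'BSP Agent',
     'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']
cols = ['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']
# Random integers formatted with a space as the thousands separator;
# the remaining columns are added with the placeholder value 'data'
# (on pandas 2.1+, DataFrame.map is the non-deprecated name for applymap)
df = (pd.DataFrame(np.random.rand(5, len(cols)) * 10000, columns=cols)
        .astype(int)
        .applymap(lambda x: '{:,}'.format(x).replace(',', ' '))
        .reindex(c, axis=1, fill_value='data'))
print (df)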
     FY Month Sales Area BSP Agent City Booking Office General Sales Agent  \
0  data  data       data     6 964               2 861               2 268
1  data  data       data     4 231               9 807               6 848
2  data  data       data     3 431               7 290               4 385
3  data  data       data     7 379               1 824               1 754
4  data  data       data     6 344               8 494               7 244

  Web Sales  Total
0     5 513  7 194
1     4 809  3 921
2       596  3 980
3     5 315  5 318
4     6 110  7 224
cols = ['BSP Agent', 'City Booking Office', 'General Sales Agent', 'Web Sales', 'Total']
df[cols] = (df[cols].replace(' ', '', regex=True)
                    .apply(lambda x: pd.to_numeric(x, errors='coerce'))
                    .fillna(0, downcast='infer'))
print (df)
     FY Month Sales Area  BSP Agent  City Booking Office  General Sales Agent  \
0  data  data       data       6964                 2861                 2268
1  data  data       data       4231                 9807                 6848
2  data  data       data       3431                 7290                 4385
3  data  data       data       7379                 1824                 1754
4  data  data       data       6344                 8494                 7244

   Web Sales  Total
0       5513   7194
1       4809   3921
2        596   3980
3       5315   5318
4       6110   7224
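In recent pandas releases the downcast argument of fillna raises a deprecation warning (to my knowledge since version 2.2). One possible alternative, sketched here on hypothetical data, is to fill the missing values and then cast explicitly with astype(int). This assumes every remaining value is a whole number; the names sample and value_cols are made up for the demo.

```python
import pandas as pd

# Hypothetical data: one text column plus two space-separated number columns
sample = pd.DataFrame({
    "Sales Area": ["north", "south", "east"],
    "Web Sales": ["5 513", "4 809", "596"],
    "Total": ["7 194", "bad", "3 980"],
})
value_cols = ["Web Sales", "Total"]

sample[value_cols] = (
    sample[value_cols]
    .replace(" ", "", regex=True)            # drop the thousands spaces
    .apply(pd.to_numeric, errors="coerce")   # unparseable text becomes NaN
    .fillna(0)                               # NaN becomes 0
    .astype(int)                             # explicit cast instead of downcast='infer'
)

print(sample)
print(sample.dtypes)
```

The text column is left untouched because it is not listed in value_cols.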