
Pandas: Replace string column values

I have a pandas DataFrame with a cost column that I am trying to format. Basically, I want to strip the currency symbol and standardize the values, since the cost is pulled from different sources. There are also some NaN values.

Here's some sample data:

$2.75 
nan
4.150000
25.00
$4.50

I have the following code that I am using to standardize the format of values in the column.

for i in range(len(EmpComm['Cost(USD)'])):

    if (pd.isnull(EmpComm['Cost(USD)'][i])):
        print(EmpComm['Cost(USD)'][i], i)
        #EmpComm['Cost(USD)'] = EmpComm['Cost(USD)'].iloc[i].fillna(0, inplace=True)

    if type(EmpComm['Cost(USD)'].iloc[i]) == str:
       #print('string', i)
       EmpComm['Cost(USD)'] = EmpComm['Cost(USD)'].iloc[i].replace('$','')

Output:

0      2.75
1      2.75
2      2.75
3      2.75
4      2.75
5      2.75

All values are replaced with 2.75. It seems to be running the second if statement for all column values, as they're formatted as strings.

My question is: How would you format it?

In general, you should avoid manual for loops and use vectorised functionality, where possible, with Pandas. Here you can utilise pd.to_numeric to test and convert values within your series:

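import numpy as np
import pandas as pd

s = pd.Series(['$2.75', np.nan, 4.150000, 25.00, '$4.50'])

# Cast every value to str so the .str accessor works, then strip the literal '$'
strs = s.astype(str).str.replace('$', '', regex=False)
# Convert to float; anything unparseable becomes NaN, which is then filled with 0
res = pd.to_numeric(strs, errors='coerce').fillna(0)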

print(res)

0     2.75
1     0.00
2     4.15
3    25.00
4     4.50
dtype: float64
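
As for why the loop in the question produced 2.75 everywhere: `EmpComm['Cost(USD)'] = <some string>` assigns a single scalar to the whole column, so the first string value overwrites every row. A minimal sketch of the same vectorised approach applied directly to the DataFrame column follows. The frame contents are assumed from the sample data in the question, and `clean_cost` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the sample data in the question
EmpComm = pd.DataFrame({'Cost(USD)': ['$2.75', np.nan, 4.150000, 25.00, '$4.50']})

def clean_cost(col):
    """Strip '$', convert to float, and replace missing values with 0."""
    # Cast to str so mixed str/float values can all go through the .str accessor
    stripped = col.astype(str).str.replace('$', '', regex=False).str.strip()
    # Unparseable entries become NaN, then NaN becomes 0
    return pd.to_numeric(stripped, errors='coerce').fillna(0)

# Assign the whole cleaned Series back, rather than one scalar per loop iteration
EmpComm['Cost(USD)'] = clean_cost(EmpComm['Cost(USD)'])
print(EmpComm['Cost(USD)'].tolist())
```

Filling missing costs with 0 follows the answer above; if a missing cost should stay missing, dropping the `.fillna(0)` call keeps those rows as NaN.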
