I have a column called collection, as follows:
collection: $5,345,677, 46836214, $533,316,061, " ", 29200000
Some values are written with a US dollar sign and thousands separators, and others are plain numbers. The column also contains NaN. I want to convert the values to millions of US dollars.
I tried the following conversion, but it did not work:
df['Boxoffice in US$ (mil)'] = (df2['collection'].astype(float)/1000000).round(2).astype(str)
I get this error: could not convert string to float: '$5,345,677'
Please advise.
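import pandas as pd
# remove the '$' and ',' from the strings so they can be converted to numerics
# -> notice: the series is first converted to strings so that values that are already numeric (e.g. 29200000) are handled too
# -> regex=True is needed so that '[$,]' is treated as a character class (the default is a literal match in pandas 2.0+)
collection_tmp = df2['collection'].astype(str).str.replace('[$,]', '', regex=True)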
# convert to numerics (floats) and then to millions
# -> errors='coerce' sets NaN for invalid values
millions = pd.to_numeric(collection_tmp, errors='coerce')/1e6
# create 'Boxoffice in US$ (mil)'
df['Boxoffice in US$ (mil)'] = millions.round(2).astype('str')
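For reference, the approach above can be tried end to end on the sample values from the question. This is only a sketch: the DataFrame is rebuilt by hand from the values listed in the question, and it assumes a single frame named `df2` holds both the source column and the result column.

```python
import pandas as pd

# Sample values copied from the question; " " stands in for a missing entry.
df2 = pd.DataFrame(
    {"collection": ["$5,345,677", 46836214, "$533,316,061", " ", 29200000]}
)

# Cast everything to str, then strip '$' and ',' with a regex character class.
cleaned = df2["collection"].astype(str).str.replace(r"[$,]", "", regex=True)

# Anything that is still not a number (such as the blank entry) becomes NaN.
millions = pd.to_numeric(cleaned, errors="coerce") / 1e6

# Kept numeric here; append .astype(str) if a string column is really wanted.
df2["Boxoffice in US$ (mil)"] = millions.round(2)
print(df2)
```

Leaving the result as a float column keeps the missing entry as NaN rather than the string 'nan', which is usually easier to sort and aggregate later.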
You can follow these steps:
1. Fill NaN or blank (whitespace-only) values. You said the column has NaN, but I see " " in your sample.
[in ]: df['collection']
[out]: collection
0 $5,345,677
1 46836214
2 $533,316,061
3
4 29200000
[in ]: # if you have NaN, use the `fillna` method instead,
# like df['collection'].fillna('0')
# (assign the result back to df['collection'] to keep the change)
[in ]: df['collection'].replace(r'^\s*$', '0', regex=True)
[out]: collection
0 $5,345,677
1 46836214
2 $533,316,061
3 0
4 29200000
2. Then convert the plain numbers to US dollar format. Note that this adds the '$' and the thousands separators; it does not divide by one million.
[in ]: df['collection'].apply(lambda x: x if '$' in str(x) else '${:,}'.format(int(x)))
[out]: collection
0 $5,345,677
1 $46,836,214
2 $533,316,061
3 $0
4 $29,200,000
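The formatting step above leaves the values at full scale. If the goal is a numeric value in millions, as the question asks, one option is a small per-value helper. The following is a minimal sketch: `to_millions` is a hypothetical name, not a pandas function, and it assumes that anything that cannot be parsed should become NaN.

```python
import math

def to_millions(value):
    """Hypothetical helper: parse '$5,345,677' or 46836214 into millions."""
    # Drop the dollar sign, the thousands separators and surrounding spaces.
    text = str(value).replace("$", "").replace(",", "").strip()
    try:
        return round(float(text) / 1_000_000, 2)
    except ValueError:
        # Blank or otherwise non-numeric input.
        return math.nan

# Sample values copied from the question.
values = ["$5,345,677", 46836214, "$533,316,061", " ", 29200000]
in_millions = [to_millions(v) for v in values]
print(in_millions)
```

With pandas, the same helper can be passed to `df['collection'].apply(to_millions)` to build the new column.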
I do hope this can help!