
How to convert a column to millions of US dollars in pandas?

I have a column called collection, as follows:

collection : $5,345,677, 46836214, $533,316,061, " ", 29200000

The column values are a mix of strings with a dollar sign and plain numbers without one. It also has NaN entries. I want to convert everything to millions of US dollars.

I tried the following conversion, but it did not work:

df['Boxoffice in US$ (mil)'] = (df2['collection'].astype(float)/1000000).round(2).astype(str)

I get this error: could not convert string to float: '$5,345,677'

Please advise.

# remove the '$' and ',' from the strings so they can be converted to numerics
# -> note: the series is first cast to str so that plain numbers (e.g. 29200000) are handled too
# -> regex=True is needed because str.replace matches literally by default in pandas 2.0 and later
collection_tmp = df2['collection'].astype(str).str.replace(r'[$,]', '', regex=True)
# convert to numerics (floats) and then to millions
# -> errors='coerce' sets NaN for invalid values (such as the blank entry)
millions = pd.to_numeric(collection_tmp, errors='coerce') / 1e6
# create 'Boxoffice in US$ (mil)'
df['Boxoffice in US$ (mil)'] = millions.round(2).astype('str')
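
For anyone who wants to try the approach above without the original data, here is a minimal self-contained sketch. The DataFrame is rebuilt from the five sample values in the question (an assumption about how the real column looks), and the result is kept as floats rather than strings so the missing entry stays NaN.

```python
import pandas as pd

# Sample data copied from the question; the blank entry is a single space.
# Assumption: the real column mixes strings and plain integers like this.
df2 = pd.DataFrame(
    {"collection": ["$5,345,677", 46836214, "$533,316,061", " ", 29200000]}
)

# Cast everything to str, then strip '$' and ',' with a regex character class.
cleaned = df2["collection"].astype(str).str.replace(r"[$,]", "", regex=True)

# Values that cannot be parsed (the blank entry) become NaN.
millions = pd.to_numeric(cleaned, errors="coerce") / 1e6

# Keep the column numeric; call .astype(str) afterwards only if text is needed.
df2["Boxoffice in US$ (mil)"] = millions.round(2)
print(df2)
```

Keeping the column numeric makes later sorting and aggregation easier; converting to str, as in the answer above, turns the missing value into the literal text 'nan'.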

You can follow these steps:

1. Fill NaN or blank (whitespace) values. You said the column has NaN, but I saw " ".

[in ]: df['collection']
[out]: collection
  0    $5,345,677
  1    46836214
  2    $533,316,061
  3      
  4    29200000
[in ]: # if you have NaN, use the `fillna` method instead,
       # e.g. df['collection'].fillna('0')
[in ]: df['collection'].replace(r'^\s*$', '0', regex=True)
[out]: collection
  0    $5,345,677
  1    46836214
  2    $533,316,061
  3    0
  4    29200000

2. Then convert the numbers to US dollar format.

[in ]: df['collection'].apply(lambda x: '$' + format(int(x), ',') if '$' not in x else x)
[out]: collection
  0    $5,345,677
  1    $46,836,214
  2    $533,316,061
  3    $0
  4    $29,200,000
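
The two steps above can be sketched end to end as follows. This is only an illustration: it assumes every value is stored as a string (as in the output shown), and the final conversion to millions is an extra step added here to match the question, not part of the steps above. The names filled, formatted and in_millions are made up for the example.

```python
import pandas as pd

# Sample values from the question, all stored as strings (an assumption).
df = pd.DataFrame(
    {"collection": ["$5,345,677", "46836214", "$533,316,061", " ", "29200000"]}
)

# Step 1: replace blank / whitespace-only entries with '0'.
filled = df["collection"].replace(r"^\s*$", "0", regex=True)

# Step 2: add '$' and thousands separators where they are missing.
formatted = filled.apply(lambda x: x if "$" in x else "$" + format(int(x), ","))

# Extra step (not in the answer): strip the formatting and express in millions.
in_millions = (
    formatted.str.replace(r"[$,]", "", regex=True).astype(float) / 1e6
).round(2)

print(formatted.tolist())
print(in_millions.tolist())
```

Note that filling blanks with '0' reports a missing box office figure as $0; if that is misleading for your data, leaving it as NaN may be the better choice.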

I do hope this can help!
