I have a dataframe that contains numbers stored as strings, with a comma as the thousands separator (e.g. 150,000). Some values are just "-".
I'm trying to convert all the numbers that are stored as strings to floats. The "-" values should remain as they are.
My current code uses a for loop to iterate over each column and row and check whether each cell contains a comma. If so, it removes the comma and then converts the value to a number.
This works fine most of the time, but some of the dataframes have duplicated column names, and that's when it falls apart.
Is there a more efficient way of doing this update (i.e. without loops) that also avoids the problem with duplicated column names?
Current code:
for col in statement_df.columns:
    row = 0
    while row < len(statement_df.index):
        row_name = statement_df.index[row]
        if statement_df[col][row] == "-":
            #do nothing
            print(statement_df[col][row])
        elif statement_df[col][row].find(",") >= 0:
            #statement_df.loc[col][row] = float(statement_df[col][row].replace(",",""))
            x = float(statement_df[col][row].replace(",",""))
            statement_df.at[row_name, col] = x
            print(statement_df[col][row])
        else:
            x = float(statement_df[col][row])
            statement_df.at[row_name, col] = x
            print(statement_df[col][row])
        row = row + 1
Use str.replace(',', '') on the whole column rather than on each cell.
For a dataframe like the one below:
Name Count
Josh 12,33
Eric 24,57
Dany 9,678
apply it like this:
df['Count'] = df['Count'].str.replace(',', '')
df
This gives the following output:
Name Count
0 Josh 1233
1 Eric 2457
2 Dany 9678
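Note that str.replace only strips the commas, so the values are still strings afterwards. A minimal sketch of one way to finish the conversion while keeping "-" as it is, assuming the sample data above plus one hypothetical "-" row added for illustration:

```python
import pandas as pd

# Sample data from above, plus a hypothetical "-" row (an assumption for illustration).
df = pd.DataFrame({
    "Name": ["Josh", "Eric", "Dany", "Anna"],
    "Count": ["12,33", "24,57", "9,678", "-"],
})

# Strip the separators; regex=False treats "," as a literal character.
stripped = df["Count"].str.replace(",", "", regex=False)

# Anything that is not a number (here "-") becomes NaN instead of raising.
numbers = pd.to_numeric(stripped, errors="coerce")

# Keep the parsed floats where parsing worked, and the original text elsewhere.
df["Count"] = numbers.astype(object).where(numbers.notna(), df["Count"])

print(df)
```

The column ends up with object dtype because it mixes floats and the "-" string. If "-" really means a missing value, stopping after pd.to_numeric and keeping the NaN would give a proper float column instead.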
You can use iloc for that. It addresses columns by position, so duplicated column names are not a problem:
for idx in range(len(df.columns)):
    df.iloc[:, idx] = df.iloc[:, idx].apply(your_function)
your_function should handle a single cell value, since apply calls it once per element. For example:
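def your_function(x):
    # Leave the "-" placeholder untouched
    if x == '-':
        return x
    # Strip the thousands separators before converting
    return float(x.replace(',', ''))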
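If you would rather avoid the explicit loop over column positions, one possible sketch is to convert the underlying array in one pass and rebuild the dataframe with the original labels. The names to_number and convert_frame and the sample statement_df are assumptions for illustration, not part of the original code. The function still runs once per cell in Python, so this is tidier rather than truly vectorized.

```python
import numpy as np
import pandas as pd

def to_number(cell):
    """Turn '150,000'-style strings into floats; leave '-' and non-strings alone."""
    if isinstance(cell, str) and cell != "-":
        return float(cell.replace(",", ""))
    return cell

def convert_frame(frame):
    """Convert every cell without ever selecting a column by name."""
    values = frame.to_numpy(dtype=object)
    # frompyfunc applies to_number element-wise and returns an object array.
    converted = np.frompyfunc(to_number, 1, 1)(values)
    # Reusing the original labels keeps duplicated column names intact.
    return pd.DataFrame(converted, index=frame.index, columns=frame.columns)

# Hypothetical sample with a duplicated column name.
statement_df = pd.DataFrame(
    [["150,000", "-", "12"], ["3,500.5", "7", "-"]],
    index=["Revenue", "Costs"],
    columns=["2020", "2020", "2021"],
)

result = convert_frame(statement_df)
print(result)
```

Because nothing is looked up by column label, the duplicated "2020" columns are converted independently and keep their names and order.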