Python pandas DataFrame: convert strings with a unit prefix to float

I have a dataframe of strings that I convert to float using df.astype('float', errors = 'ignore') .

I use iloc inside a loop to go through all columns.

The issue is that some values in some columns have a unit prefix. In a given column the values could be ['0.02u\n', '0.1\n', '2.02n\n', ...].

Here u = 10^(-6) and n = 10^(-9). The question is how to convert these strings to floats in an elegant way.

The solution I used is to look at every cell and check whether the string ends in 'n' or 'u'. If it does, I remove the letter, convert the rest to float and multiply by the matching factor.

no_col = len(df_T.columns)
no_row = len(df_T)

# First pass: drop the trailing newline from every cell.
for i in range(no_col):
    for j in range(no_row):
        df_T.iloc[j, i] = df_T.iloc[j, i][:-1]

# Second pass: strip a trailing 'u' or 'n', convert to float and scale.
for i in range(no_col):
    for j in range(no_row):
        if df_T.iloc[j, i][-1] == 'u':
            df_T.iloc[j, i] = float(df_T.iloc[j, i][:-1]) * 10**-6
        elif df_T.iloc[j, i][-1] == 'n':
            df_T.iloc[j, i] = float(df_T.iloc[j, i][:-1]) * 10**-9
        else:
            df_T.iloc[j, i] = float(df_T.iloc[j, i])

pandas can evaluate expressions given as strings with pd.eval(). So, if a cell holds an expression in string form, such as '0.02*u', pd.eval() can compute its value.
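As a minimal sketch of that idea, the snippet below evaluates a single string expression. The `scope` dictionary and the value chosen for `u` are assumptions for illustration; `local_dict` is the `pd.eval()` parameter that supplies names used inside the expression.

```python
import pandas as pd

# Assumed multiplier for the 'u' (micro) suffix, passed in explicitly.
scope = {"u": 1e-6}

# Evaluate a string expression; the name 'u' is looked up in local_dict.
value = pd.eval("0.02 * u", local_dict=scope)
print(value)
```

Passing the names explicitly avoids depending on whichever variables happen to be in the calling scope.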

To use this, first remove the \n from the column values, for which I used .replace(). Next, make each expression readable by pd.eval(): for example, '3x' has to become '3*x', which I again did with .replace() and a regex. Finally, apply pd.eval() to the column to evaluate it.

import pandas as pd

df = pd.DataFrame({'col': ['0.02u\n', '0.1\n', '2.02n\n']})
# Note: ^ is bitwise XOR in Python (10^(-6) evaluates to -16), so write the factors as float literals.
u = 1e-6
n = 1e-9

# Remove the \n characters
df['col'] = df['col'].replace(to_replace="\n", value="", regex=True)
# Put '*' for multiplication --> '3x' will be converted to '3*x'
df['col'] = df['col'].replace(to_replace=r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()",
                              value=r"\1*\2", regex=True)
# pd.eval() resolves the names u and n from the surrounding scope
df['val'] = pd.eval(df['col'])
print(df)

Result:

      col           val
0  0.02*u  2.000000e-08
1     0.1  1.000000e-01
2  2.02*n  2.020000e-09
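As far as I know, calling pd.eval() on a whole column of strings relies on parsing the column's printed representation, which is reported to break on longer columns, so it may be worth having an alternative that does not evaluate strings at all. The following is only a sketch under stated assumptions: the helper name `si_to_float` and the `MULTIPLIERS` table are hypothetical, and the regex assumes each cell is a plain number followed by at most one letter. It uses `Series.str.extract()` to split the number from the suffix and `Series.map()` to look up the factor.

```python
import pandas as pd

# Hypothetical lookup table: suffix -> multiplier ('' means no suffix).
MULTIPLIERS = {"": 1.0, "u": 1e-6, "n": 1e-9}

def si_to_float(col: pd.Series) -> pd.Series:
    """Convert strings like '0.02u' (optionally newline-terminated) to floats."""
    # Strip whitespace, then split into a numeric part and an optional letter.
    parts = col.str.strip().str.extract(
        r"^(?P<num>[-+]?[\d.]+(?:[eE][-+]?\d+)?)(?P<suffix>[a-zA-Z]?)$"
    )
    # Suffixes missing from the table map to NaN instead of being guessed.
    factor = parts["suffix"].map(MULTIPLIERS)
    return parts["num"].astype(float) * factor

df = pd.DataFrame({"col": ["0.02u\n", "0.1\n", "2.02n\n"]})
out = si_to_float(df["col"])
print(out)
```

To convert every column of a DataFrame of such strings, `df.apply(si_to_float)` would apply the helper column by column.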
