
Python pandas DataFrame: convert strings with a prefix to float

I have a DataFrame of strings that I convert to float using df.astype('float', errors='ignore').

I use iloc inside a loop that goes through all columns.

The issue is that some values in some columns carry a prefix letter; in a given column the numbers could be ['0.02u\n', '0.1\n', '2.02n\n', ...].

The point is that u = 10^(-6) and n = 10^(-9). The question is how to convert these strings to floats in an elegant way.

The solution I used is to look at every cell and check whether the string ends in 'n' or 'u'. If it does, I remove the letter, convert the rest to float, and then multiply.

no_col = len(df_T.columns)
no_row = len(df_T)

# Strip the trailing '\n' from every cell
for i in range(no_col):
    for j in range(no_row):
        df_T.iloc[j, i] = df_T.iloc[j, i][:-1]

# Scale by the prefix letter, if any, and convert to float
for i in range(no_col):
    for j in range(no_row):
        cell = df_T.iloc[j, i]
        if cell[-1] == 'u':
            df_T.iloc[j, i] = float(cell[:-1]) * 10**-6
        elif cell[-1] == 'n':
            df_T.iloc[j, i] = float(cell[:-1]) * 10**-9
        else:
            df_T.iloc[j, i] = float(cell)
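
The per-cell logic above could also be collected into one small function and mapped over each column, which avoids the index loops. This is only a sketch: the helper name parse_prefixed, the MULTIPLIERS table and the sample frame are made up for illustration, and only the 'u' and 'n' factors come from the question.

```python
import pandas as pd

# Assumed lookup table; extend it if other letters occur in the data.
MULTIPLIERS = {"u": 1e-6, "n": 1e-9}

def parse_prefixed(text):
    """Convert a string such as '0.02u\\n' to a float (hypothetical helper)."""
    text = text.strip()  # drops the trailing newline
    if text and text[-1] in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[text[-1]]
    return float(text)  # raises ValueError on empty or malformed input

# Sample data in the shape described in the question.
df_T = pd.DataFrame({"a": ["0.02u\n", "0.1\n"], "b": ["2.02n\n", "5\n"]})

# Series.map applies the helper cell by cell; apply() repeats it per column.
df_T = df_T.apply(lambda col: col.map(parse_prefixed))
```

Because every cell ends up as a Python float, the resulting columns have dtype float64 rather than object.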
         

pandas can evaluate string expressions with pd.eval(). If a column holds expressions in string form, you can pass them to pd.eval() and get the evaluated values back.

To use this, first remove the \n from the column, for which I used .replace(). Next, make the expression readable by pd.eval(): for example, '3x' has to become '3*x', which I again did with .replace() and a regex. Finally, apply pd.eval() to evaluate the result.
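
To see what the regex step does on its own, here is a small sketch using the standard re module with the same pattern as the answer. The function name insert_multiplication and the sample strings are assumptions added for illustration.

```python
import re

# Same pattern as in the answer: a run of digits (or a call like f(x))
# followed directly by a name or an opening parenthesis.
PATTERN = re.compile(r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()")

def insert_multiplication(expr):
    """Insert '*' between a number and the name that follows it."""
    return PATTERN.sub(r"\1*\2", expr)

samples = ["0.02u", "0.1", "2.02n", "3x", "1e-6"]
rewritten = [insert_multiplication(s) for s in samples]
print(rewritten)
# → ['0.02*u', '0.1', '2.02*n', '3*x', '1*e-6']
```

One caveat visible in the last sample: a value already written in scientific notation, such as '1e-6', is rewritten to '1*e-6', so this approach assumes the column contains no such values.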

import pandas as pd

df = pd.DataFrame({'col': ['0.02u\n', '0.1\n', '2.02n\n']})
# Multipliers looked up by pd.eval(). Note that ^ is bitwise XOR in Python,
# so 10^(-6) evaluates to -16; use float literals (or 10**-6) instead.
u = 1e-6
n = 1e-9

# Remove the \n characters
df['col'] = df['col'].replace(to_replace="\n", value="", regex=True)
# Insert '*' for multiplication, e.g. '3x' becomes '3*x'
df['col'] = df['col'].replace(to_replace=r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()",
                              value=r"\1*\2", regex=True)
df['val'] = pd.eval(df['col'])
print(df)

Result:

      col           val
0  0.02*u  2.000000e-08
1     0.1  1.000000e-01
2  2.02*n  2.020000e-09
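
If evaluating strings as expressions is not desirable, a vectorized alternative is to split each value into its numeric part and its trailing letter and then multiply by a looked-up factor. The sketch below is not part of the original answer; the regex, the group names number and suffix, and the multipliers dict are assumptions, and it only handles plain decimals followed by an optional 'u' or 'n'.

```python
import pandas as pd

df = pd.DataFrame({"col": ["0.02u\n", "0.1\n", "2.02n\n"]})

# Assumed factors; the empty string covers values without a letter.
multipliers = {"u": 1e-6, "n": 1e-9, "": 1.0}

# Split each cleaned string into a numeric part and an optional letter.
parts = df["col"].str.strip().str.extract(
    r"^(?P<number>[-+]?\d*\.?\d+)(?P<suffix>[un]?)$"
)

# Strings that do not match the pattern become NaN instead of raising.
df["val"] = parts["number"].astype(float) * parts["suffix"].map(multipliers)
```

This keeps the whole conversion inside pandas string methods, so no module-level variables named u or n are needed.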
