Python pandas dataframe: converting strings with a prefix to float

Question

I have a dataframe of strings that I convert to float using df.astype('float', errors = 'ignore').

I use iloc with a loop that goes through all the columns.

The issue is that some values in some columns carry a unit prefix written after the number; in a given column the values could be ['0.02u\n', '0.1\n', '2.02n\n', ...].

The point is that u = 10^(-6) and n = 10^(-9). The question is how to convert these values to floats in an elegant way.
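The conversion itself is just number × multiplier: '0.02u' means 0.02 × 10^(-6) = 2 × 10^(-8). A minimal per-value sketch, assuming only the two suffixes u and n occur; parse_prefixed and SUFFIX_MULTIPLIERS are hypothetical names, not part of pandas:

```python
# Hypothetical lookup table: multipliers for the suffixes named in the question
SUFFIX_MULTIPLIERS = {'u': 1e-6, 'n': 1e-9}

def parse_prefixed(text: str) -> float:
    """Convert a string such as '0.02u\\n' to a float, applying any u/n suffix."""
    value = text.strip()          # drop the trailing newline and other whitespace
    suffix = value[-1:]           # last character ('' for an empty string)
    if suffix in SUFFIX_MULTIPLIERS:
        # Remove the letter, convert the number, then scale it
        return float(value[:-1]) * SUFFIX_MULTIPLIERS[suffix]
    return float(value)           # no suffix: plain number

for s in ['0.02u\n', '0.1\n', '2.02n\n']:
    print(repr(s), '->', parse_prefixed(s))
```

On a whole DataFrame, a function like this could be applied cell by cell with DataFrame.map(parse_prefixed) (DataFrame.applymap in pandas versions before 2.1).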

Answer 1

The solution I used is to look at every cell and check whether the string ends with 'n' or 'u'. If it does, remove the letter, convert the rest to float, and multiply by the matching power of ten.

import pandas as pd

# Example data in the format from the question; dtype=object lets cells hold floats later
df_T = pd.DataFrame({'col': ['0.02u\n', '0.1\n', '2.02n\n']}, dtype=object)

no_col = len(df_T.columns)
no_row = len(df_T)
for i in range(no_col):
    for j in range(no_row):
        # Remove the trailing newline (rstrip is safe even if it is missing)
        cell = df_T.iloc[j, i].rstrip('\n')
        if cell.endswith('u'):      # micro: 10^-6
            df_T.iloc[j, i] = float(cell[:-1]) * 10**-6
        elif cell.endswith('n'):    # nano: 10^-9
            df_T.iloc[j, i] = float(cell[:-1]) * 10**-9
        else:
            df_T.iloc[j, i] = float(cell)

# The cells now hold floats, but the column dtype is still object
df_T = df_T.astype(float)
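The same strip-the-letter-and-multiply idea can also be written without cell-by-cell iloc loops, using pandas string methods on whole columns. This is a sketch assuming every cell is a number optionally followed by a single u or n; MULTIPLIERS and to_float_column are hypothetical names, and cells that do not match the pattern become NaN instead of raising:

```python
import pandas as pd

# Hypothetical multiplier table for the suffixes in the question
MULTIPLIERS = {'u': 1e-6, 'n': 1e-9}

def to_float_column(col: pd.Series) -> pd.Series:
    """Convert strings like '0.02u\\n' to floats, scaling by the u/n suffix."""
    # Split each value into a numeric part (group 0) and an optional suffix (group 1)
    parts = col.str.strip().str.extract(r'^([-+]?[\d.]+(?:[eE][-+]?\d+)?)([un]?)$')
    numbers = parts[0].astype(float)
    # An empty suffix is not in the table, so map gives NaN; treat that as 1.0
    factors = parts[1].map(MULTIPLIERS).fillna(1.0)
    return numbers * factors

df_T = pd.DataFrame({'col': ['0.02u\n', '0.1\n', '2.02n\n']})
df_T = df_T.apply(to_float_column)   # applies the conversion to every column
print(df_T)
```

Supporting more prefixes (m, p, k, ...) would mean adding them both to MULTIPLIERS and to the [un] character class in the regex.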

Answer 2

Pandas can evaluate expressions stored as strings with pd.eval(). So if you have an expression in string form, you can apply pd.eval() to it and get the evaluated value.

To use this, first remove the \n from your values; I used .replace() for that. Next, make each expression readable by pd.eval(): for example, '3x' has to become '3*x', which again uses .replace(), this time with a regex. Finally, apply pd.eval() to evaluate each expression.

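import pandas as pd

df = pd.DataFrame({'col': ['0.02u\n', '0.1\n', '2.02n\n']})
# In Python, ^ is bitwise XOR (10^(-6) == -16), so use float literals for the powers of ten
u = 1e-6   # micro
n = 1e-9   # nano

# Remove the \n characters
df['col'] = df['col'].replace(to_replace="\n", value="", regex=True)
# Put '*' for multiplication --> '3x' will be converted to '3*x'
df['col'] = df['col'].replace(to_replace=r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()",
                              value=r"\1*\2", regex=True)
# Evaluate each string; local_dict tells pd.eval what u and n stand for
df['val'] = df['col'].apply(lambda expr: pd.eval(expr, local_dict={'u': u, 'n': n}))
print(df)

Result:

      col           val
0  0.02*u  2.000000e-08
1     0.1  1.000000e-01
2  2.02*n  2.020000e-09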
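Writing the multipliers as 10^(-6) and 10^(-9) is an easy slip in Python: ^ is bitwise XOR on integers, not exponentiation, so those expressions evaluate to -16 and -3. That is what turns 0.02u into -0.32 and 2.02n into -6.06 instead of tiny positive numbers. A quick check (u_xor and n_xor are illustrative names only):

```python
# ^ is bitwise XOR on integers in Python, not a power operator
u_xor = 10 ^ (-6)
n_xor = 10 ^ (-9)
print(u_xor, n_xor)    # -16 -3
print(0.02 * u_xor)    # -0.32, the wrong value produced by the XOR

# Powers of ten need ** or scientific-notation float literals
u = 1e-6
n = 1e-9
print(0.02 * u, 2.02 * n)
```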

Question

2 answers

Answer 1
0 2020-07-02 07:04:53

Answer 2
0 2020-07-02 07:36:20
