Python pandas DataFrame: convert strings with a unit prefix to float
I have a DataFrame of strings that I convert to float using df.astype('float', errors='ignore').
I used iloc and a loop to go through all columns.
The issue is that some values in some columns have a unit prefix. In a given column the values could be ['0.02u\n', '0.1\n', '2.02n\n', ...].
Here u stands for 10^-6 (micro, i.e. 1e-6) and n stands for 10^-9 (nano, i.e. 1e-9). The question is how to convert these strings to float in an elegant way.
The solution I used is to look at every cell and check whether the string ends in 'n' or 'u'. If it does, I remove the letter, convert the rest to float, and then multiply by the matching factor.
no_col = len(df_T.columns)
no_row = len(df_T)
# Strip the trailing '\n' from every cell
for i in range(0, no_col):
    for j in range(0, no_row):
        df_T.iloc[j, i] = df_T.iloc[j, i][:-1]
# Convert every cell to float, scaling by the prefix if there is one
for i in range(0, no_col):
    for j in range(0, no_row):
        if df_T.iloc[j, i][-1] == 'u':
            df_T.iloc[j, i] = df_T.iloc[j, i][:-1]
            df_T.iloc[j, i] = float(df_T.iloc[j, i])
            df_T.iloc[j, i] = df_T.iloc[j, i] * 10**-6
        elif df_T.iloc[j, i][-1] == 'n':
            df_T.iloc[j, i] = df_T.iloc[j, i][:-1]
            df_T.iloc[j, i] = float(df_T.iloc[j, i])
            df_T.iloc[j, i] = df_T.iloc[j, i] * 10**-9
        else:
            df_T.iloc[j, i] = float(df_T.iloc[j, i])
pandas can evaluate expressions using pd.eval(). So, if you have an expression in string format, you can apply pd.eval() and it will be evaluated.
To use this, first remove the \n in your columns, for which I used .replace(). Next, make the expression readable by pd.eval(): for example, '3x' should be converted to '3*x', again using .replace() with a regex. Finally, apply pd.eval() to evaluate the result.
import pandas as pd

df = pd.DataFrame({'col': ['0.02u\n', '0.1\n', '2.02n\n']})

# Scale factors for the prefixes. Use ** (or a literal such as 1e-6) here:
# in Python, ^ is bitwise XOR, so 10^(-6) evaluates to -16, not 0.000001.
u = 10**(-6)
n = 10**(-9)

# Remove the \n characters
df['col'] = df['col'].replace(to_replace="\n", value="", regex=True)

# Put '*' for multiplication --> '3x' will be converted to '3*x'
df['col'] = df['col'].replace(to_replace=r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()",
                              value=r"\1*\2", regex=True)

# Evaluate the expressions; u and n are looked up in the calling scope
df['val'] = pd.eval(df['col'])
print(df)

Result:
      col           val
0  0.02*u  2.000000e-08
1     0.1  1.000000e-01
2  2.02*n  2.020000e-09
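As an alternative that avoids evaluating strings as code, the number and the prefix can be split with a regex and the prefix looked up in a table. This is only a sketch, not the method used in the answer above: the names SI_PREFIX and si_to_float are made up for illustration, and the table covers just the prefixes mentioned in the question plus 'm' as an example of extending it.

```python
import pandas as pd

# Hypothetical prefix table (an assumption for this sketch); extend as needed.
SI_PREFIX = {"": 1.0, "m": 1e-3, "u": 1e-6, "n": 1e-9}

def si_to_float(col: pd.Series) -> pd.Series:
    """Convert strings such as '0.02u\\n' to floats (here 0.02e-6)."""
    # Strip whitespace/newlines, then split into number and optional prefix letter.
    parts = col.str.strip().str.extract(r"^(?P<num>[-+]?\d*\.?\d+)(?P<prefix>[a-zA-Z]?)$")
    # Unknown prefixes map to NaN, so bad cells become NaN instead of raising.
    scale = parts["prefix"].map(SI_PREFIX)
    return pd.to_numeric(parts["num"]) * scale

df = pd.DataFrame({"col": ["0.02u\n", "0.1\n", "2.02n\n"]})
df["val"] = si_to_float(df["col"])
```

If every column holds such strings, the same helper can be applied column by column with df.apply(si_to_float).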