How to remove noise (redundant commas/dots) from decimal values - Python

Question

I have a dataset df with two columns, ID and Value. Both are of dtype "object". However, I would like to convert the column Value to dtype "double", with a dot as the decimal separator. The problem is that the values in this column contain noise: there are too many commas (e.g. 0,1,,) or, after replacement, too many dots (e.g. 0.1..). As a result, when I try to convert the dtype to double, I get the error message: could not convert string to float: '0.2.'
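The error comes from the string-to-float parsing itself: Python's built-in float() accepts at most one decimal point and no trailing separators, so a value such as '0.2.' is rejected with the same message shown above. A minimal standard-library illustration (the helper name try_float is hypothetical, introduced only for this sketch):

```python
def try_float(s):
    """Return float(s), or None when Python cannot parse the string."""
    try:
        return float(s)
    except ValueError:
        return None

# Values as they look after every comma has been replaced with a dot
samples = ['0.1', '0.2.', '0.01..']
results = {s: try_float(s) for s in samples}
print(results)  # only '0.1' parses; the noisy values map to None

# Capture the exact message raised for a noisy value
try:
    float('0.2.')
except ValueError as exc:
    message = str(exc)
print(message)
```

So the fix has to happen on the strings before any conversion to float is attempted.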

Example code:

#required packages
import pandas as pd
import numpy as np

# initialize list of lists
data = [[1, '0,1'], [2, '0,2,'], [3, '0,01,,']]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['ID', 'Value'])

#replace comma with dot as separator
df = df.replace(',', '.', regex=True)

#examine dtype per column
df.info()

#convert dtype from object to double
df = df.astype({'Value': np.double}) #this is where the error message appears

The preferred outcome is for the values in the column Value to be 0.1, 0.2 and 0.01 respectively.

How can I get rid of the redundant commas (or, after replacement, the redundant dots) in the values of the column Value?

Answer 1

One option: use string functions to convert and strip the values. For example:

#required packages
import pandas as pd
import numpy as np

# initialize list of lists
data = [[1, '0,1'], [2, '0,2,'], [3, '0,01,,']]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['ID', 'Value'])

# replace only the FIRST comma with a dot (the decimal separator),
# then strip the redundant trailing commas: '0,01,,' -> '0.01,,' -> '0.01'
df['Value'] = df['Value'].str.replace(',', '.', n=1, regex=False).str.rstrip(',')

#examine dtype per column
df.info()

#convert dtype from object to double
df = df.astype({'Value': np.double})

print("------ df:")
print(df)

prints:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   ID      3 non-null      int64 
 1   Value   3 non-null      object
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
------ df:
   ID  Value
0   1   0.10
1   2   0.20
2   3   0.01
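The approach above assumes the noise is only trailing commas. If some values have already been converted to dots (e.g. '0.1..', as mentioned in the question), a regular expression can strip any run of trailing commas or dots before the remaining comma is turned into a decimal point. This is a sketch under the assumption that the noise appears only at the end of the string and that at most one comma is left afterwards; the 'n/a' row is a hypothetical bad value added for illustration. Using pd.to_numeric with errors='coerce' instead of astype turns anything still unparseable into NaN rather than raising.

```python
import pandas as pd

# Mixed input: commas, already-replaced dots, and one hypothetical bad value
data = [[1, '0,1'], [2, '0,2,'], [3, '0,01,,'], [4, '0.1..'], [5, 'n/a']]
df = pd.DataFrame(data, columns=['ID', 'Value'])

cleaned = (
    df['Value']
    # drop any run of trailing commas/dots: '0,01,,' -> '0,01', '0.1..' -> '0.1'
    .str.replace(r'[.,]+$', '', regex=True)
    # turn the remaining decimal comma (assumed at most one) into a dot
    .str.replace(',', '.', regex=False)
)

# Unparseable leftovers such as 'n/a' become NaN instead of raising
df['Value'] = pd.to_numeric(cleaned, errors='coerce')

print(df)
print(df.dtypes)
```

The trade-off with errors='coerce' is that bad rows no longer fail loudly, so it is worth checking df['Value'].isna() afterwards if silently losing values would be a problem.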

Answer 1: score 2, accepted, 2022-09-16 12:53:11
