Converting string variable with double commas into float?
I have a column of strings that originally used commas both as the thousands separator and as the decimal separator, and I need to convert these strings to float. How can I do it?
I first tried to replace all the commas with dots:
df['min'] = df['min'].str.replace(',', '.')
and then tried to convert the column to float:
df['min']= df['min'].astype(float)
but it raised the following error:
ValueError Traceback (most recent call last)
<ipython-input-29-5716d326493c> in <module>
----> 1 df['min']= df['min'].astype(float)
2 #df['mcom']= df['mcom'].astype(float)
3 #df['max']= df['max'].astype(float)
~\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
5544 else:
5545 # else, only a single dtype is given
-> 5546 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
5547 return self._constructor(new_data).__finalize__(self, method="astype")
5548
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
593 self, dtype, copy: bool = False, errors: str = "raise"
594 ) -> "BlockManager":
--> 595 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
596
597 def convert(
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, **kwargs)
404 applied = b.apply(f, **kwargs)
405 else:
--> 406 applied = getattr(b, f)(**kwargs)
407 result_blocks = _extend_blocks(applied, result_blocks)
408
~\anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
593 vals1d = values.ravel()
594 try:
--> 595 values = astype_nansafe(vals1d, dtype, copy=True)
596 except (ValueError, TypeError):
597 # e.g. astype_nansafe can fail on object-dtype of strings
~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
993 if copy or is_object_dtype(arr) or is_object_dtype(dtype):
994 # Explicit copy, or required since NumPy can't view from / to object.
--> 995 return arr.astype(dtype, copy=True)
996
997 return arr.view(dtype)
ValueError: could not convert string to float: '1.199.75'
If possible, I would like to remove all dots and commas, and then insert a dot before the last two characters of each value before converting to float.
Input:
df['min'].head()
9.50
10.00
3.45
1.095.50
13.25
Expected output:
9.50
10.00
3.45
1095.50
13.25
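Try this:
# Remove every separator (commas and dots), leaving only digits
df['min'] = df['min'].str.replace(r'[.,]', '', regex=True)
# Re-insert a decimal point before the last two digits, then convert
df['min'] = df['min'].str[:-2] + '.' + df['min'].str[-2:]
df['min'] = df['min'].astype(float)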
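To make the idea above easier to try, here is a minimal self-contained sketch. The sample values are an assumption: they mirror the question's column as it might have looked before any replacement, with commas for both separators. It also assumes every value has exactly two decimal digits.

```python
import pandas as pd

# Hypothetical sample data: commas used for both thousands and decimals
df = pd.DataFrame({'min': ['9,50', '10,00', '3,45', '1,095,50', '13,25']})

# Strip every comma and dot so only the digits remain, e.g. '1,095,50' -> '109550'
digits = df['min'].str.replace(r'[.,]', '', regex=True)

# Put a single decimal point before the last two digits, then convert
df['min'] = (digits.str[:-2] + '.' + digits.str[-2:]).astype(float)

print(df['min'].tolist())
```

Stripping both characters means the same lines work whether or not the commas were already replaced with dots.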
If you always have 2 decimal digits:
df['min'] = pd.to_numeric(df['min'].str.replace('.', '', regex=False)).div(100)
Output (shown as a new column min2 for clarity):
min min2
0 9.50 9.50
1 10.00 10.00
2 3.45 3.45
3 1.095.50 1095.50
4 13.25 13.25
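Since this approach silently gives wrong numbers when a value does not have exactly two decimals, it may be worth checking that assumption first. The following is only a sketch: the sample series and the guard pattern are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data, as the column looks after commas became dots
s = pd.Series(['9.50', '10.00', '3.45', '1.095.50', '13.25'])

# Guard: every value must end in a separator followed by exactly two digits
has_two_decimals = s.str.contains(r'[.,]\d{2}$', regex=True)
if not has_two_decimals.all():
    raise ValueError("some values do not end in two decimal digits")

# Drop all separators, parse as integer hundredths, then scale back
result = pd.to_numeric(s.str.replace(r'[.,]', '', regex=True)).div(100)

print(result.tolist())
```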
I have some strings in a column which originally uses commas as separators from thousands and from decimals and I need to convert this string into a float
So let's produce a reproducible data source that conforms to your description:
df = {'min': '0123,456,78'}
Then split this on "," into a list:
split_str = df['min'].split(',')
Collect the integer and decimal parts separately:
int_str = ''.join(split_str[:-1])
dec_str = split_str[-1]
Finally, reconstruct a valid float string and convert it to an actual float number:
float_number = float(f"{int_str}.{dec_str}")
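The steps above work on a single string. As a rough sketch of how they might be applied to a whole pandas column, the helper below (the name parse_number is hypothetical) treats the last separator as the decimal point and drops all earlier ones. It assumes every value that contains a separator also has a decimal part, so a value like '1,095' would be read as 1.095.

```python
import pandas as pd

def parse_number(text: str) -> float:
    """Hypothetical helper: treat the last ',' or '.' as the decimal point."""
    # Normalise dots to commas so both kinds of separator are handled alike
    head, sep, tail = text.replace('.', ',').rpartition(',')
    if not sep:
        # No separator at all: the string is already a plain number
        return float(tail)
    # Remove the remaining (thousands) separators from the integer part
    return float(f"{head.replace(',', '')}.{tail}")

# Hypothetical sample column mixing the formats seen in the question
df = pd.DataFrame({'min': ['9,50', '1.095.50', '0123,456,78', '42']})
df['min'] = df['min'].map(parse_number)

print(df['min'].tolist())
```

Using rpartition avoids building the intermediate list, but the logic is the same as the split and join shown above.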