Pandas: Replace string column values
I have a pandas DataFrame with a cost column that I am trying to format. Basically, I want to replace the string values and standardize the column, because the cost values are pulled from different sources. There are also some NaN values.
Here's some sample data:
$2.75
nan
4.150000
25.00
$4.50
I am using the following code to standardize the format of the values in the column.
for i in range(len(EmpComm['Cost(USD)'])):
    if (pd.isnull(EmpComm['Cost(USD)'][i])):
        print(EmpComm['Cost(USD)'][i], i)
        #EmpComm['Cost(USD)'] = EmpComm['Cost(USD)'].iloc[i].fillna(0, inplace=True)
    if type(EmpComm['Cost(USD)'].iloc[i]) == str:
        #print('string', i)
        EmpComm['Cost(USD)'] = EmpComm['Cost(USD)'].iloc[i].replace('$','')
Output:
0 2.75
1 2.75
2 2.75
3 2.75
4 2.75
5 2.75
All values are replaced with 2.75. It seems to run the second if statement for every value in the column, as if they were all formatted as strings.
My question is: How would you format it?
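For reference, the overwrite can be reproduced in isolation. This is a minimal sketch, assuming a small stand-in frame named emp_comm (a hypothetical name, not the real EmpComm data): the right-hand side of the assignment is a single string, and assigning a scalar to a column broadcasts it to every row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the EmpComm frame from the question.
emp_comm = pd.DataFrame({"Cost(USD)": ["$2.75", np.nan, 4.15, 25.00, "$4.50"]})

# .iloc[0] is the scalar string '$2.75', so str.replace returns the scalar '2.75'.
first_cleaned = emp_comm["Cost(USD)"].iloc[0].replace("$", "")

# Assigning a scalar to a whole column broadcasts it to every row.
emp_comm["Cost(USD)"] = first_cleaned

print(emp_comm["Cost(USD)"].tolist())
# ['2.75', '2.75', '2.75', '2.75', '2.75']
```

After that first assignment every value is the string '2.75', which is why the string branch then runs on each later iteration.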
In general, you should avoid manual for loops and use vectorised functionality with Pandas wherever possible. Here you can use pd.to_numeric to test and convert the values within your series:
import numpy as np
import pandas as pd

s = pd.Series(['$2.75', np.nan, 4.150000, 25.00, '$4.50'])
strs = s.astype(str).str.replace('$', '', regex=False)
res = pd.to_numeric(strs, errors='coerce').fillna(0)
print(res)
0 2.75
1 0.00
2 4.15
3 25.00
4 4.50
dtype: float64
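The same steps can be applied directly to the DataFrame column. The following is a sketch under stated assumptions: clean_cost is a hypothetical helper name, and emp_comm is a small stand-in for the EmpComm frame built from the sample data in the question.

```python
import numpy as np
import pandas as pd


def clean_cost(col: pd.Series) -> pd.Series:
    """Strip '$' from each value and convert to float (hypothetical helper)."""
    # Cast to str first so that floats and NaN can go through .str.replace.
    as_text = col.astype(str).str.replace("$", "", regex=False)
    # Anything that cannot be parsed becomes NaN, which is then filled with 0.
    return pd.to_numeric(as_text, errors="coerce").fillna(0)


# Stand-in data mirroring the sample values from the question.
emp_comm = pd.DataFrame({"Cost(USD)": ["$2.75", np.nan, 4.15, 25.00, "$4.50"]})

# Assign the whole converted Series back to the column in one step.
emp_comm["Cost(USD)"] = clean_cost(emp_comm["Cost(USD)"])
print(emp_comm["Cost(USD)"])
```

Note that fillna(0) treats a missing cost as zero. If missing values should stay missing, that call can simply be dropped.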