
Removing characters from string values in a DataFrame column

I hope you can help me with this question. I have a column with numeric values stored as strings. Since the data come from different countries, some values use different formats and contain characters such as "," and "$". I'm trying to convert the series to numbers, but I'm having trouble with the values that contain "," and "$".

import pandas as pd

data={"valores":[1,1,3,"4","5.00","1,000","$5,700"]}
df=pd.DataFrame(data)
df

    valores
0   1
1   1
2   3
3   4
4   5.00
5   1,000
6   $5,700

I've tried the following:

df["valores"].replace(",","")

but it doesn't change anything, since the "," is inside the string rather than being the whole string value.

pd.to_numeric(df["valores"])

But I receive the error "ValueError: Unable to parse string "1,000" at position 5".

valores=[i.replace(",","") for i in df["valores"].values]

But I receive the error "AttributeError: 'int' object has no attribute 'replace'".

So, at last, I tried this:

valores=[i.replace(",","") for i in df["valores"].values if type(i)==str]
valores
['4', '5.00', '1000', '$5700']

But it skipped the first three values, since they are not strings.

I think I could manage it with a regex, but I simply don't understand how to work with them.

I hope you can help me, since I've been struggling with this for about 7 hours.

You can try this:

df['valores'] = df['valores'].replace(to_replace=r'[,$]', value='', regex=True).astype(float)
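
To see what each step of that one-liner does, here is a minimal sketch using the sample data from the question. It assumes pandas is imported as pd; the intermediate name `cleaned` is only for illustration. Inside a character class neither "," nor "$" needs escaping, so the pattern is written as the raw string r"[,$]".

```python
import pandas as pd

df = pd.DataFrame({"valores": [1, 1, 3, "4", "5.00", "1,000", "$5,700"]})

# With regex=True, Series.replace substitutes inside string cells;
# non-string cells (the ints here) are left untouched.
cleaned = df["valores"].replace(to_replace=r"[,$]", value="", regex=True)

# astype(float) then parses the remaining strings and converts the ints.
df["valores"] = cleaned.astype(float)
print(df["valores"].tolist())
```

Note that astype(float) raises a ValueError if any cell still holds text that cannot be parsed as a number.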

You should first convert each value to a string, like this:

valores=[str(i).replace(",","") for i in df["valores"].values]
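
The same idea can be carried through to actual numbers without pandas. This is a rough sketch in plain Python; the helper name `to_number` is hypothetical, and it assumes "," is always a thousands separator and never a decimal mark.

```python
valores_raw = [1, 1, 3, "4", "5.00", "1,000", "$5,700"]

def to_number(value):
    """Hypothetical helper: strip ',' and '$', then parse as float."""
    # str() makes ints and strings uniform so .replace is always available.
    text = str(value).replace(",", "").replace("$", "")
    return float(text)

valores = [to_number(v) for v in valores_raw]
print(valores)  # [1.0, 1.0, 3.0, 4.0, 5.0, 1000.0, 5700.0]
```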

By default, .replace matches whole cell values. Since you want to replace part of a string, you need .str.replace or replace(..., regex=True):

df['valores'] = df["valores"].replace(",","", regex=True)

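Or, casting to str first, because .str.replace returns NaN for non-string cells such as the integers in this column:

df['valores'] = df["valores"].astype(str).str.replace(",","")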
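
The difference between the three calls is easiest to see on a small mixed-type Series. The sketch below assumes pandas is imported as pd; the Series `s` and the result names are made up for the demonstration.

```python
import pandas as pd

s = pd.Series([1, "1,000", ","], dtype=object)

# Default Series.replace compares whole cell values: only the cell that is
# exactly "," changes.
whole = s.replace(",", "")

# regex=True substitutes inside string cells and leaves non-strings alone.
inside = s.replace(",", "", regex=True)

# .str.replace also works on substrings, but non-string cells become NaN.
via_str = s.str.replace(",", "", regex=False)

print(whole.tolist())
print(inside.tolist())
print(via_str.tolist())
```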

You need to cast the values in the valores column to strings using .astype(str), then remove every "$" and "," using .str.replace('[,$]', '', regex=True) (since pandas 2.0 the pattern is treated as a literal string unless regex=True is passed), and then you can convert all the data to numeric using pd.to_numeric:

>>> pd.to_numeric(df["valores"].astype(str).str.replace("[,$]","", regex=True))
0       1.0
1       1.0
2       3.0
3       4.0
4       5.0
5    1000.0
6    5700.0
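
If the column may contain other unparseable entries, the same steps can be wrapped in a small function. This is only a sketch: the name `clean_numeric` and the extra "n/a" row are hypothetical, and it assumes "," is a thousands separator, which may not hold for every country's format.

```python
import pandas as pd

def clean_numeric(series):
    """Hypothetical helper: strip ',' and '$' and parse the result as numbers."""
    text = series.astype(str).str.replace(r"[,$]", "", regex=True)
    # errors="coerce" turns anything still unparseable into NaN instead of raising.
    return pd.to_numeric(text, errors="coerce")

# Sample data from the question plus one made-up unparseable value.
df = pd.DataFrame({"valores": [1, 1, 3, "4", "5.00", "1,000", "$5,700", "n/a"]})
df["valores"] = clean_numeric(df["valores"])
print(df)
```

Coercing to NaN keeps the conversion from failing on a single bad cell; those rows can then be inspected with df[df["valores"].isna()].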
