pandas: How to remove strings from columns in DataFrame?
I have data in the following format. I want to remove the strings that appear in the 'TIMES_Sold' column, replace them with 0, and then convert the column to integers.
Similarly, I want to remove 'each' from the 'ITEM_Price_£' column and convert it to float. How can I do that?
TIMES_Sold ITEM_Price_£
13 14.99
0 6.95 each
0 10.95 each
56 8.75
0 8.50 each
979 3.25
0 20.08
4 8.82
Portable Gas Sniffer 9
2 15.46
Output should look like this:
TIMES_Sold ITEM_Price_£
13 14.99
0 6.95
0 10.95
56 8.75
0 8.50
979 3.25
0 20.08
4 8.82
0 9
2 15.46
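To try the answers below, the sample data can be rebuilt as a small DataFrame. This is only a sketch: storing every value as a string is an assumption about how the real data was loaded, since the question does not show the dtypes.

```python
import pandas as pd

# Values copied from the question. Keeping them all as strings is an
# assumption; the original dtypes are not shown.
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", "0", "56", "0", "979", "0", "4",
                   "Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "10.95 each", "8.75", "8.50 each",
                     "3.25", "20.08", "8.82", "9", "15.46"],
})

print(df)
```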
This is one way to do it:
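# Replace any value that contains a non-digit character with '0', then cast to int.
df['TIMES_Sold'] = df['TIMES_Sold'].str.strip().replace(r'^.*\D.*$', '0', regex=True).astype(int)
# Keep only the leading numeric part of the price, then cast to float.
df['ITEM_Price_£'] = df['ITEM_Price_£'].astype(str).str.extract(r'([0-9.]+)', expand=False).astype(float)
df
TIMES_Sold ITEM_Price_£
0 13 14.99
1 0 6.95
2 0 10.95
3 56 8.75
4 0 8.50
5 979 3.25
6 0 20.08
7 4 8.82
8 0 9.00
9 2 15.46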
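import pandas as pd
# Each run of characters that are neither digits nor dots becomes "0": "6.95 each" -> "6.950", "Portable Gas Sniffer" -> "0".
df[["TIMES_Sold", "ITEM_Price_£"]] = df[["TIMES_Sold", "ITEM_Price_£"]].astype(str).apply(lambda col: pd.to_numeric(col.str.replace(r"[^\d.]+", "0", regex=True)))
df["TIMES_Sold"] = df["TIMES_Sold"].astype(int)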
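A related variant, which is not taken from any of the answers here, lets pd.to_numeric do the cleaning through errors="coerce" instead of a regex. The sketch below assumes that both columns hold strings and that " each" is the only suffix that ever appears in the price column; the four-row frame is a shortened copy of the question's data.

```python
import pandas as pd

# Shortened sample; string-typed columns are an assumption.
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", "Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "9", "15.46"],
})

# Values that cannot be parsed become NaN, which is then filled with 0
# before the integer cast.
df["TIMES_Sold"] = (
    pd.to_numeric(df["TIMES_Sold"], errors="coerce").fillna(0).astype(int)
)

# Remove the literal suffix " each" (no regex needed), then parse as float.
price = df["ITEM_Price_£"].str.replace(" each", "", regex=False).str.strip()
df["ITEM_Price_£"] = pd.to_numeric(price, errors="coerce")

print(df)
print(df.dtypes)
```

The trade-off is that unexpected text in the price column ends up as NaN rather than being silently merged into a number, which can make bad rows easier to spot.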
You can use Series.str.replace with the following regex patterns:
df['TIMES_Sold'] = df['TIMES_Sold'].str.replace(r'\D', '0', regex=True).astype(int)
df['ITEM_Price_£'] = df['ITEM_Price_£'].str.replace(r'[^\d.]+', '', regex=True).astype(float)
Output
>>> df
TIMES_Sold ITEM_Price_£
0 13 14.99
1 0 6.95
2 0 10.95
3 56 8.75
4 0 8.50
5 979 3.25
6 0 20.08
7 4 8.82
8 0 9.00
9 2 15.46
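As a quick sanity check, the two patterns used in this answer can be applied with Python's re.sub to a few values from the question. This is a minimal sketch; the names samples and results are only illustrative.

```python
import re

# A few raw values taken from the question's two columns.
samples = ["13", "Portable Gas Sniffer", "6.95 each", "9"]

# For each sample: (every non-digit replaced by "0",
#                   every run of non-digit, non-dot characters removed)
results = {
    s: (re.sub(r"\D", "0", s), re.sub(r"[^\d.]+", "", s))
    for s in samples
}

for raw, (zeros, stripped) in results.items():
    print(repr(raw), "->", repr(zeros), "|", repr(stripped))
```

Replacing every letter and space with "0" turns a purely textual value into a string of zeros, which still converts to the integer 0. That is why the first pattern works for 'TIMES_Sold', but it would not be suitable for a value that mixes digits and text.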
\D - Matches anything other than a digit.
[^\d.]+ - Matches anything other than a digit or the literal ., as many times as possible (although the + is optional in this case).
How about this? Good luck. BTW, your output is integer whereas it says float in your question.
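import string
# Build the character class from the actual letters; the literal '[alphabets]' would only match the characters a, l, p, h, b, e, t and s.
alphabets = string.ascii_lowercase + string.ascii_uppercase
df["TIMES_Sold"] = df["TIMES_Sold"].where(~df["TIMES_Sold"].str.contains(f"[{alphabets}]"), "0").astype(float)
df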