pandas: How to remove strings from columns in DataFrame?

I have data in the following format. I want to replace the strings that appear in the 'TIMES_Sold' column with 0 and then convert the column to integers.

Similarly, I want to remove 'each' from the 'ITEM_Price_£' column and convert it to float. How can I do that?

TIMES_Sold  ITEM_Price_£
13            14.99
0             6.95 each
0             10.95 each
56            8.75
0             8.50 each
979           3.25
0             20.08
4             8.82
 Portable Gas Sniffer         9
2             15.46

Output should look like this:

TIMES_Sold  ITEM_Price_£
13            14.99
0             6.95 
0             10.95
56            8.75
0             8.50
979           3.25
0             20.08
4             8.82
0             9
2             15.46
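
To try the answers below, the sample can be rebuilt as a DataFrame. This is only a sketch: it assumes both columns arrive as strings (as they would when a text entry such as ' Portable Gas Sniffer' forces the column to object dtype), which the question does not state explicitly.

```python
import pandas as pd

# Assumption: every cell is a string, including the numeric-looking ones.
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", "0", "56", "0", "979", "0", "4",
                   " Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "10.95 each", "8.75", "8.50 each",
                     "3.25", "20.08", "8.82", "9", "15.46"],
})

print(df.dtypes)  # both columns are object before any cleaning
```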

This is one way to do it:

# Replace any value that contains a non-digit with '0', then cast to int
df['TIMES_Sold'] = df['TIMES_Sold'].str.strip().replace(r'.*\D.*', '0', regex=True).astype(int)
# Keep only the leading number and cast to float
df['ITEM_Price_£'] = df['ITEM_Price_£'].astype(str).str.extract(r'([0-9.]+)', expand=False).astype(float)
df

   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46

df[["TIMES_Sold", "ITEM_Price_£"]] = df[["TIMES_Sold", "ITEM_Price_£"]].astype(str).apply(lambda col: pd.to_numeric(col.str.replace(r"[^\d\.]+", "0", regex=True)))

df.TIMES_Sold = df.TIMES_Sold.astype(int)
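
A variant of the same idea is to let pd.to_numeric handle the bad values through its errors="coerce" option instead of rewriting them with a regex first. This is a minimal sketch, not taken from the answers here; the four-row frame is a shortened, hypothetical copy of the question's data with string columns.

```python
import pandas as pd

# Shortened sample (assumption: both columns hold strings).
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", " Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "9", "15.46"],
})

# Non-numeric entries become NaN, are filled with 0, then cast to int.
df["TIMES_Sold"] = (
    pd.to_numeric(df["TIMES_Sold"], errors="coerce").fillna(0).astype(int)
)

# Drop the literal " each" suffix, then parse the remainder as float.
df["ITEM_Price_£"] = pd.to_numeric(
    df["ITEM_Price_£"].str.replace(" each", "", regex=False)
)

print(df)
```

Coercing avoids having to describe every possible junk value in a pattern, at the cost of silently turning any unparseable entry into 0.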

You can use Series.str.replace with the following regex patterns:

df['TIMES_Sold'] = df['TIMES_Sold'].str.replace(r'\D', '0', regex=True).astype(int)
df['ITEM_Price_£'] = df['ITEM_Price_£'].str.replace(r'[^\d.]+', '', regex=True).astype(float)

Output:

>>> df

   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46
  • \D - Matches any single character other than a digit
  • [^\d.]+ - Matches a run of characters other than digits or the literal ., as many times as possible (the + is optional in this case, because removing the characters one at a time gives the same result)
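
The effect of the two patterns can be checked on plain strings with Python's re module, outside pandas. A small sketch, using sample values copied from the question:

```python
import re

times_sold_samples = ["13", " Portable Gas Sniffer"]
price_samples = ["6.95 each", "9"]

# \D: every non-digit character is replaced by "0", so a text entry turns
# into a string of zeros, which int() reads as 0.
times_sold = [re.sub(r"\D", "0", s) for s in times_sold_samples]

# [^\d.]+: each run of characters that are neither digits nor "." is deleted,
# leaving only the number.
prices = [re.sub(r"[^\d.]+", "", s) for s in price_samples]

print([int(s) for s in times_sold])
print([float(p) for p in prices])
```

Note that the \D replacement only works here because a string made entirely of zeros still parses as the integer 0.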

How about this? Good luck. BTW, your output is integer whereas it says float in your question.

import string
# Regex character class that matches any ASCII letter
letters = f"[{string.ascii_letters}]"
# Replace every entry that contains a letter with '0', then convert to float
df["TIMES_Sold"] = df["TIMES_Sold"].where(~df["TIMES_Sold"].str.contains(letters), '0').astype(float)
df
