pandas: How to remove strings from a DataFrame column?

Question

I have data in the following format. I want to remove the strings that appear in the 'TIMES_Sold' column, replace them with 0, and then convert the column to integers.

Similarly, I want to remove 'each' from the 'ITEM_Price_£' column and convert it to float. How can I do that?

TIMES_Sold  ITEM_Price_£
13            14.99
0             6.95 each
0             10.95 each
56            8.75
0             8.50 each
979           3.25
0             20.08
4             8.82
 Portable Gas Sniffer         9
2             15.46

Output should look like this:

TIMES_Sold  ITEM_Price_£
13            14.99
0             6.95 
0             10.95
56            8.75
0             8.50
979           3.25
0             20.08
4             8.82
0             9
2             15.46

Answer 1

This is one way to do it:

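# Replace any entry that contains a non-digit with '0', then cast to int
df['TIMES_Sold'] = df['TIMES_Sold'].str.strip().replace(r'^.*\D.*$', '0', regex=True).astype(int)
# Keep only the leading numeric part of the price, then cast to float
df['ITEM_Price_£'] = df['ITEM_Price_£'].astype(str).str.extract(r'([0-9.]+)', expand=False).astype(float)
df

   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46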

Answer 2

import pandas as pd

cols = ["TIMES_Sold", "ITEM_Price_£"]
# Replace every run of characters that are not digits or dots with "0", then parse as numbers
df[cols] = df[cols].astype(str).apply(lambda col: pd.to_numeric(col.str.replace(r"[^\d.]+", "0", regex=True)))
df.TIMES_Sold = df.TIMES_Sold.astype(int)
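
To make it clearer what that replacement does to individual values before pd.to_numeric parses them, here is a small sketch. It assumes string input; the sample values are taken from the question, and the variable names are only illustrative.

```python
import pandas as pd

# A few representative values from the question, assumed to be strings
samples = pd.Series(["14.99", "6.95 each", " Portable Gas Sniffer", "9"])

# Each run of characters other than digits and dots is swapped for a single "0"
replaced = samples.str.replace(r"[^\d.]+", "0", regex=True)
print(replaced.tolist())

# The cleaned strings now parse as numbers
numbers = pd.to_numeric(replaced)
print(numbers.tolist())
```

Note that "6.95 each" becomes "6.950", which still parses as 6.95. A price without decimals, such as "6 each", would become "60", so replacing with an empty string may be the safer choice for the price column.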

Answer 3

You can use Series.str.replace with the following regex patterns:

df['TIMES_Sold'] = df['TIMES_Sold'].str.replace(r'\D', '0', regex=True).astype(int)
df['ITEM_Price_£'] = df['ITEM_Price_£'].str.replace(r'[^\d.]+', '', regex=True).astype(float)

Output

>>> df

   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46

\D - Matches anything other than a digit
[^\d.]+ - Matches anything other than a digit or a literal ., as many times as possible (although the + is optional in this case)
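
To see the two patterns in action, here is a minimal, self-contained sketch. The sample frame is rebuilt by hand from the question's table, assuming both columns arrive as strings (as they typically would after reading a file with mixed content). The helper name clean is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; both columns are assumed to hold strings.
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", "0", "56", "0", "979", "0", "4",
                   " Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "10.95 each", "8.75", "8.50 each",
                     "3.25", "20.08", "8.82", "9", "15.46"],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the two regex replacements to a copy."""
    out = frame.copy()
    # \D: every non-digit character becomes '0', so a text entry
    # turns into a run of zeros, which casts to the integer 0
    out["TIMES_Sold"] = (
        out["TIMES_Sold"].str.replace(r"\D", "0", regex=True).astype(int)
    )
    # [^\d.]+: drop everything that is not a digit or a dot,
    # e.g. '6.95 each' -> '6.95'
    out["ITEM_Price_£"] = (
        out["ITEM_Price_£"].str.replace(r"[^\d.]+", "", regex=True).astype(float)
    )
    return out


cleaned = clean(df)
print(cleaned.dtypes)
```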

Answer 4

How about this? Good luck. BTW, your expected output shows integers, whereas your question says float.

import string
# Regex character class covering every ASCII letter
letters = f"[{string.ascii_letters}]"
# Replace any entry that contains a letter with '0', then cast to float
df["TIMES_Sold"] = df["TIMES_Sold"].where(~df["TIMES_Sold"].str.contains(letters), "0").astype(float)
df
