
pandas: How to remove strings from columns in DataFrame?

I have data in the following format. I want to replace the strings that appear in the 'TIMES_Sold' column with 0 and then convert the column to integers.

Similarly, I want to remove 'each' from the 'ITEM_Price_£' column and convert it to float. How can I do that?

TIMES_Sold  ITEM_Price_£
13            14.99
0             6.95 each
0             10.95 each
56            8.75
0             8.50 each
979           3.25
0             20.08
4             8.82
 Portable Gas Sniffer         9
2             15.46

Output should look like this:

TIMES_Sold  ITEM_Price_£
13            14.99
0             6.95 
0             10.95
56            8.75
0             8.50
979           3.25
0             20.08
4             8.82
0             9
2             15.46
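
A minimal sketch for reproducing the sample, assuming both columns arrive as text (the question does not show how the data was loaded, so the construction below is an assumption):

```python
import pandas as pd

# Rows copied from the question. Every value is kept as a string, since the
# stray text prevents the columns from being read as numbers.
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", "0", "56", "0", "979", "0", "4",
                   " Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "10.95 each", "8.75", "8.50 each",
                     "3.25", "20.08", "8.82", "9", "15.46"],
})

print(df)
```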

This is one way to do it:

# Replace any value that contains a non-digit character with '0', then cast to int
df['TIMES_Sold'] = df['TIMES_Sold'].str.strip().replace(r'^.*\D.*$', '0', regex=True).astype(int)
# Keep only the leading number; expand=False returns a Series that can be cast to float
df['ITEM_Price_£'] = df['ITEM_Price_£'].astype(str).str.extract(r'([0-9.]+)', expand=False).astype(float)

   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46

df[["TIMES_Sold", "ITEM_Price_£"]] = df[["TIMES_Sold", "ITEM_Price_£"]].astype(str).apply(lambda col: pd.to_numeric(col.str.replace(r"[^\d\.]+", "0", regex=True)))

df.TIMES_Sold = df.TIMES_Sold.astype(int)
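
One caveat with substituting "0" for the non-numeric run: it works for "6.95 each" (which becomes "6.950"), but a whole-number price with the suffix, such as a hypothetical "9 each", would become "90". A sketch that sidesteps this, assuming pd.to_numeric with errors="coerce" is acceptable for the counts and that the text in the price column can simply be deleted (the "9 each" row is an invented illustration, not from the question):

```python
import pandas as pd

# Small sample mirroring the question's data; "9 each" is a hypothetical
# extra value added to show the whole-number case.
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", " Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "9 each", "15.46"],
})

# Values that cannot be parsed as numbers become NaN, then 0, then int.
df["TIMES_Sold"] = (
    pd.to_numeric(df["TIMES_Sold"], errors="coerce").fillna(0).astype(int)
)

# Delete (rather than zero-fill) anything that is not a digit or a dot.
df["ITEM_Price_£"] = (
    df["ITEM_Price_£"].str.replace(r"[^\d.]+", "", regex=True).astype(float)
)

print(df)
```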

You can use Series.str.replace with the following regex patterns:

df['TIMES_Sold'] = df['TIMES_Sold'].str.replace(r'\D', '0', regex=True).astype(int)
df['ITEM_Price_£'] = df['ITEM_Price_£'].str.replace(r'[^\d.]+', '', regex=True).astype(float)


>>> df

   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46
  • \D - Matches anything other than a digit;
  • [^\d.]+ - Matches anything other than a digit or the literal . as many times as possible (although the + is optional in this case)
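
To see what the two patterns do in isolation, here is a small sketch using Python's re module on a few values from the question. The list names are made up for illustration, and re.sub stands in for Series.str.replace here:

```python
import re

# Sample values taken from the question's two columns.
times_sold = ["13", "0", " Portable Gas Sniffer", "979"]
prices = ["14.99", "6.95 each", "10.95 each", "9"]

# \D replaces every single non-digit character with "0", so a purely
# textual value becomes a run of zeros, which int() reads as 0.
cleaned_counts = [int(re.sub(r"\D", "0", value)) for value in times_sold]

# [^\d.]+ deletes every run of characters that is neither a digit nor a dot.
cleaned_prices = [float(re.sub(r"[^\d.]+", "", value)) for value in prices]

print(cleaned_counts)  # [13, 0, 0, 979]
print(cleaned_prices)  # [14.99, 6.95, 10.95, 9.0]
```

Note that the \D approach only yields 0 when the value contains no digits at all; a value mixing text and digits would turn into some other number.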

How about this? Good luck. By the way, your expected output shows integers, whereas your question says float.

import string
