
pandas: How to remove strings from columns in DataFrame?

I have data in the following format. I want to replace any strings that appear in the 'TIMES_Sold' column with 0 and then convert the column to integers.

Similarly, I want to remove 'each' from the 'ITEM_Price_£' column and convert the column to float. How can I do that?

TIMES_Sold  ITEM_Price_£
13            14.99
0             6.95 each
0             10.95 each
56            8.75
0             8.50 each
979           3.25
0             20.08
4             8.82
 Portable Gas Sniffer         9
2             15.46

Output should look like this:

TIMES_Sold  ITEM_Price_£
13            14.99
0             6.95 
0             10.95
56            8.75
0             8.50
979           3.25
0             20.08
4             8.82
0             9
2             15.46
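
To make the question reproducible, the sample above can be rebuilt roughly as follows. This is only a sketch: it assumes both columns hold strings (as they typically would after reading a messy CSV), which the question does not state explicitly.

```python
import pandas as pd

# Assumption: every cell is a string, including the numeric-looking ones.
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", "0", "56", "0", "979", "0", "4",
                   " Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "10.95 each", "8.75", "8.50 each",
                     "3.25", "20.08", "8.82", "9", "15.46"],
})

# Inspect the column types before any cleaning is applied.
print(df.dtypes)
```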

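This is one way to do it:

# Replace any cell that contains a non-digit character with '0', then convert to int
df['TIMES_Sold'] = df['TIMES_Sold'].astype(str).str.strip().replace(r'^.*\D.*$', '0', regex=True).astype(int)
# Keep only the leading number (digits and dots) and convert to float
df['ITEM_Price_£'] = df['ITEM_Price_£'].astype(str).str.extract(r'([0-9.]+)', expand=False).astype(float)
df

   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46

Another way is to replace every run of characters that is not a digit or a dot with "0" and let pd.to_numeric do the conversion:

df[["TIMES_Sold", "ITEM_Price_£"]] = df[["TIMES_Sold", "ITEM_Price_£"]].astype(str).apply(lambda col: pd.to_numeric(col.str.replace(r"[^\d.]+", "0", regex=True)))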

df.TIMES_Sold = df.TIMES_Sold.astype(int)
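
A related variant, not taken from the answers here, is to let pd.to_numeric with errors="coerce" turn non-numeric cells into NaN and then fill those with 0. The sketch below uses a shortened, hypothetical copy of the sample data and assumes the only unwanted text in the price column is the word "each".

```python
import pandas as pd

# Shortened, hypothetical copy of the sample data (all cells are strings).
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", " Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "9", "15.46"],
})

# Non-numeric cells become NaN, are filled with 0, and the column is cast to int.
df["TIMES_Sold"] = (
    pd.to_numeric(df["TIMES_Sold"], errors="coerce").fillna(0).astype(int)
)

# Drop the literal word "each", trim whitespace, then convert to float.
df["ITEM_Price_£"] = (
    df["ITEM_Price_£"].str.replace("each", "", regex=False).str.strip().astype(float)
)

print(df)
```

Unlike the regex replacements, this leaves digits inside a text cell alone: a cell such as "Model 3" would become 0 rather than a number built from its digits.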

You can use Series.str.replace with the following regex patterns:

df['TIMES_Sold'] = df['TIMES_Sold'].str.replace(r'\D', '0', regex=True).astype(int)
df['ITEM_Price_£'] = df['ITEM_Price_£'].str.replace(r'[^\d.]+', '', regex=True).astype(float)

Output

>>> df

   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46
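Explanation of the patterns:
  • \D matches any character that is not a digit. Each such character is replaced with 0, so a text-only cell becomes a run of zeros that converts to the integer 0.
  • [^\d.]+ matches one or more consecutive characters that are neither a digit nor a literal dot (the + is optional here because the matches are replaced with an empty string).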
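
The effect of the two patterns can be checked on single strings with the standard re module. This is a small illustrative sketch; the value "12 each" is a made-up input, included only to show a limitation.

```python
import re

times_sold = " Portable Gas Sniffer"
price = "6.95 each"

# Every non-digit character is replaced by "0", giving a run of zeros.
zeros = re.sub(r"\D", "0", times_sold)

# Every run of characters other than digits and dots is removed.
cleaned = re.sub(r"[^\d.]+", "", price)

# Hypothetical mixed cell: the digits survive and the text turns into extra zeros.
mixed = re.sub(r"\D", "0", "12 each")

print(int(zeros), float(cleaned), int(mixed))
```

The last case shows that replacing \D with 0 is only safe when a cell is either all digits or contains no digits at all; a cell that mixes the two would be converted to an inflated number.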

How about this? Good luck. BTW, your output is an integer, whereas your question says float.

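import string
# All ASCII letters, used below to build a regex character class
alphabets = string.ascii_lowercase + string.ascii_uppercase
# Keep cells without letters; replace any cell containing a letter with '0'
df["TIMES_Sold"] = df["TIMES_Sold"].where(~df["TIMES_Sold"].str.contains(f"[{alphabets}]"), "0").astype(float)
df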
