How to remove numbers from a string column that starts with 4 zeros?

I have a column of product names and information, and I need to remove the codes from the names. Every code starts with four or more zeros, but some names also have four or more zeros in the weight, and some codes are joined directly to the name, as in the example below:

import pandas as pd

data = {
    'Name': ['ANOA 250g 00004689', 'ANOA 10000g 00000059884', '80%c asjw 150000001568 ', 'Shivangi000000478761'],
}

testdf = pd.DataFrame(data)

The correct output would be:

results = {
    'Name': ['ANOA 250g', 'ANOA 10000g', '80%c asjw 150000001568 ', 'Shivangi'],
}

results = pd.DataFrame(results)

Use a regex with str.replace:

testdf['Name'] = testdf['Name'].str.replace(r'(?:(?<=\D)|\s*\b)0{4}\d*',
                                            '', regex=True)

Or, similar to @HaleemurAli, with a negative lookbehind:

testdf['Name'] = testdf['Name'].str.replace(r'\s*(?<!\d)0{4}\d*',
                                            '', regex=True)

Output:

                      Name
0                ANOA 250g
1              ANOA 10000g
2  80%c asjw 150000001568 
3                 Shivangi
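
To see why the first pattern leaves 10000g and 150000001568 alone, its two alternation branches can be tried separately with the standard re module, whose syntax str.replace(..., regex=True) relies on. This is only a sketch on the sample names from the question; the variable names glued, spaced and full are made up for illustration.

```python
import re

names = [
    "ANOA 250g 00004689",
    "ANOA 10000g 00000059884",
    "80%c asjw 150000001568 ",
    "Shivangi000000478761",
]

# Branch 1: four zeros directly after a non-digit (also covers codes glued to a name).
glued = re.compile(r"(?<=\D)0{4}\d*")
# Branch 2: optional whitespace, then four zeros at the start of a word.
spaced = re.compile(r"\s*\b0{4}\d*")
# The full pattern from the answer is the alternation of the two branches.
full = re.compile(r"(?:(?<=\D)|\s*\b)0{4}\d*")

cleaned = [full.sub("", name) for name in names]
for before, after in zip(names, cleaned):
    print(repr(before), "->", repr(after))
```

In both branches the zeros must not follow a digit, which is what protects the weight in 10000g and the number 150000001568. The second branch additionally swallows the whitespace in front of a separate code, so no trailing space is left behind.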


You can split the strings at the start of the code pattern, which is expressed by the regex (?<!\d)0{4,}. This pattern matches four or more 0s that are not preceded by a digit. After splitting the string, take the first fragment; str.strip then gets rid of any trailing space.

testdf.Name.str.split(r'(?<!\d)0{4,}', regex=True, expand=True)[0].str.strip()
# outputs:
0                 ANOA 250g
1               ANOA 10000g
2    80%c asjw 150000001568
3                  Shivangi

Note that this only works when the codes are always at the end of the string.
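
The end-of-string caveat can be illustrated with plain re.split. This is a sketch only: keep_before_code is a hypothetical helper name, and the second input, with the code in the middle, is an invented example that does not appear in the question.

```python
import re

# Same split pattern as above: four or more zeros not preceded by a digit.
CODE_START = re.compile(r"(?<!\d)0{4,}")

def keep_before_code(name):
    # Keep only the text before the first code-like run of zeros.
    return CODE_START.split(name, maxsplit=1)[0].strip()

end_case = keep_before_code("ANOA 250g 00004689")   # code at the end
mid_case = keep_before_code("ANOA 00004689 250g")   # hypothetical: code in the middle

print(repr(end_case))
print(repr(mid_case))
```

With the code in the middle, everything after it (here the weight) is discarded along with the code, so a substitution-based approach is the safer choice if codes can appear anywhere in the name.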

Try splitting each name at the spaces and keeping only the items that do not contain 0000, like this:

answer = []
for i in testdf["Name"]:
    # rejoin the kept items with a space so the words stay separated
    answer.append(" ".join([j for j in i.split() if "0000" not in j]))
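
On the sample names, a whole-token filter behaves differently from the regex answers: it drops any token that merely contains 0000, including weights such as 10000g and names with a code attached. The sketch below compares it with a stricter variant that drops only tokens consisting entirely of a code. Both function names are hypothetical, and the stricter variant is an assumption about what was intended, not part of the original answer.

```python
import re

names = [
    "ANOA 250g 00004689",
    "ANOA 10000g 00000059884",
    "80%c asjw 150000001568 ",
    "Shivangi000000478761",
]

def drop_tokens_containing(name, marker="0000"):
    # Whole-token filter: removes every space-separated token containing the marker.
    return " ".join(tok for tok in name.split() if marker not in tok)

def drop_code_tokens(name):
    # Stricter variant: removes only tokens that are entirely a code,
    # i.e. four zeros followed by any further digits.
    return " ".join(tok for tok in name.split()
                    if not re.fullmatch(r"0{4}\d*", tok))

loose = [drop_tokens_containing(n) for n in names]
strict = [drop_code_tokens(n) for n in names]
print(loose)
print(strict)
```

The stricter variant keeps 10000g and 150000001568, but it still cannot separate a code that is glued to the name, as in Shivangi000000478761. That case needs a regex substitution.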
