
Remove leading digits from strings in a pandas column

I have a DataFrame df with a column named Names that holds the following data:

Names
------
23James
0Sania
4124Thomas
101Craig
8Rick

How can I turn it into this:

Names
------
James
Sania
Thomas
Craig
Rick

I tried df.strip, but some of the numbers are still in the DataFrame.

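We can use str.replace here with the regex pattern ^\d+, which targets leading digits. Pass regex=True explicitly: since pandas 2.0, str.replace treats the pattern as a literal string by default.

df["Names"] = df["Names"].str.replace(r'^\d+', '', regex=True)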
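For anyone who wants to try this end to end, here is a minimal sketch. The way the DataFrame is built is an assumption; the values are copied from the question.

```python
import pandas as pd

# Sample values from the question; constructing df this way is an assumption.
df = pd.DataFrame({"Names": ["23James", "0Sania", "4124Thomas", "101Craig", "8Rick"]})

# ^\d+ matches one or more digits anchored at the start of each string.
df["Names"] = df["Names"].str.replace(r"^\d+", "", regex=True)

print(df["Names"].tolist())
# ['James', 'Sania', 'Thomas', 'Craig', 'Rick']
```

Digits that appear later in a name are left alone, because the pattern is anchored with ^.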

Tim's answer certainly solves this, but I'm not proficient with regex and usually feel uncomfortable using it, so I would approach it like this:

def removeStartingNums(s):
  count = 0
  for i in s:
    if i.isnumeric():
      count += 1
    else:
      break
  return s[count:]
 
df["Names"] = df["Names"].apply(removeStartingNums)

The function counts how many leading characters are numeric and returns the string with those characters sliced off.
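If the goal is simply to avoid regex, a shorter route might be str.lstrip, which treats its argument as a set of characters to strip from the left. This is a sketch of a swapped-in technique, not the approach above. It covers only the ASCII digits 0-9, whereas isnumeric() also accepts other Unicode numeric characters. The DataFrame construction is an assumption.

```python
import string

import pandas as pd

# Sample values from the question; constructing df this way is an assumption.
df = pd.DataFrame({"Names": ["23James", "0Sania", "4124Thomas", "101Craig", "8Rick"]})

# string.digits is "0123456789"; lstrip removes any run of these characters
# from the start of each value and stops at the first other character.
df["Names"] = df["Names"].str.lstrip(string.digits)

print(df["Names"].tolist())
# ['James', 'Sania', 'Thomas', 'Craig', 'Rick']
```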

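You can also extract everything after the leading digits with a capture group. Use a raw string for the pattern, and pass expand=False so that a Series rather than a one-column DataFrame is returned:

df['Names'] = df['Names'].str.extract(r'^\d+(.*)', expand=False)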
print(df)

# Output
    Names
0   James
1   Sania
2  Thomas
3   Craig
4    Rick
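One caveat worth checking: str.extract returns NaN for rows where the pattern does not match, so a name with no leading digits would be lost with ^\d+. The sketch below uses a hypothetical two-row Series to show the difference when the quantifier is relaxed to \d*.

```python
import pandas as pd

# Hypothetical data: the second value has no leading digits.
s = pd.Series(["23James", "Rick"])

# \d+ requires at least one digit, so "Rick" does not match and becomes NaN.
strict = s.str.extract(r"^\d+(.*)", expand=False)

# \d* allows zero digits, so every string matches and is kept.
lenient = s.str.extract(r"^\d*(.*)", expand=False)

print(strict.tolist())   # ['James', nan]
print(lenient.tolist())  # ['James', 'Rick']
```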

