Pandas - Extract value from column in data frame

I have a pandas data frame with very long text in one column. I wanted to select all rows where that column contains ABC. I was able to do this using the following:

 df[df['Column'].str.contains('ABC', na=False)]

What I want to do next is extract every value from this field that consists of the prefix and the next 5 characters. So after finding a matching row, I would want to get ABC1234 or ABC7899.

I hope this makes sense.

You can use str.extract with a regular expression that captures ABC followed by 5 digits. It returns the first match in each row and NaN where there is no match:

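import pandas as pd

df = pd.DataFrame({'Column':['ABC12345 is in this column', 'Not in this one CCD11111','Also in this one ABC99882']})
# Raw string so \d reaches the regex engine intact; expand=False returns a Series
df['capture'] = df['Column'].str.extract(r'(ABC\d{5})', expand=False)
# Drop the rows where nothing was captured
df.dropna(inplace=True)
print(df)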

Output

                      Column   capture
0  ABC12345 is in this column  ABC12345
2   Also in this one ABC99882  ABC99882
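
Since the question asks for all values and str.extract only keeps the first match per row, here is a minimal sketch of two alternatives, str.findall and str.extractall, for text that may contain the prefix more than once. The sample data and the names all_matches and codes are made up for illustration, and the pattern assumes 5 digits after ABC as in the answer above.

```python
import pandas as pd

# Hypothetical sample data: the first row contains two codes
df = pd.DataFrame({'Column': ['ABC12345 then ABC67890', 'Not in this one CCD11111']})

# findall gives a list of every match per row (an empty list when there is none)
df['all_matches'] = df['Column'].str.findall(r'ABC\d{5}')

# extractall gives one row per match, indexed by (original row, match number);
# column 0 holds the first (and only) capture group
codes = df['Column'].str.extractall(r'(ABC\d{5})')[0].tolist()

print(df)
print(codes)
```

findall keeps the matches next to the row they came from, while extractall is handier when you want one flat list or Series of codes.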
