Extract substring from left to a specific character for each row in a pandas dataframe?

I have a dataframe that contains a collection of strings. These strings look something like this:

"oop9-hg78-op67_457y"

I need to cut everything from the underscore to the end in order to match this data with another set. My attempt looked something like this:

df['column'] = df['column'].str[0:'_']

I've tried toying around with .find() in this statement but nothing seems to work. Anybody have any ideas? Any and all help would be greatly appreciated!

You can use .str.split and take the first element of each resulting list with .str[0], or use .str.extract with a regular expression:

df['column'] = df['column'].str.split('_').str[0]

# or

df['column'] = df['column'].str.extract('^([^_]*)_')
print(df)

           column
0  oop9-hg78-op67
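df['column'] = df['column'].str.extract(r'^([^_]*)', expand=False)

This could also be used if another option is needed. Note that .str.extract requires a capture group in the pattern, and expand=False makes it return a Series instead of a DataFrame.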
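For comparison, here is a minimal, self-contained sketch of the approaches mentioned above, run on a small made-up frame. The second sample value and the result column names (via_split, via_extract, via_extract_optional) are assumptions for illustration only. It also shows how each approach treats a row that has no underscore.

```python
import pandas as pd

# Hypothetical sample data; the second value has no underscore.
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y", "abc-123"]})

# Split on the first underscore only and keep the left part.
df["via_split"] = df["column"].str.split("_", n=1).str[0]

# Capture everything before the first underscore. The pattern requires
# an underscore, so rows without one become NaN.
df["via_extract"] = df["column"].str.extract(r"^([^_]*)_", expand=False)

# Same idea without requiring the underscore: rows without one are kept whole.
df["via_extract_optional"] = df["column"].str.extract(r"^([^_]*)", expand=False)

print(df)
```

If some rows may lack an underscore, the split version or the pattern without the trailing underscore is probably the safer choice, since neither turns those rows into NaN.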

Adding to the solution provided above by @Ynjxsjmh

You can use str.extract :

df['column'] = df['column'].str.extract(r'(^[^_]+)')

Output (as separate column for clarity):

                column         column2
0  oop9-hg78-op67_457y  oop9-hg78-op67

Regex:

(       # start capturing group
^       # match start of string
[^_]+   # one or more non-underscore
)       # end capturing group
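
To see what the pattern matches, the following sketch applies it with Python's standard re module to a few strings. Only the first string comes from the question; the others, and the helper name left_of_underscore, are made up for illustration. Note that [^_]+ needs at least one character, so a string that starts with an underscore does not match.

```python
import re

# Same pattern as above: from the start of the string, one or more
# characters that are not an underscore, captured as group 1.
pattern = re.compile(r"(^[^_]+)")

def left_of_underscore(text):
    """Hypothetical helper: return the part before the first underscore, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(left_of_underscore("oop9-hg78-op67_457y"))  # oop9-hg78-op67
print(left_of_underscore("no-underscore"))        # no-underscore
print(left_of_underscore("_leading"))             # None
```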
