I have a dataframe that contains a collection of strings. These strings look something like this:
"oop9-hg78-op67_457y"
I need to cut everything from the underscore to the end in order to match this data with another set. My attempt looked something like this:
df['column'] = df['column'].str[0:'_']
I've tried toying around with .find() in this statement but nothing seems to work. Anybody have any ideas? Any and all help would be greatly appreciated!
You can try .str.split
then access the list with .str
or with .str.extract
df['column'] = df['column'].str.split('_').str[0]
# or
df['column'] = df['column'].str.extract('^([^_]*)_')
print(df)
column
0 oop9-hg78-op67
df['column'] = df['column'].str.extract('_', expand=False)
could also be used if another option is needed.
Adding to the solution provided above by @Ynjxsjmh
You can use str.extract
:
df['column'] = df['column'df].str.extract(r'(^[^_]+)')
Output (as separate column for clarity):
column column2
0 oop9-hg78-op67_457y oop9-hg78-op67
Regex:
( # start capturing group
^ # match start of string
[^_]+ # one or more non-underscore
) # end capturing group
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.