Extract substring from left to a specific character for each row in a pandas dataframe?

Question

I have a dataframe that contains a collection of strings. These strings look something like this:

"oop9-hg78-op67_457y"

I need to cut everything from the underscore to the end in order to match this data with another set. My attempt looked something like this:

df['column'] = df['column'].str[0:'_']

I've tried toying around with .find() in this statement but nothing seems to work. Anybody have any ideas? Any and all help would be greatly appreciated!

Answer 1

You can use .str.split and take the first element of each resulting list with .str[0], or use .str.extract with a capture group:

df['column'] = df['column'].str.split('_').str[0]

# or

df['column'] = df['column'].str.extract('^([^_]*)_')

print(df)

           column
0  oop9-hg78-op67
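
The two lines above behave differently when a value contains no underscore: the split version leaves the value unchanged, while this extract pattern requires an underscore and yields NaN. Here is a minimal sketch of that difference, assuming a hypothetical second row without an underscore (the first value is from the question, the second is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: the first value comes from the question,
# the second (no underscore) is added only for illustration.
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y", "no-underscore"]})

# split: keep the part before the first underscore; n=1 stops after one split
left_split = df["column"].str.split("_", n=1).str[0]

# extract: this pattern needs an underscore, so rows without one become NaN
left_extract = df["column"].str.extract(r"^([^_]*)_", expand=False)

print(left_split.tolist())
print(left_extract.tolist())
```

Which behaviour is preferable depends on whether rows without an underscore should pass through untouched or be flagged as missing before matching against the other data set.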

Answer 2

Adding to the solution provided above by @Ynjxsjmh, the following could also be used if another option is needed:

df['column'] = df['column'].str.extract('^([^_]*)', expand=False)

Answer 3

You can use str.extract:

df['column'] = df['column'].str.extract(r'(^[^_]+)')

Output (as separate column for clarity):

                column         column2
0  oop9-hg78-op67_457y  oop9-hg78-op67

Regex:

(       # start capturing group
^       # match start of string
[^_]+   # one or more non-underscore
)       # end capturing group
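
The same pattern can be checked outside pandas with Python's re module. This is only a sketch: the helper name left_of_underscore is made up for illustration, and re.VERBOSE is used so that the comments from the breakdown above can sit inside the pattern itself.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so it can carry comments
PATTERN = re.compile(r"""
    (       # start capturing group
    ^       # match start of string
    [^_]+   # one or more non-underscore characters
    )       # end capturing group
""", re.VERBOSE)


def left_of_underscore(text):
    """Hypothetical helper: text before the first underscore, or None if
    the string is empty or starts with an underscore."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(left_of_underscore("oop9-hg78-op67_457y"))
```

Because the pattern uses + rather than *, a string that begins with an underscore produces no match at all instead of an empty string.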

3 answers

solution1
1 ACCEPTED 2022-05-20 18:38:35

solution2
0 2022-05-20 18:41:45

solution3
0 2022-05-20 18:43:58
