Pandas - How to split a string column into several columns, by the index of specific characters?

Question

I want to extract the user ID from a string column called "filename", and create a new ID column, based on the index of specific character in the original string.

Two examples for the string in "filename", with ID of 2 or 3 digits:

filename = ID 100 session1neg_emotions_rating.csv ---> ID = 100
filename =ID 21 session2neu_emotions_rating.csv ---> ID = 21

I tried this -

df['ID '] = df.filename.str[2:**4**]

but I couldn't find the end index of the ID for the slice per row (it 3 or 4, depends on the length of the ID as 2 or 3 digits).

finding the index of "s" after each row in the data frame will solve my problem. The simple option didn't work for me -

s_index = df.filename.str.index("s")

(I also tried some split option, but I don't have a specific character such as comma, to split by)

Thanks a lot! sorry if it's a duplication of a previous question

Answer 1

I'd use regex with str.extract :

s_index = df.filename.str.extract("^ID(\d+)")

As integers:

s_index = df.filename.str.extract("^ID(\d+)").astype(int)

Regex101 explanation

Answer 2

An alternative to regex which is probably the best answer is to use split first on 'session' and grabbing the very first element and then another split grabbing the last element:

df['ID'] =  df.filename.str.split('session').str[0].str.split('ID').str[1]

Pandas - How to split a string column into several columns, by the index of specific characters?

Question

2 answers

solution1
0 2021-12-22 12:30:53

solution2
0 2021-12-22 12:32:56

Pandas - How to split a string column into several columns, by the index of specific characters?

Question

2 answers

solution1 0 2021-12-22 12:30:53

solution2 0 2021-12-22 12:32:56

solution1
0 2021-12-22 12:30:53

solution2
0 2021-12-22 12:32:56