简体   繁体   中英

Pandas - How to split a string column into several columns, by the index of specific characters?

I want to extract the user ID from a string column called "filename", and create a new ID column, based on the index of specific character in the original string.

Two examples for the string in "filename", with ID of 2 or 3 digits:

  1. filename = ID 100 session1neg_emotions_rating.csv ---> ID = 100

  2. filename =ID 21 session2neu_emotions_rating.csv ---> ID = 21

I tried this -

df['ID '] = df.filename.str[2:**4**]

but I couldn't find the end index of the ID for the slice per row (it 3 or 4, depends on the length of the ID as 2 or 3 digits).

finding the index of "s" after each row in the data frame will solve my problem. The simple option didn't work for me -

s_index = df.filename.str.index("s")

(I also tried some split option, but I don't have a specific character such as comma, to split by)

Thanks a lot! sorry if it's a duplication of a previous question

I'd use regex with str.extract :

s_index = df.filename.str.extract("^ID(\d+)")

As integers:

s_index = df.filename.str.extract("^ID(\d+)").astype(int)

Regex101 explanation

An alternative to regex which is probably the best answer is to use split first on 'session' and grabbing the very first element and then another split grabbing the last element:

df['ID'] =  df.filename.str.split('session').str[0].str.split('ID').str[1]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM