
How to split a string in a pandas DataFrame into different columns

I am trying to split a string in the pandas data frame into several other columns.

The string will be like "AARTIIND27JAN221000CE.NFO" and it should be split into

["AARTIIND", "27", "JAN", "22", "1000", "CE"]

Note: The length of the string is not the same for all rows, so I need a regular-expression-based solution to split it.

It can be done using pd.Series.str.extract, but I don't know how to do it exactly.

Thanks for the help!

The exact logic is unclear, but assuming you want to split on the boundaries between runs of digits and non-digits, with a DDMMMYY date in the middle, you can try:

df['col'].str.extract(r'(\D+)(\d{2})(\D{3})(\d{2})(\d+)([^.]+)')

output:

          0   1    2   3     4   5
0  AARTIIND  27  JAN  22  1000  CE

example input:

import pandas as pd
df = pd.DataFrame({'col': ["AARTIIND27JAN221000CE.NFO"]})
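
If named columns are more convenient than 0 to 5, the same pattern can be written with named groups, which str.extract uses as column names. This is only a sketch: the group names (symbol, day, month, year, strike, option_type) and the second sample row are hypothetical, chosen for illustration, and the pattern still assumes the leading symbol contains no digits and the strike is a whole number.

```python
import pandas as pd

# The second row is a made-up example to show a different string length.
df = pd.DataFrame({"col": ["AARTIIND27JAN221000CE.NFO",
                           "NIFTY03FEB2217500PE.NFO"]})

# Same logic as the positional pattern, with illustrative group names.
pattern = (
    r"(?P<symbol>\D+)"        # leading non-digit run
    r"(?P<day>\d{2})"         # two-digit day
    r"(?P<month>\D{3})"       # three-letter month
    r"(?P<year>\d{2})"        # two-digit year
    r"(?P<strike>\d+)"        # remaining digits
    r"(?P<option_type>[^.]+)" # everything up to the first dot
)

# One column per named group; all values are strings.
parts = df["col"].str.extract(pattern)

# Attach the extracted columns to the original frame.
result = df.join(parts)
print(result)
```

Rows that do not match the pattern come back as NaN in every extracted column, so it may be worth checking parts.isna().any() on real data.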
