Split alphanumeric column without delimiter pandas dataframe

I have a column in a pandas DataFrame that I need to split into multiple columns. The problem is that the column values have no delimiter. Here is the DataFrame:

import pandas as pd
data = ['MSFT220121C00180000', 'MSFT220121C00185000', 'MSFT220121C00200000']
df = pd.DataFrame(data, columns=['contract'])
df

Output:

    contract
0   MSFT220121C00180000
1   MSFT220121C00185000
2   MSFT220121C00200000

Desired output:

  ticker    date type    series
0   MSFT  220121    C  00180000
1   MSFT  220121    C  00185000
2   MSFT  220121    C  00200000

I tried something with regex:

import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
r.match('MSFT220121C00180000').groups()
# ('MSFT', '220121')

which didn't give me the desired result.

You can use series.str.extractall() with unstack():

# extractall returns one row per regex match; unstack spreads the matches into columns
m = df.contract.str.extractall('([a-zA-Z]+)([0-9]+)').unstack().sort_index(level=1, axis=1)
m.columns = ['ticker', 'date', 'type', 'series']
print(m)

Or:

import itertools
m=pd.DataFrame([[*itertools.chain.from_iterable(i)] 
               for i in df.contract.str.findall('([a-zA-Z]+)([0-9]+)')],
               columns=['ticker','date','type','series'])

  ticker    date type    series
0   MSFT  220121    C  00180000
1   MSFT  220121    C  00185000
2   MSFT  220121    C  00200000
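
As a possible alternative to both approaches above, a single pattern with named groups and series.str.extract() can name the columns directly. This is only a sketch, and it assumes every contract follows the layout seen in the sample data: letters for the ticker, exactly six digits for the date, one letter for the type, and the remaining digits for the series. The pattern and the variable name parts are mine, not from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {"contract": ["MSFT220121C00180000", "MSFT220121C00185000", "MSFT220121C00200000"]}
)

# Assumed layout: letters (ticker), 6 digits (date), one letter (type), remaining digits (series).
pattern = r"^(?P<ticker>[A-Za-z]+)(?P<date>\d{6})(?P<type>[A-Za-z])(?P<series>\d+)$"

# str.extract returns one column per named group; rows that do not match become NaN.
parts = df["contract"].str.extract(pattern)
print(parts)
```

All four columns stay strings, so the leading zeros in series are preserved. If some contracts deviate from the assumed layout, those rows come back as NaN and the pattern would need adjusting.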
