Split an alphanumeric column without a delimiter in a pandas DataFrame
I have a column in a pandas DataFrame that I need to split into multiple columns. The problem is that the column values contain no delimiter.
Here is the DataFrame:
import pandas as pd
data = ['MSFT220121C00180000','MSFT220121C00185000','MSFT220121C00200000']
df = pd.DataFrame(data, columns = ['contract'])
df
Output:
contract
0 MSFT220121C00180000
1 MSFT220121C00185000
2 MSFT220121C00200000
Desired output:
ticker date type series
0 MSFT 220121 C 00180000
1 MSFT 220121 C 00185000
2 MSFT 220121 C 00200000
I tried something with regex:
import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
# result for the first contract: only the first letters/digits pair
('MSFT', '220121')
which didn't give me the desired result.
You can use series.str.extractall() with unstack():
# Each match yields a (letters, digits) pair; unstack spreads the matches across columns
m = df.contract.str.extractall('([a-zA-Z]+)([0-9]+)').unstack().sort_index(level=1, axis=1)
m.columns = ['ticker', 'date', 'type', 'series']
print(m)
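To see why the sort_index(level=1, axis=1) step is needed, it may help to look at the intermediate column order. The following is only a sketch that breaks the one-liner into steps; the names matches, wide and ordered are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    ['MSFT220121C00180000', 'MSFT220121C00185000', 'MSFT220121C00200000'],
    columns=['contract'],
)

# extractall returns one row per regex match, indexed by (original row, match number),
# with one column per capture group (0 = letters, 1 = digits).
matches = df.contract.str.extractall('([a-zA-Z]+)([0-9]+)')

# unstack moves the match number into the columns, giving (group, match) pairs
# ordered group-first: letters of both matches, then digits of both matches.
wide = matches.unstack()

# Sorting on the match level restores the left-to-right order of the original string:
# letters and digits of the first match, then letters and digits of the second.
ordered = wide.sort_index(level=1, axis=1)

print(list(wide.columns))
print(list(ordered.columns))
```

Without the sort, assigning the four column names would label the second letter group ("C") as the date and the first digit group as the type.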
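Or:
import itertools
# findall returns a list of (letters, digits) tuples per row; flatten each list into one row
m = pd.DataFrame(
    [[*itertools.chain.from_iterable(i)]
     for i in df.contract.str.findall('([a-zA-Z]+)([0-9]+)')],
    columns=['ticker', 'date', 'type', 'series'])
print(m)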
ticker date type series
0 MSFT 220121 C 00180000
1 MSFT 220121 C 00185000
2 MSFT 220121 C 00200000
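If every value follows the same layout, a single pattern with named groups passed to str.extract might be a simpler alternative. This is a sketch and not one of the approaches given above; it assumes each contract is letters, then exactly six digits, then one letter, then the remaining digits, which holds for the three sample rows but is not confirmed for other data.

```python
import pandas as pd

df = pd.DataFrame(
    ['MSFT220121C00180000', 'MSFT220121C00185000', 'MSFT220121C00200000'],
    columns=['contract'],
)

# Assumed layout: ticker letters, 6-digit date, 1-letter type, remaining digits.
pattern = r'(?P<ticker>[A-Za-z]+)(?P<date>\d{6})(?P<type>[A-Za-z])(?P<series>\d+)'

# With named groups, str.extract uses the group names as column names.
m = df['contract'].str.extract(pattern)
print(m)
```

Rows that do not match the assumed layout come back as NaN in every column, so it is worth checking for missing values before relying on the result.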