split, map data in two columns in pandas data frame

Question

I want to split the data in two columns of a data frame and build new columns from the result.

My data frame is:

import pandas as pd

dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL", "GT:DP:RO:QR:AO:QA:GL", "GT:DP:RO:QR:AO:QA:GL",
          "GT:DP:GL", "GT:DP:GL"],
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "1/1:49:-103.754,0,-3.51307",
          "1/1:49:-103.754,0,-3.51307"],
})

I want individual columns named GT, DP, RO, QR, AO, QA and GL, filled with the corresponding values from column B.

I want to produce output as,

We can split the two columns using a = dfc.A.str.split(":", expand=True) and b = dfc.B.str.split(":", expand=True) to get two separate data frames. These can be merged with c = pd.merge(a, b, left_index=True, right_index=True) to get all the data in one frame, but not in the expected format: the keys from A end up as cell values instead of column names.
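To illustrate why the merge falls short, here is a small sketch on a two-row sample of the data (the shortened frame is an assumption made for brevity). It shows that the merged frame has no column named GT and that, for the shorter row, the GL value lands in position 2 rather than under a GL label.

```python
import pandas as pd

# Two-row sample of the frame from the question (shortened for illustration).
dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL", "GT:DP:GL"],
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "1/1:49:-103.754,0,-3.51307"],
})

a = dfc["A"].str.split(":", expand=True)  # keys, one per positional column
b = dfc["B"].str.split(":", expand=True)  # values, one per positional column
c = pd.merge(a, b, left_index=True, right_index=True)

# 7 key columns plus 7 value columns, none of them labelled with a key name.
print(c.shape)
# In the short row, position 2 holds the key "GL" and its value.
print(a.loc[1, 2], b.loc[1, 2])
```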

Any suggestions? I think a better way might be to split both columns A and B and then create a dict column with the values from A as keys and the values from B as values. This column could then be converted to a data frame. Thanks.

Answer 1

Split both columns on the separator ":" and zip the pieces of A and B into a key/value mapping for each row. Use an OrderedDict so that the key order is preserved, then collect the mappings into a list.

Feed this list to the DataFrame constructor.

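from collections import OrderedDict

# Build one ordered mapping per row: keys from A, values from B.
L = dfc.apply(
    lambda x: OrderedDict(zip(x['A'].split(':'), x['B'].split(':'))), axis=1).tolist()
# Keys missing from a row become NaN in the resulting frame.
pd.DataFrame(L)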
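As a variation on the same idea, the sketch below uses a plain dict and a list comprehension instead of apply. It assumes Python 3.7 or later, where plain dicts keep insertion order, and uses a two-row sample of the data; the names rows and result are introduced here for illustration only.

```python
import pandas as pd

# Two-row sample of the frame from the question (shortened for illustration).
dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL", "GT:DP:GL"],
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "1/1:49:-103.754,0,-3.51307"],
})

# Pair each key in A with the value at the same position in B, row by row.
rows = [dict(zip(a.split(":"), b.split(":")))
        for a, b in zip(dfc["A"], dfc["B"])]

# Keys missing from a row (RO, QR, AO, QA in the second row) become NaN.
result = pd.DataFrame(rows, index=dfc.index)
print(result)
```

All values stay strings after the split; if numeric columns are needed, something like pd.to_numeric(result["DP"]) could be applied afterwards.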

Answer 2

I'm going to split everything on ':'. But I have two columns, so I stack first: that gives a single series on which I can easily use str.split.
I now have a series of lists, which I can group by level=0, the original index.
Within each group I zip the two lists and build a dict, which gives a series-like structure with the pieces of the original column A as the index and the pieces of B as the values.
Finally I unstack and I'm done.

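# Stack A and B into one series, split each string, and group by original row.
gb = dfc.stack().str.split(':').groupby(level=0)
# Each group holds two lists (from A and from B); zip them into a dict, then pivot the keys into columns.
gb.apply(lambda x: dict(zip(*x))).unstack()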
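The same steps can be spelled out one at a time, which may make the intermediate shapes easier to follow. This is only a sketch on a two-row sample: it replaces the apply/unstack step with an explicit loop over the groups and DataFrame.from_dict, and the names stacked, parts, per_row and result are introduced here for illustration.

```python
import pandas as pd

# Two-row sample of the frame from the question (shortened for illustration).
dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL", "GT:DP:GL"],
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "1/1:49:-103.754,0,-3.51307"],
})

# Step 1: one series indexed by (original row, column name).
stacked = dfc.stack()

# Step 2: every element becomes a list of ':'-separated fields.
parts = stacked.str.split(":")

# Step 3: each group holds the list from A and the list from B for one row;
# zip(*group) pairs them up position by position.
per_row = {row: dict(zip(*group)) for row, group in parts.groupby(level=0)}

# Step 4: one row per original index, one column per key; missing keys are NaN.
result = pd.DataFrame.from_dict(per_row, orient="index")
print(result)
```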

2 answers

solution1
3 ACCEPTED 2016-12-29 08:35:52

solution2
2 2016-12-29 08:34:28
