
Split and map data in two columns of a pandas data frame

I want to split the data in two columns of a data frame and construct new columns from it.

My data frame is:

import pandas as pd

dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL", "GT:DP:RO:QR:AO:QA:GL", "GT:DP:RO:QR:AO:QA:GL", "GT:DP:GL", "GT:DP:GL"],
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278", "0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "0/1:71:43:1363:28:806:-71.1191,0,-121.278", "1/1:49:-103.754,0,-3.51307", "1/1:49:-103.754,0,-3.51307"],
})

I want individual columns named GT, DP, RO, QR, AO, QA and GL, filled with the corresponding values from column B.

I want to produce output like this: [image: expected output]

We can split the two columns with a = dfc.A.str.split(":", expand=True) and b = dfc.B.str.split(":", expand=True) to get two separate data frames. These can be merged with c = pd.merge(a, b, left_index=True, right_index=True) to get all the desired data, but not in the expected format: the keys from A end up as cell values instead of column names. [image: merged result]
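If you want to keep the two split frames a and b, one way to finish the job (a sketch, not from the original post, assuming a recent pandas) is to skip the merge, pair the frames cell by cell into a long table of (row, key, value), and pivot the keys into column labels. The names long_df and wide are just illustrative.

```python
import pandas as pd

# The question's data frame (same values, written compactly).
dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL"] * 3 + ["GT:DP:GL"] * 2,
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278"] * 3
    + ["1/1:49:-103.754,0,-3.51307"] * 2,
})

# Positional split: column i of `a` holds the key whose value sits in column i of `b`.
a = dfc["A"].str.split(":", expand=True)
b = dfc["B"].str.split(":", expand=True)

# Flatten both frames row by row into one long table of (row, key, value).
long_df = pd.DataFrame({
    "row": a.index.repeat(a.shape[1]),
    "key": a.to_numpy().ravel(),
    "val": b.to_numpy().ravel(),
})
# Shorter rows were padded with missing values by expand=True; drop that padding.
long_df = long_df.dropna(subset=["key"])

# Turn each key into a column; keys a row doesn't have become NaN.
wide = long_df.pivot(index="row", columns="key", values="val")
print(wide)
```

The pivoted columns may come out in sorted order; wide.reindex(columns=[...]) can put them back in whatever order you prefer.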

Any suggestions? I think a better way might be to split both columns A and B, then create a dict column with the values from A as keys and the values from B as values, and finally convert that column to a data frame. Thanks.
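The dict-column idea from the question can be sketched directly. This version (a sketch assuming a recent pandas; the column name parsed is hypothetical) builds one dict per row from the split strings, stores the dicts in a column, and expands that column with apply(pd.Series):

```python
import pandas as pd

# The question's data frame (same values, written compactly).
dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL"] * 3 + ["GT:DP:GL"] * 2,
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278"] * 3
    + ["1/1:49:-103.754,0,-3.51307"] * 2,
})

# Split both columns on ':' (each cell becomes a list).
keys = dfc["A"].str.split(":")
vals = dfc["B"].str.split(":")

# The "dict column": keys from A mapped to values from B, one dict per row.
dfc["parsed"] = pd.Series([dict(zip(k, v)) for k, v in zip(keys, vals)], index=dfc.index)

# Expand each dict into its own columns; keys missing from a row become NaN.
out = dfc["parsed"].apply(pd.Series)
print(out)
```

apply(pd.Series) is convenient but builds one Series per row, so on large frames pd.DataFrame(dfc["parsed"].tolist(), index=dfc.index) is usually faster.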

Split both columns on the separator ":" and zip the pieces into a dict that maps each key from A to its value from B, using an OrderedDict to preserve the key order. Collect one such dict per row into a list.

Then feed this list to the DataFrame constructor.

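from collections import OrderedDict

import pandas as pd

# One OrderedDict per row: keys from column A, values from column B, in their original order.
L = dfc.apply(
    lambda x: OrderedDict(zip(x['A'].split(':'), x['B'].split(':'))), axis=1).tolist()
# Keys missing from a row (e.g. RO in the GT:DP:GL rows) become NaN.
pd.DataFrame(L)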

[image: output]
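The point of the OrderedDict is column order: when the DataFrame constructor receives a list of dicts, it has to decide the order of the keys. A small check with toy data (not from the question), assuming Python 3.7+ and a recent pandas: the columns follow the order in which keys first appear, so a plain dict behaves the same way here. Older pandas releases could sort these columns alphabetically unless OrderedDicts were passed.

```python
import pandas as pd

# Toy rows whose keys are deliberately not in alphabetical order.
rows = [
    {"GT": "0/1", "DP": "71", "GL": "-71.1191,0,-121.278"},
    {"GT": "1/1", "QA": "806"},
]
df = pd.DataFrame(rows)

# Columns in first-seen order: GT, DP, GL from row 0, then QA from row 1.
print(list(df.columns))
```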

  • I'm going to split everything on ':'. But there are two columns, so I stack them first; that gives a single Series on which str.split is easy to use.
  • I now have a split Series that I can group by level=0, which is the original index.
  • Within each group I zip the two lists and build a dict, which gives a Series-like structure with the values from column A as the index and the values from column B as the values.
  • unstack, and I'm done.

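# Stack A and B into one Series, split every cell on ':', and group by the original row.
gb = dfc.stack().str.split(':').groupby(level=0)
# Pair the keys (A) with the values (B) in each row, then spread the keys out as columns.
gb.apply(lambda x: dict(zip(*x))).unstack()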

[image: output]
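The four bullet steps above can be traced one at a time. This sketch, assuming a recent pandas, replaces the final apply(...).unstack() with an explicit dict comprehension plus DataFrame.from_dict(..., orient="index") so each intermediate result can be inspected; the names stacked, split and row_dicts are illustrative. Some pandas 2.x releases may print a FutureWarning for stack() called without future_stack=True; it does not change the result for this frame.

```python
import pandas as pd

# The question's data frame (same values, written compactly).
dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL"] * 3 + ["GT:DP:GL"] * 2,
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278"] * 3
    + ["1/1:49:-103.754,0,-3.51307"] * 2,
})

# Step 1: stack A and B into one Series indexed by (row, column name).
stacked = dfc.stack()

# Step 2: split every string on ':' so each cell becomes a list.
split = stacked.str.split(":")

# Step 3: each group (index level 0 = original row) holds two lists,
# A's keys first and B's values second; zip(*grp) pairs them up.
row_dicts = {row: dict(zip(*grp)) for row, grp in split.groupby(level=0)}

# Step 4: one row per original index, one column per key (missing keys -> NaN).
out = pd.DataFrame.from_dict(row_dicts, orient="index")
print(out)
```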

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM