I want to split the data in two columns of a data frame and construct new columns from it.
My data frame is:
dfc = pd.DataFrame( {"A": ["GT:DP:RO:QR:AO:QA:GL", "GT:DP:RO:QR:AO:QA:GL", "GT:DP:RO:QR:AO:QA:GL", "GT:DP:GL", "GT:DP:GL"], "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278", "0/1:71:43:1363:28:806:-71.1191,0,-121.278", "0/1:71:43:1363:28:806:-71.1191,0,-121.278", "1/1:49:-103.754,0,-3.51307", "1/1:49:-103.754,0,-3.51307"]} )
I want individual columns named GT, DP, RO, QR, AO, QA, GL with values from column B.
I can split the two columns using a = dfc.A.str.split(":", expand=True) and b = dfc.B.str.split(":", expand=True) to get two separate data frames. These can be merged with c = pd.merge(a, b, left_index=True, right_index=True) to get all the desired data, but not in the expected format.
Any suggestions? I think a better way might be to split both columns A and B, then create a column of dicts with the values from A as keys and the values from B as values. That column could then be converted to a data frame. Thanks.
Use an OrderedDict to preserve the order when creating a dict mapping between the two columns of the dataframe, each split on the separator ":". Build one mapping per row, flatten the result to a list, and feed that list to the dataframe constructor.
from collections import OrderedDict
# Build one ordered key -> value mapping per row (axis=1) and collect them in a list
L = dfc.apply(
    lambda x: OrderedDict(zip(x['A'].split(':'), x['B'].split(':'))), axis=1).tolist()
pd.DataFrame(L)
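As a variation on the same idea, here is a minimal sketch that assumes Python 3.7 or later, where a plain dict already keeps insertion order, so OrderedDict is not strictly needed. The names records, wanted and result are mine, and the sample frame is a shortened copy of the one in the question. The reindex at the end is only there to pin the column order explicitly.

```python
import pandas as pd

# Shortened copy of the question's data: one long row and one short row.
dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL", "GT:DP:GL"],
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "1/1:49:-103.754,0,-3.51307"],
})

# One dict per row: keys come from column A, values from column B.
records = [
    dict(zip(a.split(":"), b.split(":")))
    for a, b in zip(dfc["A"], dfc["B"])
]

# Keys missing from a row become NaN; reindex fixes the column order.
wanted = ["GT", "DP", "RO", "QR", "AO", "QA", "GL"]
result = pd.DataFrame(records, index=dfc.index).reindex(columns=wanted)
print(result)
```

Rows that only carry GT:DP:GL end up with NaN in RO, QR, AO and QA. All values stay strings, so convert the numeric columns afterwards if you need numbers.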
I would split on ':'. But I have 2 columns. If I stack first, I get a series on which I can more easily use str.split. I then group by level=0, which is the original index. Within each group I use zip and dict to get series-like structures with the values from the original column A as the indices and those from B as the values. Then I unstack and I'm done.
gb = dfc.stack().str.split(':').groupby(level=0)
gb.apply(lambda x: dict(zip(*x))).unstack()
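To make the intermediate steps visible, here is a rough sketch of the same stack-and-group idea written out by hand. It does not rely on how groupby.apply wraps dict results; instead it loops over the groups and builds the frame with pd.DataFrame.from_dict, which is my substitution rather than the code above. The names stacked, pairs and result are illustrative, and the sample frame is a shortened copy of the question's data.

```python
import pandas as pd

# Shortened copy of the question's data: one long row and one short row.
dfc = pd.DataFrame({
    "A": ["GT:DP:RO:QR:AO:QA:GL", "GT:DP:GL"],
    "B": ["0/1:71:43:1363:28:806:-71.1191,0,-121.278",
          "1/1:49:-103.754,0,-3.51307"],
})

# stack() turns the frame into one Series indexed by (row, column);
# str.split then gives a list of tokens for every cell.
stacked = dfc.stack().str.split(":")

# Each group holds two lists for one original row: the keys from A and
# the values from B. zip(*grp) pairs them up element by element.
pairs = {row: dict(zip(*grp)) for row, grp in stacked.groupby(level=0)}

# Outer keys become the row index, inner keys become the columns.
result = pd.DataFrame.from_dict(pairs, orient="index")
print(result)
```

Depending on the pandas version, the stack call may emit a FutureWarning about its new implementation; the result is unaffected for this data.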