Partial Multiindexing with a Pandas DataFrame
I have a dataframe as follows:
import pandas as pd

df = pd.DataFrame(columns=['New Category', 'Sample1', 'Sample2'],
                  data=[
                      ['Pathogenic/Likely Pathogenic', '0/0:240', '1/0:100'],
                      ['Likely Benign', '1/1:0,237', '1/0:700'],
                      ['Likely Benign', '0/0:239', '0/0:234'],
                      ['Likely Benign', '1/1:1,238', '0/1:890'],
                      ['Likely Benign', '0/1:156,79', '1/1:767'],
                      ['VUS', '1/1:0,241', '0/1:21']
                  ])
Which looks like this:
                   New Category  Sample1  Sample2
0  Pathogenic/Likely Pathogenic  0/0:240  1/0:100
1                 Likely Benign  1/1:237  1/0:700
2                 Likely Benign  0/0:239  0/0:234
3                 Likely Benign  1/1:238  0/1:890
4                 Likely Benign  0/1:156  1/1:767
5                           VUS  1/1:241   0/1:21
I want to do some multiindexing so that each Sample1 and Sample2 value is split at the colon into two sub-columns, GT and GQ, placed underneath the sample name. However, I do not want these sub-columns to apply to the New Category column. Basically, I want it to look like this:
                   New Category  Sample1     Sample2
                                 GT   GQ     GT   GQ
0  Pathogenic/Likely Pathogenic  0/0  240    1/0  100
1                 Likely Benign  1/1  237    1/0  700
2                 Likely Benign  0/0  239    0/0  234
3                 Likely Benign  1/1  238    0/1  890
4                 Likely Benign  0/1  156    1/1  767
5                           VUS  1/1  241    0/1   21
I really am stumped on how to do this. The MultiIndex page of the pandas docs has no example of a MultiIndex applied to selected columns only, which makes me wonder whether this is even possible.
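For what it is worth, the target layout is an ordinary two-level column index. A minimal sketch (not from the original thread, with two rows typed in by hand from the desired output above) builds it directly by giving New Category an empty string as its second-level label. The selections at the end show how such a frame behaves; in the pandas versions I know of, the '' label is what lets target['New Category'] come back as a plain Series rather than a one-column frame.

```python
import pandas as pd

# Column labels for the target layout. 'New Category' gets '' as its
# second-level label, so it has no GT/GQ sub-columns of its own.
columns = pd.MultiIndex.from_tuples([
    ('New Category', ''),
    ('Sample1', 'GT'), ('Sample1', 'GQ'),
    ('Sample2', 'GT'), ('Sample2', 'GQ'),
])

# Two rows copied by hand from the desired output in the question.
target = pd.DataFrame(
    [['Pathogenic/Likely Pathogenic', '0/0', '240', '1/0', '100'],
     ['Likely Benign', '1/1', '237', '1/0', '700']],
    columns=columns,
)

# Selecting a top-level label returns its sub-columns (GT, GQ).
sample1 = target['Sample1']
# xs on level 1 pulls the GT sub-column out of every sample at once.
genotypes = target.xs('GT', axis=1, level=1)
# With an empty lower label, this selection returns a Series.
category = target['New Category']

print(target)
```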
This is not really a matter of "indexing", but rather of manipulating the data, in particular splitting the columns. The following should do it:
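# 'New Category' gets an empty second-level label, so it has no sub-column
df_new_category = pd.DataFrame(
    df[['New Category']].values,
    columns=pd.MultiIndex.from_tuples([('New Category', '')])
)
# Split each sample column at ':' into (sample, GT) and (sample, GQ)
sample_data_dfs = [
    pd.DataFrame(list(df[col].str.split(':')),
                 columns=pd.MultiIndex.from_product([[col], ['GT', 'GQ']]))
    for col in ['Sample1', 'Sample2']
]
# Glue the pieces side by side; all share the default 0..n-1 row index
result = pd.concat([df_new_category] + sample_data_dfs, axis=1)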
Notice that you could do the splitting all at once (i.e. without a loop over the columns), as follows:
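# Each cell becomes a list such as ['0/0', '240'].
# (pandas 2.1+ deprecates applymap; DataFrame.map is the new name.)
df[['Sample1', 'Sample2']].applymap(lambda s: s.split(':'))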
... but ...
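A more compact variant, not part of the original answer, uses str.split(':', n=1, expand=True), which returns the two pieces as separate columns instead of lists, and passes a dict to pd.concat so that the dict keys become the outer column level. This is a sketch assuming a reasonably recent pandas; note that with the question's constructor data, entries such as '1/1:0,237' keep the '0,' in their GQ value.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['New Category', 'Sample1', 'Sample2'],
    data=[
        ['Pathogenic/Likely Pathogenic', '0/0:240', '1/0:100'],
        ['Likely Benign', '1/1:0,237', '1/0:700'],
        ['Likely Benign', '0/0:239', '0/0:234'],
        ['Likely Benign', '1/1:1,238', '0/1:890'],
        ['Likely Benign', '0/1:156,79', '1/1:767'],
        ['VUS', '1/1:0,241', '0/1:21'],
    ],
)

sample_cols = ['Sample1', 'Sample2']

# Split each sample column once, at the first ':', into two real columns
# (labelled 0 and 1), then rename them to GT and GQ.
pieces = {
    col: df[col].str.split(':', n=1, expand=True).set_axis(['GT', 'GQ'], axis=1)
    for col in sample_cols
}

# Give 'New Category' a single '' sub-column so it only has a top-level name.
pieces = {'New Category': df[['New Category']].set_axis([''], axis=1), **pieces}

# With a dict, concat uses the keys as the outer column level.
result = pd.concat(pieces, axis=1)
print(result)
```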