Append any further columns to the first three columns AND indicate the triple column it comes from

This is a follow-up question to "Append any further columns to the first three columns".

I start out with about 120 columns. Every three consecutive columns belong together. Instead of sitting side by side as 120 columns, the triples should be stacked on top of each other, so that we end up with three columns. This has already been solved (see the link above).

Sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "1": np.random.randint(900000000, 999999999, size=5),
    "2": np.random.choice( ["A","B","C", np.nan], 5),
    "3": np.random.choice( [np.nan, 1], 5),

    "4": np.random.randint(900000000, 999999999, size=5),
    "5": np.random.choice( ["A","B","C", np.nan], 5),
    "6": np.random.choice( [np.nan, 1], 5)
})

Working solution for the initial question, as suggested by Jezrael:

arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]

df = df.stack(0).sort_index(level=[1, 0]).reset_index(drop=True)
df.columns = ['A','B','C']

This transforms this:

           1    2    3          4  5    6
0  960189042    B  NaN  991581392  A  1.0
1  977655199  nan  1.0  964195250  A  1.0
2  961771966    A  NaN  969007327  B  1.0
3  955308022    C  1.0  973316485  A  NaN
4  933277976    A  1.0  976749175  A  NaN

to this:

           A    B    C
0  960189042    B  NaN
1  977655199  nan  1.0
2  961771966    A  NaN
3  955308022    C  1.0
4  933277976    A  1.0
5  991581392    A  1.0
6  964195250    A  1.0
7  969007327    B  1.0
8  973316485    A  NaN
9  976749175    A  NaN
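
For reference, the column relabelling used in the solution above can be looked at in isolation. This is only a minimal sketch, assuming six columns as in the sample data; the names triple_id and position are illustrative and not part of the original solution:

```python
import numpy as np

# Six columns, as in the sample data (two triples).
arr = np.arange(6)

triple_id = arr // 3   # which triple each column belongs to
position = arr % 3     # position of the column inside its triple

print(triple_id.tolist())  # [0, 0, 0, 1, 1, 1]
print(position.tolist())   # [0, 1, 2, 0, 1, 2]
```

Assigning [arr // 3, arr % 3] to df.columns therefore builds a two-level column index: the first level is the triple number, the second is the position inside the triple.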

Follow-up question: Now, if I need an indicator showing which triple each block comes from, how could this be done? A result could look like:

           A    B    C D
0  960189042    B  NaN 0
1  977655199  nan  1.0 0
2  961771966    A  NaN 0
3  955308022    C  1.0 0
4  933277976    A  1.0 0
5  991581392    A  1.0 1
6  964195250    A  1.0 1
7  969007327    B  1.0 1
8  973316485    A  NaN 1
9  976749175    A  NaN 1

These blocks can be of different lengths, so I cannot simply add a counter.

Use reset_index to drop only the first index level (the original row number); the second level of the MultiIndex (the triple number) is then converted to a column:

arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]

df = df.stack(0).sort_index(level=[1, 0]).reset_index(level=0, drop=True).reset_index()
df.columns = ['D','A','B','C']
print(df)
   D          A    B    C
0  0  960189042    B  NaN
1  0  977655199  nan  1.0
2  0  961771966    A  NaN
3  0  955308022    C  1.0
4  0  933277976    A  1.0
5  1  991581392    A  1.0
6  1  964195250    A  1.0
7  1  969007327    B  1.0
8  1  973316485    A  NaN
9  1  976749175    A  NaN
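
To see why dropping only level 0 works, it may help to inspect the index right after stacking. The following is a small sketch with made-up deterministic data (not the random sample from the question); the variable names stacked and out are illustrative only:

```python
import numpy as np
import pandas as pd

# Small deterministic stand-in for the sample data (values are invented).
df = pd.DataFrame({
    "1": [10, 11], "2": ["x", "y"], "3": [1.0, 2.0],
    "4": [20, 21], "5": ["p", "q"], "6": [3.0, 4.0],
})

arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]

# After stacking, the index has two levels: (original row, triple number).
stacked = df.stack(0).sort_index(level=[1, 0])
print(stacked.index.tolist())

# Drop the row level, keep the triple level and turn it into a column.
out = stacked.reset_index(level=0, drop=True).reset_index()
out.columns = ["D", "A", "B", "C"]
print(out)
```

Depending on the pandas version, stack may emit a FutureWarning about its changing implementation; the resulting index levels should be the same either way.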

Then, if you need to change the order of the columns:

cols = df.columns[1:].tolist() + df.columns[:1].tolist()
df = df[cols]
print(df)
           A    B    C  D
0  960189042    B  NaN  0
1  977655199  nan  1.0  0
2  961771966    A  NaN  0
3  955308022    C  1.0  0
4  933277976    A  1.0  0
5  991581392    A  1.0  1
6  964195250    A  1.0  1
7  969007327    B  1.0  1
8  973316485    A  NaN  1
9  976749175    A  NaN  1
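
As a possible alternative, the same result can be sketched with pd.concat by slicing the wide frame triple by triple. This is not the approach from the answer above, just an illustration. It assumes that the number of columns is a multiple of three and that "blocks of different lengths" means shorter triples are padded with NaN rows in the wide frame; the helper name stack_triples and its parameters are hypothetical:

```python
import numpy as np
import pandas as pd

def stack_triples(wide, names=("A", "B", "C"), indicator="D"):
    """Stack consecutive column triples and record the triple number.

    Hypothetical helper; assumes wide.shape[1] is a multiple of 3.
    """
    blocks = []
    for start in range(0, wide.shape[1], 3):
        block = wide.iloc[:, start:start + 3].copy()
        block.columns = list(names)
        # Rows that are entirely NaN in this triple are treated as padding
        # and dropped, so blocks may end up with different lengths.
        block = block.dropna(how="all").assign(**{indicator: start // 3})
        blocks.append(block)
    return pd.concat(blocks, ignore_index=True)

# Invented example: the second triple has only two real rows.
wide = pd.DataFrame({
    "1": [10, 11, 12],
    "2": ["x", "y", "z"],
    "3": [1.0, np.nan, 1.0],
    "4": [20, 21, np.nan],
    "5": ["p", "q", np.nan],
    "6": [np.nan, 1.0, np.nan],
})

stacked = stack_triples(wide)
print(stacked)
```

Here the indicator is written before concatenation, so it does not depend on every block having the same number of rows, and the columns already come out in the order A, B, C, D.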
