
Append any further columns to the first three columns AND indicate the triple column it comes from

This is a follow-up question to "Append any further columns to the first three columns".

I start out with about 120 columns, which always come in groups of three consecutive columns that belong together. Instead of sitting side by side, these triples should be stacked on top of each other, so that we end up with just three columns. This part has already been solved (see the previous question).

Sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "1": np.random.randint(900000000, 999999999, size=5),
    # mixing strings with np.nan makes NumPy store np.nan as the string 'nan'
    "2": np.random.choice(["A", "B", "C", np.nan], 5),
    "3": np.random.choice([np.nan, 1], 5),

    "4": np.random.randint(900000000, 999999999, size=5),
    "5": np.random.choice(["A", "B", "C", np.nan], 5),
    "6": np.random.choice([np.nan, 1], 5)
})
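The lowercase "nan" entries in column B of the outputs below are not missing values. Because the choice list mixes strings with np.nan, NumPy builds a Unicode string array and converts np.nan into the text 'nan'. A minimal sketch, assuming genuine missing values are wanted instead, is to build the choice array with dtype=object so np.nan stays a float NaN:

```python
import numpy as np
import pandas as pd

# Mixed list -> NumPy picks a string dtype, so np.nan becomes the text 'nan'
as_str = np.array(["A", "B", "C", np.nan])
# dtype=object keeps np.nan as a real float NaN
as_obj = np.array(["A", "B", "C", np.nan], dtype=object)

# Drawing from the object array can produce real missing values
sample = np.random.choice(as_obj, 5)

print(pd.Series(as_str).isna().tolist())  # [False, False, False, False]
print(pd.Series(as_obj).isna().tolist())  # [False, False, False, True]
```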

Working solution for the initial question, as suggested by Jezrael:

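arr = np.arange(len(df.columns))
# two-level column labels: (triple number, position within the triple)
df.columns = [arr // 3, arr % 3]

# move the triple number into the row index, order rows by triple, then discard the index
df = df.stack(0).sort_index(level=[1, 0]).reset_index(drop=True)
df.columns = ['A','B','C']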

This transforms this:

           1    2    3          4  5    6
0  960189042    B  NaN  991581392  A  1.0
1  977655199  nan  1.0  964195250  A  1.0
2  961771966    A  NaN  969007327  B  1.0
3  955308022    C  1.0  973316485  A  NaN
4  933277976    A  1.0  976749175  A  NaN

to this:

           A    B    C
0  960189042    B  NaN
1  977655199  nan  1.0
2  961771966    A  NaN
3  955308022    C  1.0
4  933277976    A  1.0
5  991581392    A  1.0
6  964195250    A  1.0
7  969007327    B  1.0
8  973316485    A  NaN
9  976749175    A  NaN
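Because the sample data is random, the tables above cannot be reproduced exactly. As a minimal sketch, the same steps can be run on a small fixed frame (the values are made up for illustration) to see what each one does: arr // 3 gives every column its triple number, arr % 3 its position inside the triple, and stack(0) moves the triple number into the row index, where sort_index(level=[1, 0]) groups the rows triple by triple.

```python
import numpy as np
import pandas as pd

# Small, deterministic stand-in for the random sample data (two triples)
df = pd.DataFrame({
    "1": [10, 11], "2": ["a", "b"], "3": [1.0, np.nan],
    "4": [20, 21], "5": ["c", "d"], "6": [np.nan, 2.0],
})

arr = np.arange(len(df.columns))
# Outer column level = triple number, inner level = position within the triple
df.columns = [arr // 3, arr % 3]

# Row index becomes (original row, triple number); columns become 0, 1, 2
stacked = df.stack(0)
# Sort by triple first, then by original row, so each triple forms one block
out = stacked.sort_index(level=[1, 0]).reset_index(drop=True)
out.columns = ["A", "B", "C"]
print(out)
```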

Follow-up question: Now, if I need an indicator of which triple each row comes from, how could this be done? The result could look like this:

           A    B    C D
0  960189042    B  NaN 0
1  977655199  nan  1.0 0
2  961771966    A  NaN 0
3  955308022    C  1.0 0
4  933277976    A  1.0 0
5  991581392    A  1.0 1
6  964195250    A  1.0 1
7  969007327    B  1.0 1
8  973316485    A  NaN 1
9  976749175    A  NaN 1

These blocks can be of different lengths, so I cannot simply add a counter.

Use reset_index to remove only the first level of the MultiIndex, and convert the second level into a column:

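arr = np.arange(len(df.columns))
df.columns = [arr // 3, arr % 3]

# drop only the original row level (level 0); the triple number (level 1)
# stays in the index and reset_index() turns it into the first column
df = df.stack(0).sort_index(level=[1, 0]).reset_index(level=0, drop=True).reset_index()
df.columns = ['D','A','B','C']
print(df)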
   D          A    B    C
0  0  960189042    B  NaN
1  0  977655199  nan  1.0
2  0  961771966    A  NaN
3  0  955308022    C  1.0
4  0  933277976    A  1.0
5  1  991581392    A  1.0
6  1  964195250    A  1.0
7  1  969007327    B  1.0
8  1  973316485    A  NaN
9  1  976749175    A  NaN
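As an alternative to the MultiIndex reshaping (a different technique, not the one in this answer), the same D, A, B, C layout can be built with pd.concat: slice every group of three columns, give each slice the same labels, and let the keys argument record the group number. The helper name stack_triples and its parameters are hypothetical. One difference worth knowing: depending on the pandas version, stack may drop rows in which all three values are missing, whereas concat keeps every row.

```python
import numpy as np
import pandas as pd

def stack_triples(df, names=("A", "B", "C"), group_col="D"):
    """Hypothetical helper: stack consecutive groups of len(names) columns.

    The returned frame has group_col first, holding the 0-based number
    of the group each row came from.
    """
    n = len(names)
    if df.shape[1] % n:
        raise ValueError(f"expected a multiple of {n} columns, got {df.shape[1]}")
    # Slice each group of n columns and relabel every slice identically
    blocks = [
        df.iloc[:, i:i + n].set_axis(list(names), axis=1)
        for i in range(0, df.shape[1], n)
    ]
    # keys= adds an outer index level holding each block's group number
    out = pd.concat(blocks, keys=list(range(len(blocks))), names=[group_col, None])
    # Turn the group level into a column, then renumber the rows 0..N-1
    return out.reset_index(level=0).reset_index(drop=True)

# Small made-up frame with two triples
df = pd.DataFrame({
    "1": [10, 11], "2": ["a", "b"], "3": [1.0, np.nan],
    "4": [20, 21], "5": ["c", "d"], "6": [np.nan, 2.0],
})
result = stack_triples(df)
print(result)
```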

Then, if you need to change the order of the columns:

# move the first column (D) to the end
cols = df.columns[1:].tolist() + df.columns[:1].tolist()
df = df[cols]
print(df)
           A    B    C  D
0  960189042    B  NaN  0
1  977655199  nan  1.0  0
2  961771966    A  NaN  0
3  955308022    C  1.0  0
4  933277976    A  1.0  0
5  991581392    A  1.0  1
6  964195250    A  1.0  1
7  969007327    B  1.0  1
8  973316485    A  NaN  1
9  976749175    A  NaN  1
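When only a single, known column has to move to the end, a common pandas idiom is DataFrame.pop: it removes the column and returns it, and assigning it back appends it as the last column. Unlike df[cols], which builds a new frame, this modifies df in place. A small sketch on made-up data in the D, A, B, C layout:

```python
import pandas as pd

# Made-up frame in the D, A, B, C layout produced above
df = pd.DataFrame({
    "D": [0, 0, 1],
    "A": [10, 11, 20],
    "B": ["x", "y", "z"],
    "C": [1.0, None, 2.0],
})

# pop() removes column D and returns it; assigning it back appends it last
df["D"] = df.pop("D")
print(df.columns.tolist())  # ['A', 'B', 'C', 'D']
```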
