pandas: How to stack my data correctly?

I have a dataframe that, when initially loaded from a list of lists, looks like this:

              0       1       2  3       4       5       6       7       8   \
0        Segment  Nov-12  Dec-12     Jan-13  Feb-13  Mar-13  Apr-13  May-13   
1           A                        N/A     N/A     N/A     N/A     N/A   
2           B                        N/A     N/A     N/A     N/A     N/A   
3           C                        N/A     N/A     N/A     N/A     N/A   
4           D                        N/A     N/A     N/A     N/A     N/A   
5           Total                    N/A     N/A     N/A     N/A     N/A   

The values under each month will be float values. I want to pivot the dataframe so I end up with something like:

  Segment Month Value
0 A       month value
1 A       month value
2 B       month value
3 B       month value
etc...

What would be the best way to do this?

import pandas as pd

# `df` is the dataframe from the question
v = df.values[1:, 1:].astype(float)

mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]],
    names=['Segment', 'Month']
)

d1 = pd.Series(v.ravel(), mux).reset_index(name='Value')
print(d1)

   Segment   Month  Value
0        A  Nov-12    NaN
1        A  Dec-12    NaN
2        A  Jan-13    NaN
3        A  Feb-13    NaN
4        A  Mar-13    NaN
5        A  Apr-13    NaN
6        A  May-13    NaN
7        B  Nov-12    NaN
8        B  Dec-12    NaN
9        B  Jan-13    NaN
10       B  Feb-13    NaN
11       B  Mar-13    NaN
12       B  Apr-13    NaN
13       B  May-13    NaN
14       C  Nov-12    NaN
15       C  Dec-12    NaN
16       C  Jan-13    NaN
17       C  Feb-13    NaN
18       C  Mar-13    NaN
19       C  Apr-13    NaN
20       C  May-13    NaN
21       D  Nov-12    NaN
22       D  Dec-12    NaN
23       D  Jan-13    NaN
24       D  Feb-13    NaN
25       D  Mar-13    NaN
26       D  Apr-13    NaN
27       D  May-13    NaN
28   Total  Nov-12    NaN
29   Total  Dec-12    NaN
30   Total  Jan-13    NaN
31   Total  Feb-13    NaN
32   Total  Mar-13    NaN
33   Total  Apr-13    NaN
34   Total  May-13    NaN

Explanation

# Your data has its row labels in the first column
# and its column headers in the first row.
# I grab the underlying `numpy` array
# from the 2nd row and 2nd column onward
# and convert it to float
v = df.values[1:, 1:].astype(float)

# I'm going to create a `pd.MultiIndex` to label
# the `pd.Series` I'll create.
# The first level of the index will be that first column
# (the row labels),
# the second level will be the first row
# (the column headers).
# The trick here is that I use `from_product`,
# which gives me every combination of those arrays.
# `ravel` unwinds or flattens the matrix row by row, so it
# lines up with this `pd.MultiIndex` that has every combination
# of row and column labels
mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]],
    names=['Segment', 'Month']
)

# I construct the `pd.Series` from the flattened values and the index.
# `reset_index` takes those levels of the index and pushes them out
# into the dataframe's columns.  `name='Value'` just makes sure the
# values of the series get a column name
d1 = pd.Series(v.ravel(), mux).reset_index(name='Value')
print(d1)
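
For anyone who wants to run this end to end, here is a minimal sketch of the same approach on made-up data. The sample rows and values are hypothetical, not the asker's real numbers. It also assumes the cells hold the literal strings 'N/A' and '', in which case a plain `.astype(float)` would raise, so the sketch swaps in `pd.to_numeric(..., errors='coerce')` for the conversion step.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question:
# header row first, then one row per segment. Values are made up.
rows = [
    ["Segment", "Nov-12", "Dec-12", "Jan-13"],
    ["A", "1.5", "N/A", "2.0"],
    ["B", "", "3.25", "N/A"],
    ["Total", "1.5", "3.25", "2.0"],
]
df = pd.DataFrame(rows)

# Flatten the data block row by row and coerce it to float;
# cells that cannot be parsed as numbers become NaN.
block = df.iloc[1:, 1:]
v = pd.to_numeric(block.to_numpy().ravel(), errors="coerce")

# Every (segment, month) pair, in the same row-major order as `v`.
mux = pd.MultiIndex.from_product(
    [df.iloc[1:, 0], df.iloc[0, 1:]],
    names=["Segment", "Month"],
)

d1 = pd.Series(v, index=mux).reset_index(name="Value")
print(d1)
```

The result has one row per segment and month pair, with columns `Segment`, `Month` and `Value`.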

I ended up finding a solution, but please let me know how I can improve it.

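# `vals` is the list of lists the data was loaded from
cac_df = pd.DataFrame(data=vals)
# Use the first column (the segment names) as the row labels
cac_df.rename(index=cac_df[0], inplace=True)
del cac_df[0]
# Use the 'Segment' row as the column headers, then drop it
cac_df = cac_df.rename(columns=cac_df.loc['Segment']).drop('Segment')
# Blank and 'N/A' cells become None
# (`applymap` is deprecated since pandas 2.1 in favor of `DataFrame.map`)
cac_df = cac_df.applymap(lambda x: None if not x or x == 'N/A' else x)
# Drop months with no data at all, then stack months into rows
cac_df = pd.DataFrame(
    cac_df.dropna(axis=1, how='all').stack()
)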

The stack threw me for a loop since it returned a Series instead of a DataFrame, which the docs note is what happens when the columns have only a single level.
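
To illustrate the point about `stack`, here is a small sketch on a hypothetical wide frame (the labels and values are made up). It shows that stacking single-level columns gives a Series with a two-level index, and one possible way to turn that into the three-column layout from the question using `rename_axis` and `reset_index(name=...)` instead of wrapping the result in `pd.DataFrame`.

```python
import pandas as pd

# Hypothetical wide frame: segments as the index, months as columns.
wide = pd.DataFrame(
    {"Nov-12": [1.5, 4.0], "Dec-12": [2.0, 3.25]},
    index=["A", "B"],
)

# With a single level of columns, stack() returns a Series
# indexed by (row label, column label).
stacked = wide.stack()
print(type(stacked))

# Name the two index levels, then move them out into columns.
long = (
    stacked
    .rename_axis(["Segment", "Month"])
    .reset_index(name="Value")
)
print(long)
```

Naming the index levels first means the resulting columns come out as `Segment` and `Month` rather than `level_0` and `level_1`.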
