简体   繁体   English

Python 重新格式化 Pandas 数据框

[英]Python reformatting Pandas dataframe

I do have the following DataFrame我确实有以下数据DataFrame

   draw_date midday_daily evening_daily midday_win_4 evening_win_4
0 2020-10-05          582           577         5490          4958
1 2020-10-06          318           176         2137          9956

Which am trying to convert it into the following Shape :我试图将其转换为以下Shape

    draw_date   draw_period     winning_numbers                wn_01    wn_02   wn_03   wn_04   wn_sum
0   2020-10-05  Midday                5 4 9 0                     5        4       9       0    18
1   2020-10-05  Evening               4 9 5 8                     4        9       5       8    26
2   2020-10-06  Midday                2 1 3 7                     2        1       3       7    13
3   2020-10-06  Evening               9 9 5 6                     9        9       5       6    29

That's what I've achieved yet:这就是我已经取得的成就:

import pandas as pd


df = pd.DataFrame.from_dict({'draw_date': {0: ('2020-10-05 00:00:00'), 1: ('2020-10-06 00:00:00')}, 'midday_daily': {0: '582', 1: '318'},
                             'evening_daily': {0: '577', 1: '176'}, 'midday_win_4': {0: '5490', 1: '2137'}, 'evening_win_4': {0: '4958', 1: '9956'}})

df.drop(df.columns[1:3], axis=1, inplace=True)
df['draw_date'] = pd.to_datetime(df['draw_date'])


print(df)

Output:输出:

   draw_date midday_win_4 evening_win_4
0 2020-10-05         5490          4958
1 2020-10-06         2137          9956

A little bit more verbose/descript approach更详细/描述性的方法

def split_numbers(df, column, prefix=None):
    split_col = df[column].astype(str).map(list)
    out = pd.DataFrame(split_col.tolist()).astype(int)
    out.columns += 1
    return df.join(out.add_prefix(prefix))

(df.filter(regex=r"(?:draw_date|win)")                   # Select the draw_date and "win" columns
 .rename(columns=lambda col: col.replace("_win_4", ""))  # Remove suffix "_win_4"
 .melt(                                                  # Reshape the data
     id_vars="draw_date", 
     var_name="draw_period", 
     value_name="winning_numbers")
 .pipe(split_numbers, "winning_numbers", prefix="wn_0")   # Extract out the winning numbers and assign back to df
 .assign(                                                # Create a sum column
     wn_sum=lambda df: df.filter(like="wn").sum(axis=1))                 
 .sort_values(                                           # sort by draw_date and draw_period to line up with OP
     ["draw_date", "draw_period"], 
     ascending=[True, False])              
)

outputs:输出:

    draw_date draw_period  winning_numbers  wn_01  wn_02  wn_03  wn_04  wn_sum
0  2020-10-05      midday             5490      5      4      9      0      18
2  2020-10-05     evening             4958      4      9      5      8      26
1  2020-10-06      midday             2137      2      1      3      7      13
3  2020-10-06     evening             9956      9      9      5      6      29
# set index and stack
stack = df.set_index('draw_date').stack()
# map list to your stacked series and create a new frame
new_df = pd.DataFrame(list(map(list, stack)), index=stack.index)
# sum the rows column-wise
new_df['sum'] = new_df.astype(int).sum(1)
# add the winning numbers back
new_df['winning numbers'] = stack
print(new_df)


                          0  1  2  3  sum winning numbers
draw_date                                                
2020-10-05 midday_win_4   5  4  9  0   18            5490
           evening_win_4  4  9  5  8   26            4958
2020-10-06 midday_win_4   2  1  3  7   13            2137
           evening_win_4  9  9  5  6   29            9956

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM