
How to split rows in a pandas DataFrame into new rows given a percentage

I have two dataframes that I would like to combine into one dataframe.

dataframe 1:

   level_1 level_2  ...  2021-01-30 00:00:00  2021-01-31 00:00:00
0        A       X  ...            85.191861 (*)        61.387459
1        A       Y  ...            40.931960            76.803598
2        A       Z  ...            36.811113            44.834602
3        B       X  ...            60.707172             2.457655
4        B       Y  ...            53.990172            50.636714
5        B       Z  ...            52.347966            27.938671
6        C       X  ...            17.619055            89.028475
7        C       Y  ...            91.442144            56.108239
8        C       Z  ...             8.633474            14.716926
9        D       X  ...            41.690869            68.389570
10       D       Y  ...            37.896461            11.763265
11       D       Z  ...            12.492016            70.571018

dataframe 2:

   level_2  level_3  2020-02-01 00:00:00  2020-02-02 00:00:00
0        X     1060                 0.10                  0.0
1        X     1064                 0.20                  1.0
2        X     1065                 0.50                  0.0
3        X     1067                 0.00                  0.0
4        X     1068                 0.00                  0.0
5        X     1264                 0.20                  0.0
6        X     1061                 0.00                  0.0
7        X     1000                 0.00                  0.0
8        Y     1060                 0.05                  0.1
9        Y     1064                 0.05                  0.2
10       Y     1065                 0.10                  0.5
11       Y     1067                 0.20                  0.0
12       Y     1068                 0.20                  0.0
13       Y     1264                 0.10                  0.2
14       Y     1061                 0.15                  0.0
15       Y     1000                 0.15                  0.0
16       Z     1060                 0.00                  0.0
17       Z     1064                 0.00                  0.0
18       Z     1065                 0.00                  0.0
19       Z     1067                 0.90                  0.9
20       Z     1068                 0.10                  0.1
21       Z     1264                 0.00                  0.0
22       Z     1061                 0.00                  0.0
23       Z     1000                 0.00                  0.0

I would like to add a third column, level_3, to the first dataframe and split each original row according to the percentages given in dataframe 2. The numbers below aren't correct, but the structure would look something like this:

   level_1 level_2  level_3  2020-02-01 00:00:00
0        A       X     1060             2.374184
1        A       X     1064             4.748367
2        A       X     1065            11.870918
3        A       X     1067             0.000000
4        A       X     1068             0.000000
5        A       X     1264             4.748367
6        A       X     1061             0.000000
7        A       X     1000             0.000000
8        A       Y     1060             2.813819
9        A       Y     1064             2.813819
10       A       Y     1065             5.627637
11       A       Y     1067            11.255275
12       A       Y     1068            11.255275
13       A       Y     1264             5.627637
14       A       Y     1061             8.441456
15       A       Y     1000             8.441456
16       A       Z     1060             0.000000
17       A       Z     1064             0.000000
18       A       Z     1065             0.000000
19       A       Z     1067            24.890250
20       A       Z     1068             2.765583
21       A       Z     1264             0.000000
22       A       Z     1061             0.000000
23       A       Z     1000             0.000000

So every row of the first dataframe would be split across 8 factors (1060, 1064, 1065, 1067, 1068, 1264, 1061, 1000). For example, the original amount 85.19 (*) would be split into 8 parts. The split is done according to the percentages in dataframe 2. I was thinking of stacking the dataframes and doing a merge, but so far I haven't managed to make it work.
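
To make the intended arithmetic concrete, here is a minimal sketch of the split for the single (*) cell, using the level_2 == "X" percentages from the first date column of dataframe 2. The names `shares_x`, `amount` and `split` are only illustrative and do not come from either dataframe.

```python
# Percentages for level_2 == "X", keyed by level_3
# (copied from the first date column of dataframe 2).
shares_x = {1060: 0.10, 1064: 0.20, 1065: 0.50, 1067: 0.00,
            1068: 0.00, 1264: 0.20, 1061: 0.00, 1000: 0.00}

amount = 85.191861  # the (*) cell: level_1 == "A", level_2 == "X"

# Each new row receives amount * percentage for its level_3 code.
split = {level_3: amount * share for level_3, share in shares_x.items()}

for level_3, value in split.items():
    print(level_3, round(value, 6))

# The percentages sum to 1 here, so the parts add back up to the original amount.
total = sum(split.values())
```

If the percentages for a level_2 group did not sum to 1, the parts would no longer add up to the original amount.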

Thanks!

  • cleaned up your sample data - made the date columns consistent across both dataframes
  • reshape them - make Date a column rather than a series of columns
  • now it's a straightforward merge
  • finally reshape back to your target structure with a pivot to make Date columns again
import io

import pandas as pd

df1 = pd.read_csv(io.StringIO("""   level_1  level_2  ...  2021-01-30 00:00:00  2021-01-31 00:00:00
0        A       X  ...            85.191861         61.387459
1        A       Y  ...            40.931960            76.803598
2        A       Z  ...            36.811113            44.834602
3        B       X  ...            60.707172             2.457655
4        B       Y  ...            53.990172            50.636714
5        B       Z  ...            52.347966            27.938671
6        C       X  ...            17.619055            89.028475
7        C       Y  ...            91.442144            56.108239
8        C       Z  ...             8.633474            14.716926
9        D       X  ...            41.690869            68.389570
10       D       Y  ...            37.896461            11.763265
11       D       Z  ...            12.492016            70.571018
"""), sep=r"\s\s+", engine="python").drop(columns="...")

# cleanup DF1,  make date columns rows
df1 = df1.set_index(["level_1","level_2"]).rename_axis(columns="Date").stack().reset_index()
df1.Date = pd.to_datetime(df1.Date)

df2 = pd.read_csv(io.StringIO("""   level_2  level_3  2021-01-30 00:00:00  2021-01-31 00:00:00
0        X     1060                 0.10                  0.0
1        X     1064                 0.20                  1.0
2        X     1065                 0.50                  0.0
3        X     1067                 0.00                  0.0
4        X     1068                 0.00                  0.0
5        X     1264                 0.20                  0.0
6        X     1061                 0.00                  0.0
7        X     1000                 0.00                  0.0
8        Y     1060                 0.05                  0.1
9        Y     1064                 0.05                  0.2
10       Y     1065                 0.10                  0.5
11       Y     1067                 0.20                  0.0
12       Y     1068                 0.20                  0.0
13       Y     1264                 0.10                  0.2
14       Y     1061                 0.15                  0.0
15       Y     1000                 0.15                  0.0
16       Z     1060                 0.00                  0.0
17       Z     1064                 0.00                  0.0
18       Z     1065                 0.00                  0.0
19       Z     1067                 0.90                  0.9
20       Z     1068                 0.10                  0.1
21       Z     1264                 0.00                  0.0
22       Z     1061                 0.00                  0.0
23       Z     1000                 0.00                  0.0"""), sep=r"\s\s+", engine="python")

# cleanup DF2, make date columns rows
df2 = df2.set_index(["level_2","level_3"]).rename_axis(columns="Date").stack().reset_index()
df2.Date = pd.to_datetime(df2.Date)

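# stack() leaves the values in a column named 0 in both frames, so after the
# merge they appear as "0_x" (amount from df1) and "0_y" (percentage from df2)
dfm = (df1.merge(df2, on=["level_2","Date"])
 .assign(val=lambda dfa: dfa["0_x"]*dfa["0_y"])  # amount * percentage
 .drop(columns=["0_x","0_y"])
 # back to wide format: one column per Date
 .pivot(index=["level_1","level_2","level_3"], columns="Date", values="val")
 .reset_index()
)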

print(dfm.head(10).to_markdown())
|    | level_1 | level_2 | level_3 | 2021-01-30 00:00:00 | 2021-01-31 00:00:00 |
|---:|:--------|:--------|--------:|--------------------:|--------------------:|
|  0 | A       | X       |    1000 |             0       |             0       |
|  1 | A       | X       |    1060 |             8.51919 |             0       |
|  2 | A       | X       |    1061 |             0       |             0       |
|  3 | A       | X       |    1064 |            17.0384  |            61.3875  |
|  4 | A       | X       |    1065 |            42.5959  |             0       |
|  5 | A       | X       |    1067 |             0       |             0       |
|  6 | A       | X       |    1068 |             0       |             0       |
|  7 | A       | X       |    1264 |            17.0384  |             0       |
|  8 | A       | Y       |    1000 |             6.13979 |             0       |
|  9 | A       | Y       |    1060 |             2.0466  |             7.68036 |
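
As a variation on the same reshape, merge and pivot idea, the sketch below uses `melt` with explicit value names instead of `stack`, which avoids relying on the auto-generated `0_x` / `0_y` column names. It also checks that the split parts add back up to the original amounts. The tiny frames are built inline and the percentages in `df2` are hypothetical values chosen to sum to 1 per level_2 and date; the names `amounts`, `shares`, `merged` and `check` are illustrative only.

```python
import pandas as pd

# Small subset of the sample amounts (level_1 "A" only).
df1 = pd.DataFrame({
    "level_1": ["A", "A"],
    "level_2": ["X", "Y"],
    "2021-01-30": [85.191861, 40.931960],
    "2021-01-31": [61.387459, 76.803598],
})
# Hypothetical percentages: two level_3 codes per level_2, summing to 1 per date.
df2 = pd.DataFrame({
    "level_2": ["X", "X", "Y", "Y"],
    "level_3": [1064, 1065, 1064, 1065],
    "2021-01-30": [0.4, 0.6, 0.5, 0.5],
    "2021-01-31": [1.0, 0.0, 0.2, 0.8],
})

# Wide -> long, naming the value columns explicitly.
amounts = df1.melt(id_vars=["level_1", "level_2"], var_name="Date", value_name="amount")
shares = df2.melt(id_vars=["level_2", "level_3"], var_name="Date", value_name="share")

# Every amount row is repeated once per matching level_3 percentage.
merged = amounts.merge(shares, on=["level_2", "Date"])
merged["val"] = merged["amount"] * merged["share"]

# Long -> wide again, one column per date.
result = (merged.pivot(index=["level_1", "level_2", "level_3"], columns="Date", values="val")
                .reset_index())
result.columns.name = None

# Sanity check: the parts of each original cell should sum back to that cell.
check = (merged.groupby(["level_1", "level_2", "Date"], as_index=False)["val"].sum()
               .merge(amounts, on=["level_1", "level_2", "Date"]))
totals_preserved = bool(((check["val"] - check["amount"]).abs() < 1e-9).all())
print(result)
print("totals preserved:", totals_preserved)
```

The check only holds when the percentages for each level_2 and date sum to 1; if they don't, `totals_preserved` comes out False, which is a quick way to spot incomplete percentage data.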
