[英]how to split lines in dataframe pandas into new lines given a percentage
I have two dataframes that I would like to turn into one dataframe.我有两个数据框,我想将它们变成一个 dataframe。
dataframe 1: dataframe 1:
level_1 level_2 ... 2021-01-30 00:00:00 2021-01-31 00:00:00
0 A X ... 85.191861 (*) 61.387459
1 A Y ... 40.931960 76.803598
2 A Z ... 36.811113 44.834602
3 B X ... 60.707172 2.457655
4 B Y ... 53.990172 50.636714
5 B Z ... 52.347966 27.938671
6 C X ... 17.619055 89.028475
7 C Y ... 91.442144 56.108239
8 C Z ... 8.633474 14.716926
9 D X ... 41.690869 68.389570
10 D Y ... 37.896461 11.763265
11 D Z ... 12.492016 70.571018
dataframe 2: dataframe 2:
level_2 level_3 2020-02-01 00:00:00 2020-02-02 00:00:00
0 X 1060 0.10 0.0
1 X 1064 0.20 1.0
2 X 1065 0.50 0.0
3 X 1067 0.00 0.0
4 X 1068 0.00 0.0
5 X 1264 0.20 0.0
6 X 1061 0.00 0.0
7 X 1000 0.00 0.0
8 Y 1060 0.05 0.1
9 Y 1064 0.05 0.2
10 Y 1065 0.10 0.5
11 Y 1067 0.20 0.0
12 Y 1068 0.20 0.0
13 Y 1264 0.10 0.2
14 Y 1061 0.15 0.0
15 Y 1000 0.15 0.0
16 Z 1060 0.00 0.0
17 Z 1064 0.00 0.0
18 Z 1065 0.00 0.0
19 Z 1067 0.90 0.9
20 Z 1068 0.10 0.1
21 Z 1264 0.00 0.0
22 Z 1061 0.00 0.0
23 Z 1000 0.00 0.0
I would like to add a third column to the first dataframe, level 3, and split the original row across the percentage given in dataframe 2. The numbers aren't correct, but the structure would look something like this:我想在第 3 级的第一个 dataframe 中添加第三列,并将原始行拆分为 dataframe 2 中给出的百分比。数字不正确,但结构看起来像这样:
level_1 level_2 level_3 2020-02-01 00:00:00
0 A X 1060 2.374184
1 A X 1064 4.748367
2 A X 1065 11.870918
3 A X 1067 0.000000
4 A X 1068 0.000000
5 A X 1264 4.748367
6 A X 1061 0.000000
7 A X 1000 0.000000
8 A Y 1060 2.813819
9 A Y 1064 2.813819
10 A Y 1065 5.627637
11 A Y 1067 11.255275
12 A Y 1068 11.255275
13 A Y 1264 5.627637
14 A Y 1061 8.441456
15 A Y 1000 8.441456
16 A Z 1060 0.000000
17 A Z 1064 0.000000
18 A Z 1065 0.000000
19 A Z 1067 24.890250
20 A Z 1068 2.765583
21 A Z 1264 0.000000
22 A Z 1061 0.000000
23 A Z 1000 0.000000
So every line dataframe would be split out across 8 factors (1060, 1064, 1065, 1067, 1068, 1264, 1061, 1000) So the original amount would 85.19 (*), would be split into 8 parts.因此,每一行 dataframe 将被拆分为 8 个因子(1060、1064、1065、1067、1068、1264、1061、1000),因此原始金额将85.19 (*),将拆分为 8 个部分。 The split is done according to the percentage in dataframe 2. I was thinking of stacking the dataframes and doing a merge, but currently haven't managed to make it work.
拆分是根据 dataframe 2 中的百分比完成的。我正在考虑堆叠数据帧并进行合并,但目前还没有成功。
Thanks!谢谢!
df1 = pd.read_csv(io.StringIO(""" level_1 level_2 ... 2021-01-30 00:00:00 2021-01-31 00:00:00
0 A X ... 85.191861 61.387459
1 A Y ... 40.931960 76.803598
2 A Z ... 36.811113 44.834602
3 B X ... 60.707172 2.457655
4 B Y ... 53.990172 50.636714
5 B Z ... 52.347966 27.938671
6 C X ... 17.619055 89.028475
7 C Y ... 91.442144 56.108239
8 C Z ... 8.633474 14.716926
9 D X ... 41.690869 68.389570
10 D Y ... 37.896461 11.763265
11 D Z ... 12.492016 70.571018
"""), sep="\s\s+", engine="python").drop(columns="...")
# cleanup DF1, make date columns rows
df1 = df1.set_index(["level_1","level_2"]).rename_axis(columns="Date").stack().reset_index()
df1.Date = pd.to_datetime(df1.Date)
df2 = pd.read_csv(io.StringIO(""" level_2 level_3 2021-01-30 00:00:00 2021-01-31 00:00:00
0 X 1060 0.10 0.0
1 X 1064 0.20 1.0
2 X 1065 0.50 0.0
3 X 1067 0.00 0.0
4 X 1068 0.00 0.0
5 X 1264 0.20 0.0
6 X 1061 0.00 0.0
7 X 1000 0.00 0.0
8 Y 1060 0.05 0.1
9 Y 1064 0.05 0.2
10 Y 1065 0.10 0.5
11 Y 1067 0.20 0.0
12 Y 1068 0.20 0.0
13 Y 1264 0.10 0.2
14 Y 1061 0.15 0.0
15 Y 1000 0.15 0.0
16 Z 1060 0.00 0.0
17 Z 1064 0.00 0.0
18 Z 1065 0.00 0.0
19 Z 1067 0.90 0.9
20 Z 1068 0.10 0.1
21 Z 1264 0.00 0.0
22 Z 1061 0.00 0.0
23 Z 1000 0.00 0.0"""), sep="\s\s+", engine="python")
df2 = df2.set_index(["level_2","level_3"]).rename_axis(columns="Date").stack().reset_index()
df2.Date = pd.to_datetime(df2.Date)
dfm = (df1.merge(df2, on=["level_2","Date"])
.assign(val=lambda dfa: dfa["0_x"]*dfa["0_y"])
.drop(columns=["0_x","0_y"])
.pivot(index=["level_1","level_2","level_3"], columns="Date", values="val")
.reset_index()
)
print(dfm.head(10).to_markdown())
level_1 ![]() |
level_2 ![]() |
level_3 ![]() |
2021-01-30 00:00:00 ![]() |
2021-01-31 00:00:00 ![]() |
|
---|---|---|---|---|---|
0 ![]() |
A![]() |
X ![]() |
1000 ![]() |
0 ![]() |
0 ![]() |
1 ![]() |
A![]() |
X ![]() |
1060 ![]() |
8.51919 ![]() |
0 ![]() |
2 ![]() |
A![]() |
X ![]() |
1061 ![]() |
0 ![]() |
0 ![]() |
3 ![]() |
A![]() |
X ![]() |
1064 ![]() |
17.0384 ![]() |
61.3875 ![]() |
4 ![]() |
A![]() |
X ![]() |
1065 ![]() |
42.5959 ![]() |
0 ![]() |
5 ![]() |
A![]() |
X ![]() |
1067 ![]() |
0 ![]() |
0 ![]() |
6 ![]() |
A![]() |
X ![]() |
1068 ![]() |
0 ![]() |
0 ![]() |
7 ![]() |
A![]() |
X ![]() |
1264 ![]() |
17.0384 ![]() |
0 ![]() |
8 ![]() |
A![]() |
Y![]() |
1000 ![]() |
6.13979 ![]() |
0 ![]() |
9 ![]() |
A![]() |
Y![]() |
1060 ![]() |
2.0466 ![]() |
7.68036 ![]() |
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.