[英]Question on Pandas Dataframe .loc / .iloc and String Operations
[英]Math operations in multiindiex dataframe with .loc
我有一个多索引df
:
created_at 2020-06-29 2020-07-06
sales orders last_sales differennce sales orders last_sales differennce
group category 10 10 10 0 10 10 20 50
A a1
a2
a3
B a1 ... ... ... ... ... ... ...
a2
a3
all Total 100 100 100 0 150 150 150 0
我正在尝试计算all & Total
索引的difference
,即((sales / last_sales) -1) * 100
其中created_at
是2020-06-29 to 2021-07-19
。
我试过了:
df.loc[('all','Total'),(slice(None),'difference')] =
((df.loc[('all','Total'),(slice(None),'sales')] /
df.loc[('all','Total'),:'2021-07-19','last_sales']) - 1) * 100
但我收到一个错误:
IndexError:列表索引超出范围
我似乎无法选择我想要的created_at
范围,但这有效:
df.loc[:,:'2021-07-19'] # returning the dates I want, from first to selected
但是当我尝试选择一行时,它会中断:
df_out.loc[:,(:'2021-07-19','sales')] # SyntaxError: invalid syntax
&
df_out.loc[:,:'2021-07-19','sales'] # IndexError: list index out of range
我如何选择group = all
、 category = Total
和日期从2020-06-29 to 2020-07-19
?
我也试过:
df.loc[('all','Total'),(slice(None),'difference')] =
((df.loc[('all','Total'),(slice(None),'sales')] /
df.loc[('all','Total'),(slice(None),'last_sales')]) - 1) * 100 # slice(None) instead of date
但是,这将引发我ZeroDivisionError: float division by zero
,因为有last_sales = 0
以上日期2021-07-19
。 也许还有其他处理0s
? 我试着像这样加1
df.loc[('all','Total'),(slice(None),'last_sales')]+1) - 1) * 100
但我仍然得到同样的错误。
示例df
:
df = pd.DataFrame.from_dict({('group', ''): {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'A',
6: 'A',
7: 'A',
8: 'A',
9: 'B',
10: 'B',
11: 'B',
12: 'B',
13: 'B',
14: 'B',
15: 'B',
16: 'B',
17: 'B',
18: 'all',
19: 'all'},
('category', ''): {0: 'Amazon',
1: 'Apple',
2: 'Facebook',
3: 'Google',
4: 'Netflix',
5: 'Tesla',
6: 'Total',
7: 'Uber',
8: 'total',
9: 'Amazon',
10: 'Apple',
11: 'Facebook',
12: 'Google',
13: 'Netflix',
14: 'Tesla',
15: 'Total',
16: 'Uber',
17: 'total',
18: 'Total',
19: 'total'},
(pd.Timestamp('2020-06-29 00:00:00'), 'last_sales'): {0: 195.0,
1: 61.0,
2: 106.0,
3: 61.0,
4: 37.0,
5: 13.0,
6: 954.0,
7: 4.0,
8: 477.0,
9: 50.0,
10: 50.0,
11: 75.0,
12: 43.0,
13: 17.0,
14: 14.0,
15: 504.0,
16: 3.0,
17: 252.0,
18: 2916.0,
19: 2916.0},
(pd.Timestamp('2020-06-29 00:00:00'), 'sales'): {0: 1268.85,
1: 18274.385000000002,
2: 19722.65,
3: 55547.255,
4: 15323.800000000001,
5: 1688.6749999999997,
6: 227463.23,
7: 1906.0,
8: 113731.615,
9: 3219.6499999999996,
10: 15852.060000000001,
11: 17743.7,
12: 37795.15,
13: 5918.5,
14: 1708.75,
15: 166349.64,
16: 937.01,
17: 83174.82,
18: 787625.7400000001,
19: 787625.7400000001},
(pd.Timestamp('2020-06-29 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2020-07-06 00:00:00'), 'last_sales'): {0: 26.0,
1: 39.0,
2: 79.0,
3: 49.0,
4: 10.0,
5: 10.0,
6: 436.0,
7: 5.0,
8: 218.0,
9: 89.0,
10: 34.0,
11: 133.0,
12: 66.0,
13: 21.0,
14: 20.0,
15: 732.0,
16: 3.0,
17: 366.0,
18: 2336.0,
19: 2336.0},
(pd.Timestamp('2020-07-06 00:00:00'), 'sales'): {0: 3978.15,
1: 12138.96,
2: 19084.175,
3: 40033.46000000001,
4: 4280.15,
5: 1495.1,
6: 165548.29,
7: 1764.15,
8: 82774.145,
9: 8314.92,
10: 12776.649999999996,
11: 28048.075,
12: 55104.21000000002,
13: 6962.844999999999,
14: 3053.2000000000003,
15: 231049.11000000002,
16: 1264.655,
17: 115524.55500000001,
18: 793194.8000000002,
19: 793194.8000000002},
(pd.Timestamp('2020-07-06 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-06-28 00:00:00'), 'last_sales'): {0: 96.0,
1: 56.0,
2: 106.0,
3: 44.0,
4: 34.0,
5: 13.0,
6: 716.0,
7: 9.0,
8: 358.0,
9: 101.0,
10: 22.0,
11: 120.0,
12: 40.0,
13: 13.0,
14: 8.0,
15: 610.0,
16: 1.0,
17: 305.0,
18: 2652.0,
19: 2652.0},
(pd.Timestamp('2021-06-28 00:00:00'), 'sales'): {0: 5194.95,
1: 19102.219999999994,
2: 22796.420000000002,
3: 30853.115,
4: 11461.25,
5: 992.6,
6: 188143.41,
7: 3671.15,
8: 94071.705,
9: 6022.299999999998,
10: 7373.6,
11: 33514.0,
12: 35943.45,
13: 4749.000000000001,
14: 902.01,
15: 177707.32,
16: 349.3,
17: 88853.66,
18: 731701.46,
19: 731701.46},
(pd.Timestamp('2021-06-28 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0},
(pd.Timestamp('2021-07-07 00:00:00'), 'last_sales'): {0: 45.0,
1: 47.0,
2: 87.0,
3: 45.0,
4: 13.0,
5: 8.0,
6: 494.0,
7: 2.0,
8: 247.0,
9: 81.0,
10: 36.0,
11: 143.0,
12: 56.0,
13: 9.0,
14: 9.0,
15: 670.0,
16: 1.0,
17: 335.0,
18: 2328.0,
19: 2328.0},
(pd.Timestamp('2021-07-07 00:00:00'), 'sales'): {0: 7556.414999999998,
1: 14985.05,
2: 16790.899999999998,
3: 36202.729999999996,
4: 4024.97,
5: 1034.45,
6: 163960.32999999996,
7: 1385.65,
8: 81980.16499999998,
9: 5600.544999999999,
10: 11209.92,
11: 32832.61,
12: 42137.44500000001,
13: 3885.1499999999996,
14: 1191.5,
15: 194912.34000000003,
16: 599.0,
17: 97456.17000000001,
18: 717745.3400000001,
19: 717745.3400000001},
(pd.Timestamp('2021-07-07 00:00:00'), 'difference'): {0: 0.0,
1: 0.0,
2: 0.0,
3: 0.0,
4: 0.0,
5: 0.0,
6: 0.0,
7: 0.0,
8: 0.0,
9: 0.0,
10: 0.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 0.0,
16: 0.0,
17: 0.0,
18: 0.0,
19: 0.0}}).set_index(['group','category'])
我们可以使用pd.IndexSlice
并且我们需要对pd.Timestamp
进行切片,因为您的第一级列的类型为 datetime:
idx = pd.IndexSlice
sales = df.loc[('all','Total'),(slice(None),'sales')].droplevel(level=1)
last_sales = df.loc[('all','Total'), :pd.Timestamp("2021-07-07")].loc[idx[:, "last_sales"]]
df.loc[('all','Total'),(slice(None),'difference')] = sales.div(last_sales).sub(1).mul(100).to_numpy()
2020-06-29 00:00:00 2020-07-06 00:00:00 2021-06-28 00:00:00 2021-07-07 00:00:00
last_sales sales difference last_sales sales difference last_sales sales difference last_sales sales difference
group category
A Amazon 195.00 1268.85 0.00 26.00 3978.15 0.00 96.00 5194.95 0.00 45.00 7556.41 0.00
Apple 61.00 18274.39 0.00 39.00 12138.96 0.00 56.00 19102.22 0.00 47.00 14985.05 0.00
Facebook 106.00 19722.65 0.00 79.00 19084.17 0.00 106.00 22796.42 0.00 87.00 16790.90 0.00
Google 61.00 55547.25 0.00 49.00 40033.46 0.00 44.00 30853.12 0.00 45.00 36202.73 0.00
Netflix 37.00 15323.80 0.00 10.00 4280.15 0.00 34.00 11461.25 0.00 13.00 4024.97 0.00
Tesla 13.00 1688.67 0.00 10.00 1495.10 0.00 13.00 992.60 0.00 8.00 1034.45 0.00
Total 954.00 227463.23 0.00 436.00 165548.29 0.00 716.00 188143.41 0.00 494.00 163960.33 0.00
Uber 4.00 1906.00 0.00 5.00 1764.15 0.00 9.00 3671.15 0.00 2.00 1385.65 0.00
total 477.00 113731.62 0.00 218.00 82774.15 0.00 358.00 94071.71 0.00 247.00 81980.16 0.00
B Amazon 50.00 3219.65 0.00 89.00 8314.92 0.00 101.00 6022.30 0.00 81.00 5600.54 0.00
Apple 50.00 15852.06 0.00 34.00 12776.65 0.00 22.00 7373.60 0.00 36.00 11209.92 0.00
Facebook 75.00 17743.70 0.00 133.00 28048.08 0.00 120.00 33514.00 0.00 143.00 32832.61 0.00
Google 43.00 37795.15 0.00 66.00 55104.21 0.00 40.00 35943.45 0.00 56.00 42137.45 0.00
Netflix 17.00 5918.50 0.00 21.00 6962.84 0.00 13.00 4749.00 0.00 9.00 3885.15 0.00
Tesla 14.00 1708.75 0.00 20.00 3053.20 0.00 8.00 902.01 0.00 9.00 1191.50 0.00
Total 504.00 166349.64 0.00 732.00 231049.11 0.00 610.00 177707.32 0.00 670.00 194912.34 0.00
Uber 3.00 937.01 0.00 3.00 1264.65 0.00 1.00 349.30 0.00 1.00 599.00 0.00
total 252.00 83174.82 0.00 366.00 115524.56 0.00 305.00 88853.66 0.00 335.00 97456.17 0.00
all Total 2916.00 787625.74 26910.48 2336.00 793194.80 33855.26 2652.00 731701.46 27490.55 2328.00 717745.34 30730.99
total 2916.00 787625.74 0.00 2336.00 793194.80 0.00 2652.00 731701.46 0.00 2328.00 717745.34 0.00
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.