How to get a subset DataFrame from a DataFrame in Python?
I have a dataframe (df):
id CI VaR
0 1 0.600 1000
1 1 0.650 1100
2 1 0.700 1200
3 1 0.750 1300
4 2 0.600 2500
5 2 0.650 2600
6 2 0.700 2700
7 2 0.750 2800
8 3 0.600 1500
9 3 0.650 1600
10 3 0.700 1700
11 3 0.750 1800
I have to create subset dataframes from this dataframe. I am doing this:
for col in range(1,4):
    df2 = df1.loc[df1["id"]==col]
    print(df2)
Output:
id CI VaR
0 1 0.600 1000
1 1 0.650 1100
2 1 0.700 1200
3 1 0.750 1300
and
4 2 0.600 2500
5 2 0.650 2600
6 2 0.700 2700
7 2 0.750 2800
and
8 3 0.600 1500
9 3 0.650 1600
10 3 0.700 1700
11 3 0.750 1800
This gives me a separate dataframe for each id (1, 2 and 3). Now I want to take the VaR values of all three dataframes, add them together in the order they appear, and append the result to each respective dataframe, like:
obj = 0
for col in range(1,4):
    df2 = df1.loc[df1["id"]==col]
    obj = obj + df1["VaR"] # error is here
    print(df2)
But this is not working for me.
I need output like:
id CI VaR capital
0 1 0.600 1000 5000
1 1 0.650 1100 5300
2 1 0.700 1200 5600
3 1 0.750 1300 5900
The capital value 5000 comes from adding 1000 + 2500 + 1500 (the first VaR value of each id), 5300 comes from adding 1100 + 2600 + 1600 (the second VaR value of each id), and so on. I need this for all the ids, like:
4 2 0.600 2500 5000
5 2 0.650 2600 5300
6 2 0.700 2700 5600
7 2 0.750 2800 5900
and
8 3 0.600 1500 5000
9 3 0.650 1600 5300
10 3 0.700 1700 5600
11 3 0.750 1800 5900
Thanks for your time :)
I hope I've understood your question right. If you need to repeat the first, second, third... sums of values in each group:
vals = df.groupby(df.groupby("id").cumcount())["VaR"].sum()
df["capital"] = [*vals] * df["id"].nunique()
print(df)
Prints:
id CI VaR capital
0 1 0.60 1000 5000
1 1 0.65 1100 5300
2 1 0.70 1200 5600
3 1 0.75 1300 5900
4 2 0.60 2500 5000
5 2 0.65 2600 5300
6 2 0.70 2700 5600
7 2 0.75 2800 5900
8 3 0.60 1500 5000
9 3 0.65 1600 5300
10 3 0.70 1700 5600
11 3 0.75 1800 5900
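For comparison, here is a minimal sketch of a loop-based version that stays close to the attempt in the question. It assumes every id has the same number of rows, in the same order. The names total and subsets are only illustrative. Adding df["VaR"] for the whole frame on each pass just triples every value, and adding the per-id Series directly would align on the index labels (0-3, 4-7, 8-11) and produce NaN, so the sketch adds plain NumPy arrays instead:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "CI": [0.6, 0.65, 0.7, 0.75] * 3,
    "VaR": [1000, 1100, 1200, 1300, 2500, 2600, 2700, 2800,
            1500, 1600, 1700, 1800],
})

ids = df["id"].unique()

# First pass: add the VaR values position by position.
# .to_numpy() drops the index, so values are added by position, not by label.
total = 0
for col in ids:
    df2 = df.loc[df["id"] == col]
    total = total + df2["VaR"].to_numpy()

# Second pass: attach the totals to each subset (hypothetical dict of results).
subsets = {}
for col in ids:
    subsets[int(col)] = df.loc[df["id"] == col].assign(capital=total)
    print(subsets[int(col)])
```

The groupby one-liner above is shorter, but the loop makes it easier to see where the index alignment matters.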
An option via np.tile and a different way to divide the DataFrame via np.array_split:
(Assumption: all id groups are the same length, and the rows of each group are contiguous and in the same order)
from pprint import pprint

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'CI': [0.6, 0.65, 0.7, 0.75, 0.6, 0.65, 0.7, 0.75, 0.6, 0.65, 0.7, 0.75],
    'VaR': [1000, 1100, 1200, 1300, 2500, 2600, 2700, 2800, 1500, 1600, 1700,
            1800]
})
unique_count = df['id'].nunique()
df['capital'] = np.tile(
    df.groupby(df.groupby("id").cumcount())["VaR"].sum(),
    unique_count
)
dfs = np.array_split(df, unique_count)
pprint(dfs)
dfs:
[ id CI VaR capital
0 1 0.60 1000 5000
1 1 0.65 1100 5300
2 1 0.70 1200 5600
3 1 0.75 1300 5900,
id CI VaR capital
4 2 0.60 2500 5000
5 2 0.65 2600 5300
6 2 0.70 2700 5600
7 2 0.75 2800 5900,
id CI VaR capital
8 3 0.60 1500 5000
9 3 0.65 1600 5300
10 3 0.70 1700 5600
11 3 0.75 1800 5900]
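If the groups might not be the same size or might not sit in contiguous blocks, np.array_split would cut the frame in the wrong places. A hedged alternative sketch is to let groupby do the splitting; here the capital column is filled with transform so the whole block is self-contained:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "CI": [0.6, 0.65, 0.7, 0.75] * 3,
    "VaR": [1000, 1100, 1200, 1300, 2500, 2600, 2700, 2800,
            1500, 1600, 1700, 1800],
})

# Sum VaR across ids for each within-group position and broadcast it back.
df["capital"] = df.groupby(df.groupby("id").cumcount())["VaR"].transform("sum")

# Iterating over a groupby yields (key, sub-DataFrame) pairs, one per id.
dfs = [sub for _, sub in df.groupby("id")]

for sub in dfs:
    print(sub, end="\n\n")
```

Each element of dfs keeps its original index labels, just like the np.array_split result.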
Let's use groupby with transform:
df['capital'] = df.groupby(df.groupby('id').cumcount())['VaR'].transform('sum')
Output:
id CI VaR capital
0 1 0.60 1000 5000
1 1 0.65 1100 5300
2 1 0.70 1200 5600
3 1 0.75 1300 5900
4 2 0.60 2500 5000
5 2 0.65 2600 5300
6 2 0.70 2700 5600
7 2 0.75 2800 5900
8 3 0.60 1500 5000
9 3 0.65 1600 5300
10 3 0.70 1700 5600
11 3 0.75 1800 5900
Details:
First, groupby 'id' and use cumcount to get the position of each row within its group.
Then, group by that position and use transform to sum.
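The two steps can be sketched separately to make the intermediate values visible. The names pos and sums are only illustrative, and the sample frame is rebuilt from the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "CI": [0.6, 0.65, 0.7, 0.75] * 3,
    "VaR": [1000, 1100, 1200, 1300, 2500, 2600, 2700, 2800,
            1500, 1600, 1700, 1800],
})

# Step 1: position of each row inside its id group (0, 1, 2, 3 for each id).
pos = df.groupby("id").cumcount()

# Step 2a: one total per position (sum returns one row per position).
sums = df.groupby(pos)["VaR"].sum()

# Step 2b: transform broadcasts those totals back onto the original rows.
df["capital"] = df.groupby(pos)["VaR"].transform("sum")

print(pos.tolist())
print(sums)
print(df)
```

Because transform returns a result aligned with the original index, there is no need to repeat the sums by hand with list multiplication or np.tile.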