How to get a subset DataFrame from a DataFrame in Python?

I have a DataFrame (df):

         id  CI    VaR
0        1  0.600  1000
1        1  0.650  1100
2        1  0.700  1200
3        1  0.750  1300
4        2  0.600  2500
5        2  0.650  2600
6        2  0.700  2700
7        2  0.750  2800
8        3  0.600  1500
9        3  0.650  1600
10       3  0.700  1700
11       3  0.750  1800

I need to create subset DataFrames from this DataFrame. This is what I'm doing:

for col in range(1,4):
    df2 = df1.loc[df1["id"]==col]
    print(df2)

Output:

         id  CI    VaR
0        1  0.600  1000
1        1  0.650  1100
2        1  0.700  1200
3        1  0.750  1300

and

4        2  0.600  2500
5        2  0.650  2600
6        2  0.700  2700
7        2  0.750  2800

and

8        3  0.600  1500
9        3  0.650  1600
10       3  0.700  1700
11       3  0.750  1800

This gives me a separate DataFrame for each of ids 1, 2 and 3. Now I want to take the VaR values of all three DataFrames, add them up in the order they appear (first with first, second with second, and so on), and append the result to each respective DataFrame, like:

obj = 0
for col in range(1,4):
    df2 = df1.loc[df1["id"]==col]
    obj = obj + df1["VaR"] # error is here
    print(df2)

But this is not working for me.

I need output like:

         id  CI    VaR   capital
0        1  0.600  1000  5000
1        1  0.650  1100  5300
2        1  0.700  1200  5600
3        1  0.750  1300  5900

The capital value 5000 comes from adding 1000 + 2500 + 1500 (the first VaR value of each id), the capital value 5300 comes from adding 1100 + 2600 + 1600 (the second VaR value of each id), and so on. I need this for all the ids, like:

4        2  0.600  2500   5000
5        2  0.650  2600   5300
6        2  0.700  2700   5600
7        2  0.750  2800   5900

and

8        3  0.600  1500   5000
9        3  0.650  1600   5300
10       3  0.700  1700   5600
11       3  0.750  1800   5900

Thanks for your time :)

I hope I've understood your question right. If you need to repeat the first, second, third... sums of values in each group:

vals = df.groupby(df.groupby("id").cumcount())["VaR"].sum()
df["capital"] = [*vals] * df["id"].nunique()
print(df)

Prints:

    id    CI   VaR  capital
0    1  0.60  1000     5000
1    1  0.65  1100     5300
2    1  0.70  1200     5600
3    1  0.75  1300     5900
4    2  0.60  2500     5000
5    2  0.65  2600     5300
6    2  0.70  2700     5600
7    2  0.75  2800     5900
8    3  0.60  1500     5000
9    3  0.65  1600     5300
10   3  0.70  1700     5600
11   3  0.75  1800     5900
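
To see why this works, it can help to look at the two intermediate results. The sketch below rebuilds the example frame and prints the within-group position of each row and the per-position sums; the names `pos` and `vals` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1] * 4 + [2] * 4 + [3] * 4,
    "CI": [0.60, 0.65, 0.70, 0.75] * 3,
    "VaR": [1000, 1100, 1200, 1300,
            2500, 2600, 2700, 2800,
            1500, 1600, 1700, 1800],
})

# Position of each row inside its own id group: 0, 1, 2, 3, 0, 1, 2, 3, ...
pos = df.groupby("id").cumcount()

# Sum VaR over all rows that share the same within-group position.
vals = df.groupby(pos)["VaR"].sum()

print(pos.tolist())   # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
print(vals.tolist())  # [5000, 5300, 5600, 5900]
```

Repeating `vals` once per id then lines up with the rows only because every group here has the same length and the rows of each group are contiguous.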

An option via np.tile, and a different way to divide the DataFrame via np.array_split:

(Assumption: all id groups have the same length and the rows of each group are contiguous, as in the example.)

from pprint import pprint

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'CI': [0.6, 0.65, 0.7, 0.75, 0.6, 0.65, 0.7, 0.75, 0.6, 0.65, 0.7, 0.75],
    'VaR': [1000, 1100, 1200, 1300, 2500, 2600, 2700, 2800, 1500, 1600, 1700,
            1800]
})

unique_count = df['id'].nunique()
df['capital'] = np.tile(
    df.groupby(df.groupby("id").cumcount())["VaR"].sum(),
    unique_count
)

dfs = np.array_split(df, unique_count)

pprint(dfs)

dfs:

[   id    CI   VaR  capital
0   1  0.60  1000     5000
1   1  0.65  1100     5300
2   1  0.70  1200     5600
3   1  0.75  1300     5900,
    id    CI   VaR  capital
4   2  0.60  2500     5000
5   2  0.65  2600     5300
6   2  0.70  2700     5600
7   2  0.75  2800     5900,
     id    CI   VaR  capital
8    3  0.60  1500     5000
9    3  0.65  1600     5300
10   3  0.70  1700     5600
11   3  0.75  1800     5900]
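
If the groups might not all be the same size, splitting on the id values themselves is one alternative to `np.array_split`, which only cuts the frame into evenly sized chunks by row position. A minimal sketch, assuming a small made-up frame with unequal groups (the data and the `parts` name are illustrative only):

```python
import pandas as pd

# Hypothetical data: the id groups have 3, 2 and 1 rows respectively.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "VaR": [10, 20, 30, 40, 50, 60],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so each id maps to exactly the rows that carry that id.
parts = {key: group for key, group in df.groupby("id")}

sizes = {key: len(group) for key, group in parts.items()}
print(sizes)     # {1: 3, 2: 2, 3: 1}
print(parts[2])  # the two rows with id == 2
```

Each sub-DataFrame keeps its original index labels, so it can still be matched back to the parent frame.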

Let's use groupby with transform:

df['capital'] = df.groupby(df.groupby('id').cumcount())['VaR'].transform('sum')

Output:

    id    CI   VaR  capital
0    1  0.60  1000     5000
1    1  0.65  1100     5300
2    1  0.70  1200     5600
3    1  0.75  1300     5900
4    2  0.60  2500     5000
5    2  0.65  2600     5300
6    2  0.70  2700     5600
7    2  0.75  2800     5900
8    3  0.60  1500     5000
9    3  0.65  1600     5300
10   3  0.70  1700     5600
11   3  0.75  1800     5900

Details:

  • First, groupby 'id' and cumcount to get the position in each group
  • Then, groupby "position" and sum with transform
  • pandas will handle aligning the values using the index
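
The steps above can be sketched one at a time. The example below uses a made-up frame whose id groups have different lengths, to illustrate that `transform` sums only the rows that actually exist at each position and needs no equal-length assumption; the data and the `pos` name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: groups of length 3, 2 and 1.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "VaR": [1000, 1100, 1200, 2500, 2600, 1500],
})

# Step 1: position of each row within its id group.
pos = df.groupby("id").cumcount()

# Step 2: sum VaR per position; transform returns one value per original
# row, aligned on the index, so it can be assigned straight back.
df["capital"] = df.groupby(pos)["VaR"].transform("sum")

print(df)
```

Here position 0 sums to 5000 (1000 + 2500 + 1500), position 1 to 3700 (1100 + 2600) and position 2 to 1200, and each row receives the sum for its own position.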
