pandas groupby dataframe 2 based on dataframe 1

import pandas as pd

data = {"Team": ["Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Yankees", 
                 "Yankees", "Yankees", "Yankees", "Yankees", "Yankees"],
        "Pos": ["Pitcher", "Pitcher", "Pitcher", "Not Pitcher", "Not Pitcher", "Not Pitcher", 
                "Pitcher", "Pitcher", "Pitcher", "Not Pitcher", "Not Pitcher", "Not Pitcher"],
        "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31]}
df1 = pd.DataFrame(data)

Now I'm grouping by two columns using the following code:

grouped_multiple = df1.groupby(['Team', 'Pos']).agg({'Age': ['mean', 'min', 'max']})
grouped_multiple.columns = ['age_mean', 'age_min', 'age_max']
grouped_multiple = grouped_multiple.reset_index()

Now I create a second dataframe, df2, which also has 3 columns and the same length, but contains only numbers. Imagine each cell of df1 is linked to the cell at the same position in df2. When I group df1, I want to get the corresponding values of df2 for each group.

So grouping df1 by column 1

["Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Yankees", 
 "Yankees", "Yankees", "Yankees", "Yankees", "Yankees"]

results in

["Red Sox", "Yankees"]

Let's say column 1 of df2 looks like

[1,2,4,3,2,3,4,5,3,5,6,7]

I want the values of df2 column 1 collected into one list per group, taken from the row positions that belong to "Red Sox" and to "Yankees" in df1,

like

[[1,2,4,3,2,3], [4,5,3,5,6,7]]

I am a bit unclear as to what you are trying to do, but if you concatenate the two dataframes column-wise like this:

newdf = pd.concat([df1, df2], axis=1)

then you can do your groupby on the combined frame and aggregate the last three columns (the ones that came from df2) however you need.
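A minimal sketch of that concatenate-then-group idea follows. The question only gives values for the first column of df2, so the column names col1, col2, col3 and the values in col2 and col3 are placeholders made up for illustration. It also assumes df1 and df2 share the same default index, since pd.concat with axis=1 aligns rows on the index.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Red Sox"] * 6 + ["Yankees"] * 6,
    "Pos": (["Pitcher"] * 3 + ["Not Pitcher"] * 3) * 2,
    "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31],
})

# col1 is taken from the question; col2 and col3 are made-up placeholders.
df2 = pd.DataFrame({
    "col1": [1, 2, 4, 3, 2, 3, 4, 5, 3, 5, 6, 7],
    "col2": list(range(12)),
    "col3": list(range(12, 24)),
})

# Put the two frames side by side (rows are matched on the index).
newdf = pd.concat([df1, df2], axis=1)

# Group by a df1 column and collect the df2 columns into one list per group.
by_team = newdf.groupby("Team")[["col1", "col2", "col3"]].agg(list)
print(by_team["col1"].tolist())
```

Any other aggregation (sum, mean, and so on) can be used in place of list once the frames are combined.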

I'm not sure where grouped_multiple comes into your problem. If df1 and df2 have the same length (and the same index), I think you can do

df2 = pd.DataFrame({'col1':[1,2,4,3,2,3,4,5,3,5,6,7]})
s = df2['col1'].groupby(df1['Team']).agg(list)

and you get

print (s)
Team
Red Sox    [1, 2, 4, 3, 2, 3]
Yankees    [4, 5, 3, 5, 6, 7]
Name: col1, dtype: object

or if you want a list of lists, then

l = s.tolist()
print (l)
[[1, 2, 4, 3, 2, 3], [4, 5, 3, 5, 6, 7]]
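One caveat worth sketching: passing a Series as the grouping key matches rows by index label, not by position. If the two frames are meant to be linked purely by position but their indexes differ (for example after filtering or sorting one of them), a possible workaround is to pass the key as a plain array. The shifted index below is an assumption chosen only to demonstrate the situation.

```python
import pandas as pd

# Team labels whose index (100..111) does NOT match df2's default index (0..11).
team = pd.Series(["Red Sox"] * 6 + ["Yankees"] * 6,
                 index=range(100, 112), name="Team")
df2 = pd.DataFrame({"col1": [1, 2, 4, 3, 2, 3, 4, 5, 3, 5, 6, 7]})

# to_numpy() drops the index, so the key is matched to df2 row by row.
s_pos = df2["col1"].groupby(team.to_numpy()).agg(list)
lists = s_pos.tolist()
print(lists)
```

With the array key the result index has no name, so it can be renamed afterwards with rename_axis("Team") if that matters.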

And if you want to group by both columns from df1, then you can do

df2['col1'].groupby([df1['Team'], df1['Pos']]).agg(list)
Team     Pos        
Red Sox  Not Pitcher    [3, 2, 3]
         Pitcher        [1, 2, 4]
Yankees  Not Pitcher    [5, 6, 7]
         Pitcher        [4, 5, 3]
