How to group rows in pandas without groupby?

I have the following pandas DataFrame and I am trying to group animals by their class. I know groupby does this directly and faster, but I was wondering whether there is a way to replicate it by iterating over the rows.

import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))

I have been trying to use the following code but it doesn't work, and I can't find another method.

birds = []
mammal = []
for i, columnclass in df.iterrows():
  if i == 'bird':
    birds.append(i)
  else:
    mammal.append(i) 
print(birds)
print(mammal)

The output should be similar to what this code produces.

group = df.groupby(['class']).sum(numeric_only=True)

Out[1]:

        max_speed
class
bird        413.0
mammal      138.2

Here's a solution, although sum(level=...) is deprecated (and was removed in pandas 2.0) in favor of df.set_index('class')['max_speed'].groupby(level=0).sum():

group = df.set_index('class')['max_speed'].sum(level=0)

Output:

>>> group
class
bird      413.0
mammal    138.2
Name: max_speed, dtype: float64
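
On recent pandas versions, where the level argument to sum is no longer available, grouping on the index level should give the same result. A minimal sketch, rebuilding the question's DataFrame so that it runs on its own:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=['class', 'order', 'max_speed'])

# Move 'class' into the index, then group on that index level.
# NaN values are skipped by sum() by default.
group = df.set_index('class')['max_speed'].groupby(level=0).sum()
print(group)
```

Note that this still relies on groupby internally, so it is a replacement for the deprecated call rather than a way to avoid groupby.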

You don't really need a loop for any of this. First get a list of the unique elements:

classes = df['class'].unique()

Now you can make a dictionary of boolean masks (one per class), or whatever you want, out of it:

data = {cls: df['class'] == cls for cls in classes}

Or the one-liner:

data = {cls: df['class'] == cls for cls in df['class'].unique()}
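
Each value in that dictionary is a boolean Series marking the rows of one class, so it can be used to select rows and aggregate them. A rough sketch of how the masks might be turned into per-class sums (the name totals is only illustrative, and the DataFrame is rebuilt here so the snippet is self-contained):

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=['class', 'order', 'max_speed'])

# One boolean mask per distinct class.
data = {cls: df['class'] == cls for cls in df['class'].unique()}

# Use each mask to pick that class's rows and sum max_speed
# (Series.sum() skips NaN by default).
totals = {cls: df.loc[mask, 'max_speed'].sum() for cls, mask in data.items()}
print(totals)
```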

But why do something like this when you can just use groupby?

The iterrows method of the DataFrame yields 2-tuples of (index, Series of the row data indexed by the column names). This is a quote from the pandas documentation:

DataFrame.iterrows()

Iterate over DataFrame rows as (index, Series) pairs.

So you need to access the class column of each row rather than compare the index label. You can do that with direct unpacking in the for loop:

birds = []
mammal = []
for i, (columnclass, _, _) in df.iterrows():
    if columnclass == "bird":
        birds.append(i)
    else:
        mammal.append(i)
print(birds)
print(mammal)

You can get the class name by referencing columnclass['class'] inside the loop:

birds = []
mammal = []
for i, columnclass in df.iterrows():
  if columnclass['class'] == 'bird':
    birds.append(i)
  else:
    mammal.append(i) 
print(birds)
print(mammal)

Output:

['falcon', 'parrot']
['lion', 'monkey', 'leopard']
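
The loops above only collect the row labels per class. To get closer to the groupby(...).sum() output the question asks for, the same iteration could accumulate max_speed per class in a dictionary. This is a sketch under assumptions: the name totals is illustrative, and NaN values are skipped by hand to mimic what sum() does by default.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=['class', 'order', 'max_speed'])

totals = {}
for label, row in df.iterrows():
    cls = row['class']
    speed = row['max_speed']
    # Make sure every class has an entry, even if all its values are NaN.
    totals.setdefault(cls, 0.0)
    # Skip missing values, as groupby(...).sum() does by default.
    if pd.notna(speed):
        totals[cls] += speed

print(totals)
```

This works for any number of classes, not just two, but it is considerably slower than groupby on large frames.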
