I have the following pandas DataFrame and I am trying to group the animals according to their class. I know I can use groupby to get the result faster. However, I was wondering whether there is a way to replicate what groupby does by iterating over the rows.
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))
I have been trying to use the following code but it doesn't work, and I can't find another method.
birds = []
mammal = []
for i, columnclass in df.iterrows():
    if i == 'bird':
        birds.append(i)
    else:
        mammal.append(i)
print(birds)
print(mammal)
The output should be similar to what this code produces.
group = df.groupby(['class']).sum()
Out[1]:
        max_speed
class
bird        413.0
mammal      138.2
Here's a solution, although the level argument of sum is deprecated in newer pandas versions in favor of df.set_index('class').groupby(level=0).sum():
group = df.set_index('class')['max_speed'].sum(level=0)
Output:
>>> group
class
bird      413.0
mammal    138.2
Name: max_speed, dtype: float64
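For pandas versions that no longer accept sum(level=0), the groupby(level=0) form mentioned above can be sketched like this. The DataFrame is rebuilt from the question so the snippet runs on its own, and selecting only max_speed before grouping is a choice made here to keep the text column out of the sum.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))

# Move 'class' into the index, then group on that index level.
# NaN values are skipped by sum() by default.
group = df.set_index('class')['max_speed'].groupby(level=0).sum()
print(group)
```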
You don't really need a loop for any of this. First get a list of the unique elements:
classes = df['class'].unique()
Now you can build a dictionary of boolean masks, one per class, or whatever else you want out of it:
data = {cls: df['class'] == cls for cls in classes}
Or the one-liner:
data = {cls: df['class'] == cls for cls in df['class'].unique()}
But why do something like this when you can just use groupby?
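If you do want to go this route, the masks can be turned into per-class results without an explicit row loop. This is only a sketch built on the question's DataFrame; the names masks, members and totals are made up for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))

# One boolean mask per class, as in the dictionary comprehension above.
masks = {cls: df['class'] == cls for cls in df['class'].unique()}

# Row labels that belong to each class.
members = {cls: df[mask].index.tolist() for cls, mask in masks.items()}

# Sum of max_speed per class; Series.sum() skips NaN by default.
totals = {cls: df.loc[mask, 'max_speed'].sum() for cls, mask in masks.items()}

print(members)
print(totals)
```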
The iterrows method of the DataFrame yields 2-tuples of (index, Series), where the Series holds the row data indexed by the column names. This is a quote from the pandas documentation:
DataFrame.iterrows()
Iterate over DataFrame rows as (index, Series) pairs.
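A quick way to see this is to pull just the first pair out of the iterator. The snippet below is a small check on the question's DataFrame; first_index and first_row are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))

# iterrows() is a generator; next() returns the first (index, Series) pair.
first_index, first_row = next(df.iterrows())

print(first_index)            # the row label: falcon
print(list(first_row.index))  # the column names
print(first_row['class'])     # the value in the 'class' column: bird
```

So the first element of each pair is the row label ('falcon'), not the class, which is why comparing it to 'bird' never matches.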
You need to access the class column of each row. You can do that by unpacking the row directly in the for loop:
birds = []
mammal = []
for i, (columnclass, _, _) in df.iterrows():
    if columnclass == "bird":
        birds.append(i)
    else:
        mammal.append(i)
print(birds)
print(mammal)
You can get the class name by referencing columnclass['class'] inside the loop:
birds = []
mammal = []
for i, columnclass in df.iterrows():
    if columnclass['class'] == 'bird':
        birds.append(i)
    else:
        mammal.append(i)
print(birds)
print(mammal)
Output:
['falcon', 'parrot']
['lion', 'monkey', 'leopard']
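To get closer to the groupby(...).sum() result asked for in the question, the same loop can accumulate a running total per class instead of collecting row labels. This is a minimal sketch, not how pandas implements groupby; the totals dictionary and the choice to count missing values as 0 are assumptions made here to match the expected output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))

totals = {}
for name, row in df.iterrows():
    speed = row['max_speed']
    # Count a missing speed as 0 so it does not turn the total into NaN.
    increment = 0.0 if pd.isna(speed) else speed
    totals[row['class']] = totals.get(row['class'], 0.0) + increment

# Wrap the dictionary in a Series so it resembles the groupby output.
result = pd.Series(totals, name='max_speed')
result.index.name = 'class'
print(result)
```

This works for any number of classes, since the dictionary keys are created as the classes are encountered, but it will be much slower than groupby on large frames.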