I have a function that prints the sum of a column of a pandas DataFrame after filtering on some rows to be defined, followed by the share of the unfiltered column total that this sum represents:
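import numpy as np

def my_function(df, filter_to_apply, col):
    # Sum of the column over the rows kept by the filter
    my_sum = np.sum(df[filter_to_apply][col])
    print(my_sum)
    # Share of the unfiltered column total
    print(my_sum / np.sum(df[col]))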
Now I am wondering whether there is a filter_to_apply that doesn't actually filter anything (i.e. keeps all rows), so that I can keep using my function (which is actually a bit more complex and convenient) even when I don't want any filter.
So I am looking for some filter_f1 such that df[filter_f1] returns df unchanged, and that can be combined with other filters, as in filter_f1 & filter_f2.
One possible answer is df.index.isin(df.index), but I am wondering whether there is anything easier to understand (e.g. I tried to use just True, but it didn't work).
This is a way to select all rows:
df.iloc[range(0, len(df))]
This also works:
df[:]
But I haven't figured out a way to pass : as an argument.
There is an indexer called loc in pandas that filters rows. You could do something like this:
df2 = df.loc[<Filter here>]
# The filter can be something like df['price'] > 500 or df['name'] == 'Brian',
# basically anything that returns a boolean for each row
total = df2['ColumnToSum'].sum()
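Since loc accepts anything that returns a boolean for each row, one way to get a filter that keeps every row is a boolean Series that is True everywhere. The following is a minimal sketch, not part of the original answer; the sample data and the names no_filter and expensive are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"name": ["Brian", "Ann", "Brian"], "price": [400, 600, 800]})

# A boolean Series that is True for every row: a filter that keeps everything.
no_filter = pd.Series(True, index=df.index)

# It behaves like any other boolean filter and combines with `&`.
expensive = df["price"] > 500
all_rows = df.loc[no_filter]
combined = df.loc[no_filter & expensive]

total_all = all_rows["price"].sum()
total_combined = combined["price"].sum()
print(total_all, total_combined)
```

Because the mask shares the DataFrame's index, it can be passed wherever a regular boolean filter is expected.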
A Python slice object, i.e. slice(None), selects every index of an indexable object (note that slice(-1) would drop the last element). So df[slice(None)] selects all rows in the DataFrame. You can store it in a variable as an initial value, which you can further refine in your logic:
filter_to_apply = slice(None)  # initialize to select all rows
...  # logic that may set `filter_to_apply` to something more restrictive
my_function(df, filter_to_apply, col)
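The slice approach can be checked with a small sketch. Here filtered_share is a hypothetical variant of the question's my_function that returns its two values instead of printing them, and the data is made up. Note that slice(None) is the object equivalent of `:`, whereas slice(-1) corresponds to `:-1` and leaves out the last row.

```python
import numpy as np
import pandas as pd

def filtered_share(df, filter_to_apply, col):
    """Return the filtered sum and its share of the unfiltered total."""
    my_sum = np.sum(df[filter_to_apply][col])
    return my_sum, my_sum / np.sum(df[col])

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"price": [100, 200, 300, 400]})

# slice(None) keeps every row, like df[:].
all_sum, all_share = filtered_share(df, slice(None), "price")

# slice(-1) is df[:-1]: the last row is excluded.
partial_sum, partial_share = filtered_share(df, slice(-1), "price")

print(all_sum, all_share)
print(partial_sum, partial_share)
```

A slice is not a boolean mask, so it is not meant to be combined with other filters through `&`; when filters need to be combined, an all-True boolean Series is probably the safer starting value.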