I have a pandas DataFrame with two columns named "column one" and "column two". I want to count the occurrences of each value in "column two" for the rows where "column one" has the value 'b'. I can do this in two steps with this code:
import pandas as pd
data = [['a', 'val1'], ['b', 'val2'], ['b', 'val2'], ['b','val3'], ['b','val4'], ['a', 'val5'], ['a', 'val6']]
ex = pd.DataFrame(data, columns = ['column one', 'column two'])
# keep only the rows where 'column one' is 'b', then count the values of 'column two'
exa = ex[ex['column one']=='b']
exa['column two'].value_counts()
This will give me the output:
val2 2
val3 1
val4 1
Now, how do I write this so that my output also includes the values val1, val5 and val6, each showing a count of 0?
Use Series.reindex with the unique values of the original column:
s = exa['column two'].value_counts().reindex(ex['column two'].unique(), fill_value=0)
print (s)
val1 0
val2 2
val3 1
val4 1
val5 0
val6 0
Name: column two, dtype: int64
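To make the role of reindex clearer, here is a minimal self-contained sketch of the same idea, split into named steps. The variable names counts and all_values are only illustrative and do not come from the answer above.

```python
import pandas as pd

data = [['a', 'val1'], ['b', 'val2'], ['b', 'val2'], ['b', 'val3'],
        ['b', 'val4'], ['a', 'val5'], ['a', 'val6']]
ex = pd.DataFrame(data, columns=['column one', 'column two'])

# Counts for the rows where 'column one' == 'b'.
# Values that never occur together with 'b' are simply absent here.
counts = ex.loc[ex['column one'] == 'b', 'column two'].value_counts()

# Every distinct value of 'column two' in the full frame,
# in order of first appearance.
all_values = ex['column two'].unique()

# Align the counts to the full set of labels; labels that are
# missing from counts are filled with 0 instead of NaN.
s = counts.reindex(all_values, fill_value=0)
print(s.to_dict())
```

Without fill_value=0, the missing labels would appear as NaN and the counts would be converted to floats.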
Just out of curiosity, is there a way to do this without having to create the second DataFrame exa?
Yes, you can chain the calls together and use DataFrame.loc to select the column by condition:
s = (ex.loc[ex['column one']=='b', 'column two']
       .value_counts()
       .reindex(ex['column two'].unique(), fill_value=0))
Solution with aggregation:
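# view the boolean mask as int8 so that the per-group sum counts the matching rows
# (Series.view is deprecated in recent pandas versions; the alternative below avoids it)
s = ex['column one'].eq('b').view('i1').groupby(ex['column two']).sum()
# alternative: convert the boolean mask to integers with astype
s = ex['column one'].eq('b').astype(int).groupby(ex['column two']).sum()
print(s)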
column two
val1 0
val2 2
val3 1
val4 1
val5 0
val6 0
Name: column one, dtype: int8
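The aggregation works because the comparison yields one True/False per row, and summing those flags within each group of "column two" counts the matching rows. A small sketch of the intermediate steps, assuming the astype variant; the names is_b and counts are illustrative only:

```python
import pandas as pd

data = [['a', 'val1'], ['b', 'val2'], ['b', 'val2'], ['b', 'val3'],
        ['b', 'val4'], ['a', 'val5'], ['a', 'val6']]
ex = pd.DataFrame(data, columns=['column one', 'column two'])

# One boolean per row: True where 'column one' equals 'b'.
is_b = ex['column one'].eq('b')
print(is_b.tolist())

# Convert True/False to 1/0, group the flags by the values of
# 'column two', and sum each group. Groups with no 'b' rows sum to 0,
# so every value of 'column two' shows up in the result.
counts = is_b.astype(int).groupby(ex['column two']).sum()
print(counts.to_dict())
```

Because every value of "column two" forms its own group, no reindex step is needed here.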
Or with groupby and apply:
import pandas as pd
import numpy as np
# data
data = [['a', 'val1'], ['b', 'val2'], ['b', 'val2'], ['b','val3'], ['b','val4'], ['a', 'val5'], ['a', 'val6']]
ex = pd.DataFrame(data, columns = ['column one', 'column two'])
# for each group of 'column two', count the rows where 'column one' is 'b'
ex.groupby('column two')['column one'].apply(lambda x: np.sum(x=='b'))
This returns a pandas Series indexed by the values of "column two".
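As a quick sanity check, the sketch below runs the three approaches from this thread on the sample data and compares the results. It is only an illustration: the variable names are made up here, and the apply variant uses (x == 'b').sum() in place of np.sum so that the block needs only pandas.

```python
import pandas as pd

data = [['a', 'val1'], ['b', 'val2'], ['b', 'val2'], ['b', 'val3'],
        ['b', 'val4'], ['a', 'val5'], ['a', 'val6']]
ex = pd.DataFrame(data, columns=['column one', 'column two'])
mask = ex['column one'] == 'b'

# 1) filter, count, then reindex to the full set of values
via_reindex = (ex.loc[mask, 'column two']
                 .value_counts()
                 .reindex(ex['column two'].unique(), fill_value=0))

# 2) sum the integer mask within each group of 'column two'
via_sum = mask.astype(int).groupby(ex['column two']).sum()

# 3) groupby + apply, counting the 'b' rows in each group
via_apply = ex.groupby('column two')['column one'].apply(lambda x: (x == 'b').sum())

# Compare as plain dicts so that differences in Series name,
# dtype, and index order do not affect the comparison.
same = via_reindex.to_dict() == via_sum.to_dict() == via_apply.to_dict()
print(same)
```

The results can still differ in their name and in index order: the reindex version follows the order of first appearance, while the groupby versions sort the group keys by default.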