I am trying to loop through a Pandas data frame and produce a bar chart only for columns that contain exactly two unique values. Each chart should show the two unique values on the X axis and the number of rows for each value on the Y axis.
I've been able to produce a Series off my data frame (df_clean) which shows me the number of unique values per column:
col_values = df_clean.apply(lambda x: len(x.unique()))
But I am completely lost how to:
In the same code, I have been able to loop through df_clean and plot all the int and float type columns. I am struggling to adapt this working code to the problem above.
i = 1
c_num_cols = len(df_clean.select_dtypes(["int64","float64"]).columns)
for column in df_clean.select_dtypes(["int64","float64"]).columns:
    plt.subplot(c_num_cols,(c_num_cols % 2) + 1,i)
    plt.subplots_adjust(hspace=0.5)
    df_clean[column].plot(kind = 'hist', figsize = [15,c_num_cols * 4], title = column)
    i += 1
Try using DataFrame.nunique and Series.value_counts:
binary_cols = df.nunique()[lambda x: x == 2].index
for i, col in enumerate(binary_cols):
    plt.subplot(len(binary_cols), (len(binary_cols) % 2) + 1, i+1)
    plt.subplots_adjust(hspace=0.5)
    df[col].value_counts().plot(kind='bar')
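One thing to watch when switching from `len(x.unique())` to `nunique()`: the two treat missing values differently. `unique()` keeps NaN as a value, while `nunique()` drops it by default. The sketch below shows the difference on a made-up frame; `df_demo` and its column names are hypothetical and not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "flag" has two real values plus a missing entry.
df_demo = pd.DataFrame({"flag": ["y", "n", np.nan, "y"],
                        "grade": ["a", "b", "c", "a"]})

# unique() keeps NaN, so the question's approach counts 3 values for "flag".
via_unique = df_demo.apply(lambda x: len(x.unique()))

# nunique() drops NaN by default, so "flag" counts as binary here.
via_nunique = df_demo.nunique()

# Pass dropna=False to make nunique() agree with len(x.unique()).
via_nunique_with_nan = df_demo.nunique(dropna=False)

# Columns treated as binary under the default (NaN ignored).
binary_cols = via_nunique[via_nunique == 2].index.tolist()
```

Which behavior is right depends on whether a missing value should count as a third category in your data. Note that `value_counts()` also drops NaN by default, so the bar chart would show two bars either way unless you pass `dropna=False` there as well.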
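# Setup
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'col1': list('aaaaaaabbbbbbbb'),
                   'col2': list('aaabbbcccdddeee'),
                   'col3': [1] * 9 + [3] * 6})

# Keep only the columns with exactly two unique values (col1 and col3 here)
binary_cols = df.nunique()[lambda x: x == 2].index
for i, col in enumerate(binary_cols):
    plt.subplot(len(binary_cols), (len(binary_cols) % 2) + 1, i+1)
    plt.subplots_adjust(hspace=0.5)
    # One bar per unique value; bar height is the number of rows
    df[col].value_counts().plot(kind='bar')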
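The `plt.subplot(rows, cols, i)` grid above reserves more cells than it fills whenever it has two columns. As an alternative, here is a minimal sketch that builds the grid once with `plt.subplots` and hides any unused cells. The helper name `plot_binary_columns` and its `ncols` parameter are my own for illustration, and the figure size simply mirrors the one used in the question.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_binary_columns(df, ncols=2):
    """Draw one bar chart per column that has exactly two unique values."""
    binary_cols = df.nunique()[lambda s: s == 2].index
    if len(binary_cols) == 0:
        return None, []
    nrows = math.ceil(len(binary_cols) / ncols)
    # squeeze=False always returns a 2-D array of axes, even for a single plot.
    fig, axes = plt.subplots(nrows, ncols, figsize=(15, nrows * 4), squeeze=False)
    flat_axes = axes.ravel()
    for ax, col in zip(flat_axes, binary_cols):
        # value_counts() gives the row count for each of the two values.
        df[col].value_counts().plot(kind="bar", ax=ax, title=col)
        ax.set_ylabel("rows")
    # Hide any grid cells left over when the count is not a multiple of ncols.
    for ax in flat_axes[len(binary_cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, list(flat_axes[:len(binary_cols)])


# Same sample data as the setup above.
df = pd.DataFrame({"col1": list("aaaaaaabbbbbbbb"),
                   "col2": list("aaabbbcccdddeee"),
                   "col3": [1] * 9 + [3] * 6})
fig, used_axes = plot_binary_columns(df)
```

Call `plt.show()` afterwards to display the figure, or `fig.savefig(...)` to write it to a file. With the sample data this should draw two charts, one for `col1` and one for `col3`.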