I want to select rows from a dataframe based on different values of a certain column variable and make histograms.
import numpy as np
import pandas as pd
import csv
import matplotlib.pyplot as plt
df_train=pd.read_csv(r'C:\users\visha\downloads\1994_census\adult.data')
df_train.columns = ["age", "workclass", "fnlwgt", "education",
"educationnum", "maritalstatus", "occupation",
"relationship", "race", "sex", "capitalgain",
"capitalloss", "hoursperweek", "nativecountry",
"incomelevel"]
df_train.dropna(how='any')
df_train.loc[(df_train!=0).any(axis=1)]
#df_train.incomelevel = pd.to_numeric(df_train.incomelevel, errors = 'coerce').fillna(0).astype('Int64')
df_train.drop(columns='fnlwgt', inplace = True)
#df_test=pd.read_csv(r'C:\users\visha\downloads\1994_census\adult.test')
#df_train.boxplot(column = 'age', by = 'incomelevel', grid = False)
df_train.loc[df_train['incomelevel'] == '<=50K']
#df_train.loc[df_train['incomelevel'] == '>50K']
Output:
Empty DataFrame
Columns: [age, workclass, fnlwgt, education, educationnum, maritalstatus, occupation, relationship, race, sex, capitalgain, capitalloss, hoursperweek, nativecountry, incomelevel]
Index: []
As the code above shows, I'm trying to select the rows whose income level is '<=50K'. The 'incomelevel' column has the object dtype. But when I print the result, I get only the column names and a note that the DataFrame is empty. When I run the line as is in a Jupyter notebook, without print, it displays the column headers with no rows under them.
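You should read the CSV with skipinitialspace=True, because every value in the file has a space in front of it, so the column actually holds ' <=50K' rather than '<=50K' and the comparison never matches. The file also has no header row, so pass header=None; otherwise the first record is consumed as column names. With both options it works: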
df = pd.read_csv('adult.data', header=None, skipinitialspace=True)
df.columns = ["age", "workclass", "fnlwgt", "education",
"educationnum", "maritalstatus", "occupation",
"relationship", "race", "sex", "capitalgain",
"capitalloss", "hoursperweek", "nativecountry",
"incomelevel"]
df = df[df['incomelevel']=='<=50K']
print(df.head())
age workclass fnlwgt education educationnum maritalstatus ... sex capitalgain capitalloss hoursperweek nativecountry incomelevel
0 39 State-gov 77516 Bachelors 13 Never-married ... Male 2174 0 40 United-States <=50K
1 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse ... Male 0 0 13 United-States <=50K
2 38 Private 215646 HS-grad 9 Divorced ... Male 0 0 40 United-States <=50K
3 53 Private 234721 11th 7 Married-civ-spouse ... Male 0 0 40 United-States <=50K
4 28 Private 338409 Bachelors 13 Married-civ-spouse ... Female 0 0 40 Cuba <=50K
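To see why the original comparison matched nothing, here is a minimal sketch on a made-up three-row sample in the same ", "-separated layout. The sample data, the reduced column list and the variable names are assumptions for illustration only; they are not taken from adult.data itself. It also shows str.strip() as an alternative for a frame that has already been loaded.

```python
import io
import pandas as pd

# Hypothetical sample: three rows, three columns, ", " between fields.
sample = "39, State-gov, <=50K\n52, Self-emp-inc, >50K\n28, Private, <=50K\n"
cols = ["age", "workclass", "incomelevel"]

# Default parsing keeps the space after each comma as part of the value.
raw = pd.read_csv(io.StringIO(sample), header=None, names=cols)
print(repr(raw["incomelevel"].iloc[0]))        # shows the leading space
print(len(raw[raw["incomelevel"] == "<=50K"])) # no rows match

# Option 1: drop the leading space while parsing.
clean = pd.read_csv(io.StringIO(sample), header=None, names=cols,
                    skipinitialspace=True)
low = clean[clean["incomelevel"] == "<=50K"]

# Option 2: strip the whitespace on an already-loaded column.
stripped = raw.copy()
stripped["incomelevel"] = stripped["incomelevel"].str.strip()
low_stripped = stripped[stripped["incomelevel"] == "<=50K"]

print(low)
print(low_stripped)
```

Printing repr() of a single value, or calling unique() on the column, is a quick way to spot this kind of hidden whitespace. Once the selection returns rows, a histogram for each group can probably be drawn from the filtered frame, for example with low["age"].plot.hist().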