简体   繁体   中英

Pandas scatter_matrix - plot categorical variables

I am looking at the famous Titanic dataset from the Kaggle competition found here: http://www.kaggle.com/c/titanic-gettingStarted/data

I have loaded and processed the data using:

# import required libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# load the data from the file
df = pd.read_csv('./data/train.csv')

# import the scatter_matrix functionality
from pandas.tools.plotting import scatter_matrix

# define colors list, to be used to plot survived either red (=0) or green (=1)
colors=['red','green']

# make a scatter plot
scatter_matrix(df,figsize=[20,20],marker='x',c=df.Survived.apply(lambda x:colors[x]))

df.info()

来自 matplotlib 的 scatter_matrix

How can I add the categorical columns like Sex and Embarked to the plot?

You need to transform the categorical variables into numbers to plot them.

Example (assuming that the column 'Sex' is holding the gender data, with 'M' for males & 'F' for females)

df['Sex_int'] = np.nan
df.loc[df['Sex'] == 'M', 'Sex_int'] = 0
df.loc[df['Sex'] == 'F', 'Sex_int'] = 1

Now all females are represented by 0 & males by 1. Unknown genders (if there are any) will be ignored.

The rest of your code should process the updated dataframe nicely.

after googling and remembering something like the .map() function I fixed it in the following way:

colors=['red','green'] # color codes for survived : 0=red or 1=green

# create mapping Series for gender so it can be plotted
gender = Series([0,1],index=['male','female'])    
df['gender']=df.Sex.map(gender)

# create mapping Series for Embarked so it can be plotted
embarked = Series([0,1,2,3],index=df.Embarked.unique())
df['embarked']=df.Embarked.map(embarked)

# add survived also back to the df
df['survived']=target

now I can plot it again...and drop the added columns afterwards.

thanks everyone for responding.....

Here is my solution:

# convert string column to category
df.Sex = df.Sex.astype('category')
# create additional column for its codes
df['Sex_code'] = df_clean.Sex.cat.codes

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM