
A concatenated dataframe in python when exported to csv shows blank rows

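In the following implementation of logistic regression for the Titanic dataset, rows with missing values are dropped. But when the data is concatenated with the predictions, those dropped rows still show up as blank rows. Why is this happening?

For the original data, see https://www.kaggle.com/c/titanic/data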

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
os.chdir(dir_path)
train = pd.read_csv('titanic_train.csv')
sns.set_style('whitegrid')

#Data cleaning
train.drop('Cabin',axis=1,inplace=True)
train.dropna(inplace=True)
#Categorical data to dummy vars
sex = pd.get_dummies(train['Sex'],drop_first=True)
embark = pd.get_dummies(train['Embarked'],drop_first=True)
train.drop(['Sex','Embarked','Name','Ticket','PassengerId'],axis=1,inplace=True)
train = pd.concat([train,sex,embark],axis=1)
print(train.head())

#Develop model
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(train.drop('Survived',axis=1),train['Survived'],test_size=0.3,random_state=101)
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
predictions=logmodel.predict(X_test)
predser=pd.Series(predictions)
trainnew=pd.concat([X_test,y_test,predser],axis=1)
trainnew.to_csv('trainresults.csv',index=False)

Sample of the dataset after cleaning:

   Survived  Pclass   Age  SibSp  Parch     Fare  male  Q  S
0         0       3  22.0      1      0   7.2500     1  0  1
1         1       1  38.0      1      0  71.2833     0  0  0
2         1       3  26.0      0      0   7.9250     0  0  1
3         1       1  35.0      1      0  53.1000     0  0  1
4         0       3  35.0      0      0   8.0500     1  0  1
5         0       3  24.0      0      0   8.4583     1  1  0
6         0       1  54.0      0      0  51.8625     1  0  1
7         0       3   2.0      3      1  21.0750     1  0  1
8         1       3  27.0      0      2  11.1333     0  0  1
9         1       2  14.0      1      0  30.0708     0  0  0

The output csv looks like this:

   Pclass    Age  SibSp  Parch     Fare  male    Q    S  Survived  SPred
0     NaN    NaN    NaN    NaN      NaN   NaN  NaN  NaN       NaN    0.0
1     NaN    NaN    NaN    NaN      NaN   NaN  NaN  NaN       NaN    0.0
2     NaN    NaN    NaN    NaN      NaN   NaN  NaN  NaN       NaN    1.0
3     NaN    NaN    NaN    NaN      NaN   NaN  NaN  NaN       NaN    1.0
4     NaN    NaN    NaN    NaN      NaN   NaN  NaN  NaN       NaN    0.0
5     NaN    NaN    NaN    NaN      NaN   NaN  NaN  NaN       NaN    0.0
6     1.0  54.00    0.0    0.0  51.8625   1.0  0.0  1.0       0.0    0.0
7     3.0   2.00    3.0    1.0  21.0750   1.0  0.0  1.0       0.0    0.0
8     3.0  27.00    0.0    2.0  11.1333   0.0  0.0  1.0       1.0    0.0
9     2.0  14.00    1.0    0.0  30.0708   0.0  0.0  0.0       1.0    1.0

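The problem is that the index of predser does not match the index of X_test and y_test. pd.Series(predictions) gets a fresh 0..n-1 index, while X_test and y_test keep the row labels they had in train, and pd.concat with axis=1 aligns rows by index label, filling NaN wherever a label is missing from one of the inputs. One way to fix it is to set the index before the concat, like: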

predser.index = y_test.index  # <== new line

trainnew=pd.concat([X_test,y_test,predser],axis=1)
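
To see the effect in isolation, here is a minimal sketch with toy data. The values and index labels are made up for illustration and are not taken from the Titanic files; only the names X_test, y_test, predser and the SPred column header come from the question. It also shows an equivalent variant of the fix, which passes the index when the Series is built instead of assigning it afterwards.

```python
import pandas as pd

# Toy stand-ins for X_test / y_test (hypothetical values). After
# train_test_split the index still holds the original row labels.
X_test = pd.DataFrame({"Pclass": [1, 3, 2]}, index=[6, 8, 9])
y_test = pd.Series([0, 1, 1], index=[6, 8, 9], name="Survived")

# A Series built from a plain array or list gets a fresh 0..n-1 index.
predictions = [0, 0, 1]
predser = pd.Series(predictions, name="SPred")

# concat(axis=1) aligns on index labels (outer join by default), so the
# labels 0, 1, 2 and 6, 8, 9 all appear, each row padded with NaN.
misaligned = pd.concat([X_test, y_test, predser], axis=1)

# Reusing the test index when building the Series keeps the rows lined up.
predser_fixed = pd.Series(predictions, index=y_test.index, name="SPred")
aligned = pd.concat([X_test, y_test, predser_fixed], axis=1)

print(misaligned)
print(aligned)
```

The first frame has one row per label in the union of both indexes, which is where the blank rows in the csv come from; the second has exactly one row per test sample.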
