I've loaded a DataFrame and tried to create a pd.Series from it:
import pandas as pd
data = pd.read_csv(filepath_or_buffer = "train.csv", index_col = 0)
data.columns
Index([u'qid1',u'qid2',u'question1',u'question2'], dtype = 'object')
These are the columns in the DataFrame: qid1 is the ID of question1 and qid2 is the ID of question2.
Also, there are no NaN values in my DataFrame:
data.question1.isnull().sum()
0
I want to create a pandas.Series from the first questions, with qid1 as the index:
question1 = pd.Series(data.question1, index = data.qid1)
question1.isnull().sum()
68416
And now, there are 68416 Null values in my Series. Where is my mistake?
Pass the raw values (without their index) so the Series constructor doesn't try to align:
question1 = pd.Series(data.question1.values, index = data.qid1)
The problem here is that the question1 column has its own index, so the constructor is going to try to use it during construction.
Example:
In [12]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.arange(5), 'b':list('abcde')})
df
Out[12]:
   a  b
0  0  a
1  1  b
2  2  c
3  3  d
4  4  e
In [13]:
s = pd.Series(df['a'], index = df['b'])
s
Out[13]:
b
a NaN
b NaN
c NaN
d NaN
e NaN
Name: a, dtype: float64
In [14]:
s = pd.Series(df['a'].values, index = df['b'])
s
Out[14]:
b
a 0
b 1
c 2
d 3
e 4
dtype: int32
Effectively, what happens here is that you're reindexing your existing column with the new index you passed in. Wherever a label in the new index has no match in the column's own index, you get NaN.
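To make the alignment explicit, here is a minimal sketch that reuses the toy df from the example above. The variable names (aligned, reindexed, by_values, by_set_index) are only illustrative, and the set_index variant is an alternative I'm suggesting, not something from the original question. It assumes a pandas version recent enough to have Series.to_numpy(), which is used here in place of .values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(5), "b": list("abcde")})

# Passing a Series together with an index aligns by label:
# the labels 'a'..'e' are looked up in the existing index 0..4.
aligned = pd.Series(df["a"], index=df["b"])

# The same lookup written as an explicit reindex.
reindexed = df["a"].reindex(df["b"])

# Fix 1: pass a plain array, so there are no labels to align on.
by_values = pd.Series(df["a"].to_numpy(), index=df["b"])

# Fix 2 (alternative): make 'b' the index first, then select the column.
by_set_index = df.set_index("b")["a"]

print(aligned.isnull().sum(), reindexed.isnull().sum())
print(by_values.isnull().sum(), by_set_index.isnull().sum())
```

In the toy frame none of the labels match, so the aligned versions are entirely NaN. With real data, any qid1 value that happens to equal an existing index label does find a match, which would explain why only some of the rows in the question came out null.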