Pandas DataFrame and to_numeric: select column by index
The question is probably extremely dumb, but I hurt my brain trying to figure out what to do.
There is a pd.DataFrame with N columns. I need to select some columns by referring to their column index, convert all the values to numeric, and write those columns back into my DataFrame.
I've done it by referencing the column name (like df['a'] = pd.to_numeric(df['a'])), but I'm stuck with indices (like df[1] = pd.to_numeric(df[1])).
What is the right way to reference DataFrame columns in this situation? (Python 2.7)
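The distinction the question runs into can be seen on a small frame: df[...] looks a column up by its label, while .iloc looks it up by its zero-based position. A minimal sketch, assuming a hypothetical frame with string labels 'a', 'b', 'c' (not taken from the question):

```python
import pandas as pd

# Hypothetical frame whose column labels are strings, so labels differ from positions
df = pd.DataFrame({'a': ['1', '2'], 'b': ['3', '4'], 'c': [5, 6]})

# df[...] looks up a column *label*; there is no column labelled 1 here
try:
    df[1]
except KeyError:
    print("no column labelled 1")

# .iloc looks up a *position* (zero-based): position 1 is column 'b'
second = pd.to_numeric(df.iloc[:, 1])
print(second.tolist())  # [3, 4]

# To write it back, turn the position into a label via df.columns
df[df.columns[1]] = second
print(df['b'].dtype)  # int64
```

Writing back through df.columns[...] means the assignment is still label-based, so it replaces the whole column and its dtype.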
You can use ix to select the columns and then apply to_numeric:
import pandas as pd
df = pd.DataFrame({1:['1','2','3'],
                   2:[4,5,6],
                   3:[7,8,9],
                   4:['1','3','5'],
                   5:[5,3,6],
                   6:['7','4','3']})
print (df)
   1  2  3  4  5  6
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3
print (df.dtypes)
1    object
2     int64
3     int64
4    object
5     int64
6    object
dtype: object
print (df.columns)
Int64Index([1, 2, 3, 4, 5, 6], dtype='int64')
cols = [1,4,6]
#note: .ix was deprecated in pandas 0.20 and removed in pandas 1.0
df.ix[:, cols] = df.ix[:, cols].apply(pd.to_numeric)
print (df)
   1  2  3  4  5  6
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3
print (df.dtypes)
1    int64
2    int64
3    int64
4    int64
5    int64
6    int64
dtype: object
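Because .ix no longer exists in current pandas (it was removed in 1.0), here is a minimal sketch of the same conversion for newer versions, assuming the integer-labelled frame above. It selects and assigns through df[cols], which looks columns up by label. On pandas 2.x, assigning through .loc[:, cols] may try to write the values in place and leave the object dtype unchanged, which is why plain df[cols] is used here.

```python
import pandas as pd

df = pd.DataFrame({1: ['1', '2', '3'],
                   2: [4, 5, 6],
                   3: [7, 8, 9],
                   4: ['1', '3', '5'],
                   5: [5, 3, 6],
                   6: ['7', '4', '3']})

cols = [1, 4, 6]  # column *labels* (they just happen to be integers)

# df[list] selects by label; assigning a DataFrame back replaces those columns
df[cols] = df[cols].apply(pd.to_numeric)

print(df.dtypes)
```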
If the column labels are strings rather than ints (even though they look like ints), add quotes '' around the numbers in the list cols:
import pandas as pd
df = pd.DataFrame({'1':['1','2','3'],
                   '2':[4,5,6],
                   '3':[7,8,9],
                   '4':['1','3','5'],
                   '5':[5,3,6],
                   '6':['7','4','3']})
#print (df)
#print (df.dtypes)
print (df.columns)
Index(['1', '2', '3', '4', '5', '6'], dtype='object')
#add `''` (the column labels are strings)
cols = ['1','4','6']
#1. ix: mixed integer- and label-based access (removed in pandas 1.0)
df.ix[:, cols] = df.ix[:, cols].apply(pd.to_numeric)
#2. loc: label-based access only
# df.loc[:, cols] = df.loc[:, cols].apply(pd.to_numeric)
#3. iloc: position-based access (zero-based positions of '1', '4', '6')
# pos = [0, 3, 5]
# df.iloc[:, pos] = df.iloc[:, pos].apply(pd.to_numeric)
print (df)
   1  2  3  4  5  6
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3
print (df.dtypes)
1    int64
2    int64
3    int64
4    int64
5    int64
6    int64
dtype: object
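If you really need to refer to columns by *position* (which is what "index" means in the question), one option is to translate the positions into labels through df.columns and then select and assign by label. A minimal sketch, assuming the string-labelled frame above; positions and labels are illustrative names. It assigns through df[...] rather than .iloc[:, ...] = ..., because on pandas 2.x assignment through .loc/.iloc with a full-column slice may write values in place and keep the old object dtype.

```python
import pandas as pd

df = pd.DataFrame({'1': ['1', '2', '3'],
                   '2': [4, 5, 6],
                   '3': [7, 8, 9],
                   '4': ['1', '3', '5'],
                   '5': [5, 3, 6],
                   '6': ['7', '4', '3']})

positions = [0, 3, 5]  # zero-based positions of columns '1', '4', '6'

# Translate positions into labels, then select and assign by label;
# this works whether the labels are strings or ints
labels = list(df.columns[positions])
df[labels] = df[labels].apply(pd.to_numeric)

print(labels)  # ['1', '4', '6']
print(df.dtypes)
```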
You may want to check out the following post: Is .ix() always better than .loc() and .iloc() since it is faster and supports integer and label access?
A must-read: [Different Choices for Indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing)
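The ambiguity that made ix confusing shows up when integer labels do not match positions. A minimal sketch, assuming a hypothetical frame with integer labels 10, 20, 30 (not from the thread), of how loc (labels) and iloc (positions) keep the two apart:

```python
import pandas as pd

# Integer column labels that do NOT match positions (hypothetical example)
df = pd.DataFrame({10: ['1', '2'], 20: ['3', '4'], 30: ['5', '6']})

by_label = df.loc[:, [20]]     # label-based: the column labelled 20
by_position = df.iloc[:, [1]]  # position-based: the second column, also 20

print(list(by_label.columns), list(by_position.columns))  # [20] [20]

# On pandas >= 1.0, asking loc for a label that does not exist raises KeyError
try:
    df.loc[:, [1]]
except KeyError:
    print("loc: no column labelled 1")
```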
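Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.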