Change a specific column into row names in Pandas
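I see this has already been asked on this site. I borrowed ideas from that post, but they do not work in my case. I am reading some data from an Excel sheet and trying to convert it into a pandas DataFrame with both column and row indices. The first row of the sheet is the year header, which I tried to turn into the column header with
df.columns = df.iloc[0]
When I then run
df.columns
it returns: Index([None, 2014.0, 2015.0, 2016.0, 2017.0, 2018.0], dtype='object', name=0)
My problem now is turning the column that holds the month names into the row names. I tried
df.set_index('None', inplace=True)
but this raises KeyError: 'None'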
Edit: adding sample data here.
Update: I worked around this with
df.columns = ['Month', 2014, 2015, 2016, 2017, 2018]
and
df = df.drop(df.index[0])
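The workaround from the update can be sketched end to end as follows. This is only a minimal sketch on a small in-memory frame standing in for the Excel sheet: the name `raw`, the reduced set of years, and the final `astype(float)` step are assumptions for illustration, not part of the original post. Note that `drop` and `set_index` both return new frames, so their results have to be assigned back.

```python
import pandas as pd

# Stand-in for the sheet as read without a header: row 0 holds the years.
raw = pd.DataFrame(
    [
        [None, 2014, 2015],
        ["Jan", 42.9, 47.2],
        ["Feb", 36.6, 45.0],
    ]
)

df = raw.copy()
# Name the columns explicitly instead of reusing the None label from row 0.
df.columns = ["Month", 2014, 2015]
# drop() is not in-place: keep the returned frame, which no longer has the header row.
df = df.drop(df.index[0])
# Use the month names as the row index.
df = df.set_index("Month")
# Cast the year columns to float in case they were read with a mixed (object) dtype.
df = df.astype(float)

print(df)
```

After these steps the month names are the row labels and the years are the column labels.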
For me it works nicely to add two parameters: index_col=[0] to convert the first column into the index, and usecols with a range to select all columns that are not Unnamed:
df = pd.read_excel('sample.xlsx', usecols=range(1, 100))
print (df)
Unnamed: 0 2014 2015 2016 2017 2018
0 Jan 42.9 47.2 43.000000 43.00 48.98
1 Feb 36.6 45.0 40.300000 43.00 45.92
2 Mar 37.8 42.8 44.805668 43.00 43.00
3 Apr 40.9 44.4 43.900000 41.30 44.46
4 May 40.5 47.1 44.200000 41.97 42.31
5 Jun 41.8 46.9 44.600000 45.70 NaN
6 Jul 40.5 45.0 43.500000 45.49 NaN
7 Aug 44.3 45.0 43.800000 44.59 NaN
8 Sep 43.8 47.3 47.600000 47.25 NaN
9 Oct 44.2 47.0 47.600000 50.08 NaN
10 Nov 44.2 43.7 50.078663 50.93 NaN
11 Dec 48.8 45.5 46.500000 48.37 NaN
df = pd.read_excel('sample.xlsx', index_col=[0], usecols = range(1, 100))
print (df)
2014 2015 2016 2017 2018
Jan 42.9 47.2 43.000000 43.00 48.98
Feb 36.6 45.0 40.300000 43.00 45.92
Mar 37.8 42.8 44.805668 43.00 43.00
Apr 40.9 44.4 43.900000 41.30 44.46
May 40.5 47.1 44.200000 41.97 42.31
Jun 41.8 46.9 44.600000 45.70 NaN
Jul 40.5 45.0 43.500000 45.49 NaN
Aug 44.3 45.0 43.800000 44.59 NaN
Sep 43.8 47.3 47.600000 47.25 NaN
Oct 44.2 47.0 47.600000 50.08 NaN
Nov 44.2 43.7 50.078663 50.93 NaN
Dec 48.8 45.5 46.500000 48.37 NaN
Or select the second column as the index and drop the Unnamed: 0 column:
df = pd.read_excel('sample.xlsx', index_col=[1])
print (df)
Unnamed: 0 2014 2015 2016 2017 2018
Jan NaN 42.9 47.2 43.000000 43.00 48.98
Feb NaN 36.6 45.0 40.300000 43.00 45.92
Mar NaN 37.8 42.8 44.805668 43.00 43.00
Apr NaN 40.9 44.4 43.900000 41.30 44.46
May NaN 40.5 47.1 44.200000 41.97 42.31
Jun NaN 41.8 46.9 44.600000 45.70 NaN
Jul NaN 40.5 45.0 43.500000 45.49 NaN
Aug NaN 44.3 45.0 43.800000 44.59 NaN
Sep NaN 43.8 47.3 47.600000 47.25 NaN
Oct NaN 44.2 47.0 47.600000 50.08 NaN
Nov NaN 44.2 43.7 50.078663 50.93 NaN
Dec NaN 48.8 45.5 46.500000 48.37 NaN
df = pd.read_excel('sample.xlsx', index_col=[1]).drop('Unnamed: 0', axis=1)
print (df)
2014 2015 2016 2017 2018
Jan 42.9 47.2 43.000000 43.00 48.98
Feb 36.6 45.0 40.300000 43.00 45.92
Mar 37.8 42.8 44.805668 43.00 43.00
Apr 40.9 44.4 43.900000 41.30 44.46
May 40.5 47.1 44.200000 41.97 42.31
Jun 41.8 46.9 44.600000 45.70 NaN
Jul 40.5 45.0 43.500000 45.49 NaN
Aug 44.3 45.0 43.800000 44.59 NaN
Sep 43.8 47.3 47.600000 47.25 NaN
Oct 44.2 47.0 47.600000 50.08 NaN
Nov 44.2 43.7 50.078663 50.93 NaN
Dec 48.8 45.5 46.500000 48.37 NaN
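Both options above can be tried without an Excel file. The sketch below uses `pd.read_csv` on an in-memory string as a stand-in for `pd.read_excel`, since both readers accept `index_col` and `usecols`; the CSV text and the names `csv_text`, `df1` and `df2` are hypothetical. It lists the exact column positions instead of `range(1, 100)`, because some pandas versions may reject `usecols` positions that lie beyond the last column.

```python
import io

import pandas as pd

# Hypothetical CSV mirroring the sheet layout: an empty first column,
# then an unlabeled month column, then one column per year.
csv_text = """,,2014,2015
,Jan,42.9,47.2
,Feb,36.6,45.0
"""

# Option 1: skip the empty column with usecols; index_col=0 then refers to
# the first of the selected columns, which holds the month names.
df1 = pd.read_csv(io.StringIO(csv_text), usecols=[1, 2, 3], index_col=0)

# Option 2: read every column, use the second one as the index,
# then drop the empty "Unnamed: 0" column.
df2 = pd.read_csv(io.StringIO(csv_text), index_col=1).drop("Unnamed: 0", axis=1)

print(df1)
print(df2)
```

In this stand-in the year labels come back as strings because they are parsed from a CSV header, whereas the output above shows what `read_excel` produced for the original sheet.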
You can rename the columns like this, so that the first label becomes the string 'None':
df.columns = ['None',2014.0,2015.0,2016.0,2017.0,2018.0]
Now your command should work.
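Try this (df.None is a syntax error in Python 3 because None is a keyword, so select the column by position instead):
df = df.set_index(df.iloc[:, 0])
Passing the column itself sets the index but leaves the original column in the frame, so drop it afterwards if it is no longer needed.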
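The label of the first column is the None object, not the string 'None', so set_index('None') cannot find it. To use that column as the index, rename it first:
df.columns = ['First'] + list(df.columns[1:])
Then set it as the index:
df = df.set_index('First')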
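The difference between the None label and the string 'None' can be checked on a small frame. This is a minimal sketch with made-up scaffolding: the two-row sample, the `lookup_failed` flag and the column name 'Month' are assumptions for illustration. It reproduces the failing lookup from the question and then renames the first column by position before setting it as the index.

```python
import pandas as pd

df = pd.DataFrame([["Jan", 42.9, 47.2], ["Feb", 36.6, 45.0]])
# Reproduce the header from the question: the first label is the None object.
df.columns = pd.Index([None, 2014.0, 2015.0], dtype=object)

# The string "None" is not one of the labels, so this lookup fails.
try:
    df.set_index("None")
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Rename the first column by position, leaving the year labels untouched.
df.columns = ["Month"] + list(df.columns[1:])
# set_index returns a new frame unless inplace=True is passed.
df = df.set_index("Month")

print(lookup_failed)
print(df)
```

Renaming by position avoids having to refer to the None label at all, which is why it is used here rather than a label-based rename.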