How to use string as column name in pandas dataframe
I've got an Excel workbook that I am reading data from and doing things with. In the Excel workbook, some of the column headers are numbers and I don't know how to use them in pandas. I am also not allowed to change the column titles in Excel (for the purposes of this project).
In this case, the column headers are all the same (ex: 2008, 2008, and 2008) and are all numbers. This makes sense in the context of my project but is confusing to pandas and to me. They are distinguished because the row above them in the Excel workbook has more info.
import pandas as pd

filename = 'myfile.xlsx'
data = pd.read_excel(filename, skiprows=8)
print("Column Headings")
print(data.columns)
Results of printing the column headers (shortened list):
Index([2008, '2008.1', '2008.2'], dtype='object')
Now I need to use these column names to get at the data in those columns...
provider_name = 'example_name'
subset_by_provider = data.loc[data['Provider'] == provider_name]
#the error is here. 2008 is the column name
data_2008 = subset_by_provider.2008.tolist()
As I indicated above, the error is in the last line of code. I am reading the data into a list. 2008 (as an integer) and '2008.1' are names of the columns in my Excel sheet. But I get a syntax error.
#Doesn't work
data_2008 = subset_by_provider.2008.tolist()
#Doesn't work
data_2008 = subset_by_provider.'2008.1'.tolist()
#Does work
data_2008 = subset_by_provider.i2008.tolist()
In the last example, I changed the column name in the Excel sheet from 2008 to i2008, just to prove a point. However, in practice, I am not allowed to do this.
How can I read the column name 2008 or '2008.1'?
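As noted in the comments above, attribute access (data.name) only works when the column name is a valid Python identifier, which 2008 and '2008.1' are not. The solution is to use brackets with the label exactly as it appears in data.columns: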
data_2008 = subset_by_provider[2008].tolist()
or
data_2008 = subset_by_provider['2008.1'].tolist()
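To make the difference concrete, here is a minimal sketch that does not need the workbook. The DataFrame below is a made-up stand-in for the spreadsheet (the values and the second provider name are assumptions, not data from the question), built with the same mix of an integer label and a string label that data.columns showed. The last part, converting every label to a string, is an optional extra and not something the question requires.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the values are invented.
data = pd.DataFrame({
    "Provider": ["example_name", "other_name", "example_name"],
    2008: [10, 20, 30],            # integer column label
    "2008.1": [1.5, 2.5, 3.5],     # string column label
})

subset_by_provider = data.loc[data["Provider"] == "example_name"]

# Bracket indexing takes the label exactly as stored: an int here...
data_2008 = subset_by_provider[2008].tolist()
# ...and a str here.
data_2008_1 = subset_by_provider["2008.1"].tolist()

# Optional: turn every label into a string so all lookups look the same.
as_str = subset_by_provider.copy()
as_str.columns = as_str.columns.map(str)
data_2008_str = as_str["2008"].tolist()
```

Note that the type of the label matters: with the original columns, subset_by_provider["2008"] would raise a KeyError because the stored label is the integer 2008, not the string "2008".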