In Pandas, how do I use a variable name to represent a row index to obtain a string that can be used as a header row?

I'm trying to clean an Excel file that has some random formatting. The file has blank rows at the top, with the actual column headings at row 8. I've gotten rid of the blank rows, and now want to use the strings in row 8 as the true column headings in the dataframe.

I use this code to find the position of the column headings by searching the whole dataframe for the string 'Destination', and then take the location of the True value in the Boolean mask to get the list for renaming the column headers:

boolmsk=df.apply(lambda row: row.astype(str).str.contains('Destination').any(), axis=1)
print(boolmsk)
hdrindex=boolmsk.index[boolmsk == True].tolist()
print(hdrindex)
hdrstr=df.loc[7]
print(hdrstr)
df2=df.rename(columns=hdrstr)

However, when I try to use hdrindex as a variable, I get errors when the second dataframe is created (i.e. when I try to use hdrstr to replace the column headings):

boolmsk=df.apply(lambda row: row.astype(str).str.contains('Destination').any(), axis=1)
print(boolmsk)
hdrindex=boolmsk.index[boolmsk == True].tolist()
print(hdrindex)
hdrstr=df.loc[hdrindex]
print(hdrstr)
df2=df.rename(columns=hdrstr)

How do I use a variable to specify an index, so that the resulting list can be used as column headings?

I assume the indicator of the actual header row in your dataframe is the string "destination". Let's find where it is:

start_tag = df.eq("destination").any(axis=1)
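Note that eq is an exact, case-sensitive comparison, while the question searches for 'Destination' with a capital D. A minimal sketch of a more tolerant match, assuming stray whitespace and casing are the only differences to absorb (the sample data and the name normalised are made up for illustration):

```python
import pandas as pd

# Made-up sheet: the header cell has stray spaces and different casing.
df = pd.DataFrame(
    [["exported 2024", None], [" DESTINATION ", "Cost"], ["Paris", 120]]
)

# Normalise every cell to a stripped, lower-case string before comparing.
normalised = df.apply(lambda col: col.astype(str).str.strip().str.lower())
start_tag = normalised.eq("destination").any(axis=1)

print(start_tag.tolist())  # [False, True, False]
```

If the marker may appear inside a longer cell, the str.contains approach from the question is the closer fit.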

We'll keep the index of the first occurrence of the word "destination" for later use:

start_row = df.loc[start_tag].index.min()
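Taking a single value here is what makes the later steps work. In the question, hdrindex is a list, and indexing with a list gives back a one-row DataFrame rather than a Series, which is probably why the rename call fails. A small sketch of the difference on made-up data, assuming the index labels were kept after the blank rows were dropped:

```python
import pandas as pd

# Toy frame (made up): junk row, header row, then data.
# The non-default index mimics labels left over after dropping blank rows.
df = pd.DataFrame(
    [["report", None], ["Destination", "Cost"], ["Paris", 120]],
    index=[5, 7, 8],
)

mask = df.eq("Destination").any(axis=1)
labels = mask.index[mask].tolist()  # a list of labels, here [7]

as_frame = df.loc[labels]       # list of labels -> one-row DataFrame
as_series = df.loc[labels[0]]   # scalar label   -> Series

print(type(as_frame).__name__)   # DataFrame
print(type(as_series).__name__)  # Series

# A Series maps old column labels to new names, so rename accepts it.
renamed = df.rename(columns=as_series)
print(list(renamed.columns))     # ['Destination', 'Cost']
```

So in the question's code, hdrstr=df.loc[hdrindex[0]] should behave the same as the hard-coded df.loc[7].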

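Using that index label, we can get the list of values in the "header" row. Because start_row is a label rather than a position, .loc is used here; the two only coincide when the dataframe still has its default index:

new_col_names = df.loc[start_row].values.tolist()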

Now we can assign the new column names to the dataframe:

df.columns = new_col_names

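From here you can work with a new dataframe that has the actual column names and a clean index. The slice starts at the header row by label and then skips it by position, so it also works when the index is no longer 0, 1, 2, ...:

df2 = df.loc[start_row:].iloc[1:].reset_index(drop=True)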
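Putting the steps together, here is a minimal end-to-end sketch on made-up data. The helper name promote_header_row and its marker parameter are hypothetical, and it works by position (first matching row) instead of by index label, which is a slight variation on the steps above:

```python
import pandas as pd

def promote_header_row(df, marker="Destination"):
    """Use the first row containing `marker` as column names; keep the rows below it."""
    mask = df.eq(marker).any(axis=1)
    if not mask.any():
        raise ValueError(f"no row contains {marker!r}")
    pos = int(mask.to_numpy().argmax())           # position of the first True
    out = df.iloc[pos + 1:].reset_index(drop=True)
    out.columns = df.iloc[pos].tolist()           # header row values as names
    return out

# Made-up input: index labels left over from dropping blank rows.
raw = pd.DataFrame(
    [["report", None], ["Destination", "Cost"], ["Paris", 120], ["Rome", 95]],
    index=[5, 7, 8, 9],
)

clean = promote_header_row(raw)
print(clean)
```

Working by position avoids any dependence on what the index labels happen to be after cleaning.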
