Pandas column header

Question

I have multiple Excel files that contain similar information, which I want to bring into Python as lists with pandas. Every file contains a table with the info I want, but the table starts at row 5 in some files and at row 10 or 23 in others (the titles above the table change from one file to another), so the start row cannot be a constant. The table headers, however, are always the same. Can I tell pandas "take all data under the row with 'specific name'", or do I have to give the right index every time?

Thanks! Have a good day!

Edit: To make it clearer, this is how pandas reads my dataframe (based on the Excel file):

So as you can see, in row 2 (in this example) there is the row with "Draw, Back#, Horse, Rider, ..." and this is my "key row". I want my df to start under this row so that I can use all the data below it to make my folders. As I said, this row is the same in all the Excel files, but it sits at a different row number in each one.

Answer 1

Since all the Excel files have the same column headers, you could create a filter to "take all data under the row with 'specific name'".

This can be done using df['col_name'] == 'specific name', which returns an array of True/False values.

filter = df['col_name']=='specific name'

After that, apply the filter (the array of True/False values) to the dataframe, which keeps only the rows where the value is True:

df.loc[filter]

For example:

import pandas as pd

df = pd.DataFrame({'col_name': ['nothing', 'specific name', 'specific name', 'blah', 'blah', 'blah'],
                   'col_2': ['blah blah', 'blah', 'blah blah', 'blah', 'blah', 'blah']})

        col_name      col_2
0        nothing  blah blah
1  specific name       blah
2  specific name  blah blah
3           blah       blah
4           blah       blah
5           blah       blah

filter = df['col_name']=='specific name'
print(df.loc[filter])

        col_name      col_2
1  specific name       blah
2  specific name  blah blah
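
The same kind of True/False filter can also be used to locate the header row itself and keep everything below it, which is closer to what the question asks. This is only a minimal sketch: the toy data is made up, and it assumes the sheet was read without a header (for example with header=None), so that the "Draw, Back#, Horse" row arrives as an ordinary data row, and that "Draw" sits in the first column.

```python
import pandas as pd

# Toy frame standing in for a sheet read with header=None; all values are made up.
raw = pd.DataFrame([
    ["Show title", None, None],
    ["Class 12", None, None],
    ["Draw", "Back#", "Horse"],   # the key row
    [1, 101, "Comet"],
    [2, 102, "Daisy"],
])

# Boolean filter: True only on the row whose first cell is the key header "Draw".
mask = raw[0] == "Draw"
if not mask.any():
    raise ValueError("key row not found")

# Position of the first True value, i.e. the header row.
header_pos = int(mask.to_numpy().argmax())

# Keep the rows below the key row and use the key row as column names.
table = raw.iloc[header_pos + 1:].reset_index(drop=True)
table.columns = raw.iloc[header_pos].tolist()
print(table)
```

The explicit mask.any() check matters because argmax returns 0 when no row matches, which would silently treat the first row as the header.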

Answer 2

With the index, you can use the code below. It assumes the Excel data has its header in row 4 and the rest of the data from row 5 onward:

col_name = excel_df.iloc[3:4, :].values.tolist()

df = excel_df.iloc[4:, :]

df.columns = sum(col_name, [])

df
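
Because the header row moves from file to file, the fixed positions 3:4 and 4: above would need to change for every file. One way around that, sketched here under assumptions, is to search for the row first and then apply the same slicing. The helper name promote_header_row and the sample frames are hypothetical, and excel_df is assumed to have been read without a header row so the key row is still part of the data.

```python
import pandas as pd

def promote_header_row(excel_df, key):
    """Return the rows below the first row containing `key`,
    with that row used as the column names."""
    # True for each row in which any cell equals the key header text.
    hits = excel_df.eq(key).any(axis=1)
    if not hits.any():
        raise ValueError(f"no row contains {key!r}")
    pos = int(hits.to_numpy().argmax())  # position of the first matching row
    df = excel_df.iloc[pos + 1:].reset_index(drop=True)
    df.columns = excel_df.iloc[pos].tolist()
    return df

# Two made-up frames whose table starts at different rows.
a = pd.DataFrame([["Title", None], ["Draw", "Horse"], [1, "Comet"]])
b = pd.DataFrame([["Title", None], ["Notes", None], [None, None],
                  ["Draw", "Horse"], [7, "Daisy"]])

for frame in (a, b):
    print(promote_header_row(frame, "Draw"))
```

The same function could be called once per file in a loop, so no row index has to be supplied by hand.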


Answer 1
0 accepted 2022-08-20 09:46:30

Answer 2
0 2022-08-20 12:22:42
