Pandas column header
I have multiple Excel files that contain similar information, and I want to load them into Python as lists with pandas. Every file contains a table with the data I want, but the table starts at a different row in each file: row 5 in some, row 10 or 23 in others, and so on, because the titles above the table change from one file to the next. The start row therefore cannot be a constant, BUT the headers of the table are always the same. Can I tell pandas "take all data under the row with "specific name"", or do I have to give it the right index every time?
Thanks! Have a nice day!
Edit: To make it clearer, this is how pandas reads my DataFrame (based on the Excel file).
As you can see, in row 2 (in this example) there is the row with "Draw, Back#, Horse, Rider, ...", and this is my "key row". I want my df to start under this row so that I can use all the data below it to make my folders. As I said, this row is the same in all the Excel files, but it sits at a different row in each one.
Since the multiple Excel files have the same column headers, you could create a filter to "take all data under the row with "specific name"". This can be done using df['col_name'] == 'specific name', which returns a boolean Series of True / False values.
mask = df['col_name'] == 'specific name'
After that, apply the filter (the Series of True / False values) to the DataFrame, which keeps only the rows where the value is True.
df.loc[mask]
For example:
import pandas as pd

df = pd.DataFrame({'col_name': ['nothing', 'specific name', 'specific name', 'blah', 'blah', 'blah'],
                   'col_2': ['blah blah', 'blah', 'blah blah', 'blah', 'blah', 'blah']})
print(df)
        col_name      col_2
0        nothing  blah blah
1  specific name       blah
2  specific name  blah blah
3           blah       blah
4           blah       blah
5           blah       blah
mask = df['col_name'] == 'specific name'
print(df.loc[mask])
        col_name      col_2
1  specific name       blah
2  specific name  blah blah
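The filter keeps the rows that are equal to the given value. To get the rows underneath such a row instead, the same kind of boolean Series can be used to locate it first. A minimal sketch, assuming the sheet was loaded without a header (so the columns are numbered 0, 1, ...) and that the first column holds the known header cell; the name raw and the sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical raw sheet: two title rows, then the key row, then the data.
raw = pd.DataFrame({
    0: ['Show title', 'Class 1', 'Draw', 1, 2],
    1: [None, None, 'Horse', 'Comet', 'Blaze'],
})

mask = raw[0] == 'Draw'                   # True only on the key row
if not mask.any():
    raise ValueError("key row not found")

key_pos = int(mask.to_numpy().argmax())   # position of the first True
below = raw.iloc[key_pos + 1:]            # every row under the key row
print(below)
```

Because the position is looked up from the data, it does not matter whether the key row is row 5, 10 or 23 in a given file.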
With the index, you can use the code below. It applies if your Excel data has the header in row 4 and the rest of the data from row 5 onwards:
col_name = excel_df.iloc[3:4, :].values.tolist()  # header row as a nested list: [[...]]
df = excel_df.iloc[4:, :]                         # everything below the header row
df.columns = sum(col_name, [])                    # flatten the nested list into column names
df
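Since the header position differs from file to file, the hard-coded 3 and 4 could be replaced by a lookup. The sketch below is one possible way to do that, not the only one. It assumes each file is read with pd.read_excel(path, header=None) so that the header row arrives as ordinary data; the helper name promote_header_row and the sample values are hypothetical.

```python
import pandas as pd

def promote_header_row(raw, expected):
    """Find the first row containing every name in `expected`, use it as
    the column names, and return only the rows below it."""
    expected = set(expected)
    for pos in range(len(raw)):
        row_values = raw.iloc[pos].tolist()
        if expected <= set(row_values):
            df = raw.iloc[pos + 1:].copy()   # data under the header row
            df.columns = row_values          # promote the header row
            return df.reset_index(drop=True)
    raise ValueError("no row contains all expected headers")

# Stand-in for pd.read_excel(path, header=None); the values are made up.
raw = pd.DataFrame([
    ['Some title', None, None],
    [None, None, None],
    ['Draw', 'Back#', 'Horse'],
    [1, 101, 'Comet'],
    [2, 102, 'Blaze'],
])

df = promote_header_row(raw, ['Draw', 'Back#', 'Horse'])
print(df)
```

With real files, the same helper could be called in a loop over the paths, passing the result of pd.read_excel(path, header=None) as raw.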