
How to read two tables from a single Excel sheet using Python?

Please refer to this image: Two tables in single excel sheet

I need dynamic Python code that can read two tables from a single Excel sheet without specifying the header position. The number of columns and rows can change over time. Please help!

It's a little hard for me to write the actual code for something like this without the Excel file itself, but I can definitely outline the strategy for dealing with it. As you know, pandas treats the sheet as a single DataFrame, which means you should too. The trick is not to be fooled into thinking this is truly structured data that works with the same logic as a structured table. What you're doing is less like cleaning structured data and more like telling a computer how to measure and cut a piece of paper. Instead of approaching it as two tables, think of it as one large DataFrame whose rows fall into three categories:

  1. Rows with nothing
  2. Rows that you want to end up in the first table
  3. Rows that you want to end up in the second table
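To make the three categories concrete, here is a minimal sketch. The DataFrame below is a hypothetical stand-in for the sheet; with a real file, something like `pd.read_excel("your_file.xlsx", header=None)` (file name assumed) should give a similar frame, because `header=None` stops pandas from treating any row as column names. Category 1, rows with nothing, can be found by checking whether every cell in a row is missing:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("your_file.xlsx", header=None):
# integer column labels 0, 1, 2 and no row consumed as a header.
df = pd.DataFrame([
    ["information about table 1", None, None],
    ["id", "name", None],
    [1, "a", None],
    [None, None, None],                      # blank separator row
    ["information about table 2", None, None],
    ["code", "value", "unit"],
    ["x", 10, "kg"],
])

# Category 1: rows where every single cell is missing.
empty_rows = df.isna().all(axis=1)
print(empty_rows.tolist())  # [False, False, False, True, False, False, False]
```

Everything that is not in `empty_rows` belongs to one of the two tables; the next steps work out which.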

The first thing to do is create a column that sorts the rows into those three groups. Looking at the sheet, I would rely on the cells that say "information about table (1/2)". You can create a column that holds 1 if the first column contains "table 1", 2 if it contains "table 2", and null otherwise. You may be worried that all of the actual table rows have null values in this new column. Don't be, yet.
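A minimal sketch of this step, using a hypothetical frame shaped like the sheet (marker text in column `0`). Instead of separate "table 1" and "table 2" checks, it swaps in a regular expression that extracts the number after "table", so a third or fourth table would be picked up without code changes. The pattern `table\s*(\d+)` is an assumption and should be adjusted to the real marker text.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., header=None); column 0 holds
# the "information about table N" markers.
df = pd.DataFrame([
    ["information about table 1", None, None],
    ["id", "name", None],
    [1, "a", None],
    [None, None, None],
    ["information about table 2", None, None],
    ["code", "value", "unit"],
    ["x", 10, "kg"],
])

# Convert the first column to text so the .str accessor also works on
# numeric cells; missing cells cannot match the pattern below.
first_col = df[0].astype(str)

# Capture the digits after "table"; rows without a marker become NaN.
df["filter"] = first_col.str.extract(r"table\s*(\d+)", expand=False).astype(float)
print(df["filter"].tolist())
```

Only the two marker rows get a number here; every other row is NaN for now.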

Next, call the .ffill() method on the new column. It takes each non-null value in the column and propagates it downward into the null cells below it, until the next non-null value is reached. At this point, every row of the first table will have 1 in the column and every row of the second table will have 2 (as will any blank rows that follow each marker). That's the first major step out of the way.
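Here is what `.ffill()` does to a marker column like the one described above (values hypothetical). Note that the blank separator row at position 3 also receives 1, which is why empty rows are dropped afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical marker column: 1 / 2 on the "information about table N" rows,
# NaN everywhere else (position 3 stands for a blank separator row).
filt = pd.Series([1, np.nan, np.nan, np.nan, 2, np.nan, np.nan])

# ffill copies each non-null value down into the NaNs below it,
# stopping at the next non-null value.
filled = filt.ffill()
print(filled.tolist())  # [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```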

The first column, however, should still contain null values, because you haven't done anything with it. Fortunately, judging by the image, it is null only where the entire row is empty. Drop every row whose first column is null. Finally, you should be able to create two new DataFrames using Boolean masking.
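A minimal sketch of the drop step, on a hypothetical frame as it might look after forward-filling. It uses `dropna(subset=[0])`, so it relies on the same assumption as above: the first column is empty only on fully blank rows.

```python
import pandas as pd

# Hypothetical frame after the ffill step: column 0 is the sheet's first
# column, "filter" is already forward-filled (note the blank row got 1.0).
df = pd.DataFrame({
    0: ["information about table 1", "id", 1, None,
        "information about table 2", "code", "x"],
    1: [None, "name", "a", None, None, "value", 10],
    "filter": [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
})

# Drop rows whose first column is empty (assumed to be the blank separator
# rows), then renumber the index 0..n-1.
df = df.dropna(subset=[0]).reset_index(drop=True)
print(len(df))  # 6
```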

For example: `df1 = df.loc[df["filter"] == 1].copy(deep=True)`
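Since the question asks for code that keeps working as the sheet changes, one option is to build one mask per distinct value of the `filter` column instead of hard-coding `1` and `2`. A minimal sketch, again on a hypothetical frame; `tables` is just an illustrative name:

```python
import pandas as pd

# Hypothetical frame after the ffill and dropna steps.
df = pd.DataFrame({
    0: ["information about table 1", "id", 1,
        "information about table 2", "code", "x"],
    1: [None, "name", "a", None, "value", 10],
    "filter": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
})

# One Boolean mask per distinct filter value, so the same code handles
# two tables or twenty without hard-coding 1 and 2.
tables = {
    int(key): df.loc[df["filter"] == key].drop(columns="filter").copy()
    for key in df["filter"].dropna().unique()
}
print(sorted(tables))  # [1, 2]
print(len(tables[2]))  # 3
```

`df.groupby("filter")` would give the same split; the explicit masks just mirror the one-line example above.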

You will still have to handle the columns and headers and clean them up however you like, but at this point it should be much easier to clean up one table at a time rather than two tables smashed together in a single DataFrame.
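One possible cleanup, sketched under assumptions about the layout: the hypothetical helper `tidy_table` treats the first row of each block as the "information about table N" marker and the second row as the header, and it drops columns whose header cell is empty. If your sheet differs (for example, several header rows), adjust the row offsets.

```python
import pandas as pd


def tidy_table(block: pd.DataFrame) -> pd.DataFrame:
    """Turn one raw block (marker row, header row, data rows) into a table.

    Hypothetical helper: assumes row 0 is the "information about table N"
    marker and row 1 holds the column names.
    """
    block = block.drop(columns="filter", errors="ignore")
    header = block.iloc[1]
    # Keep only columns that have a name in the header row; the rest are
    # assumed to be unused width belonging to the other, wider table.
    keep = header.notna().to_numpy()
    body = block.iloc[2:, keep].copy()
    body.columns = header[keep].tolist()
    # Let pandas re-infer dtypes now that the text rows are gone.
    return body.infer_objects().reset_index(drop=True)


# Hypothetical block for table 1, as produced by the masking step.
block = pd.DataFrame({
    0: ["information about table 1", "id", 1, 2],
    1: [None, "name", "a", "b"],
    2: [None, None, None, None],
    "filter": [1.0, 1.0, 1.0, 1.0],
})
print(tidy_table(block))
```

Applying it to each table from the split gives frames with real column names and no marker rows.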

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM