Pandas: Parse Excel spreadsheet with merged cells and blank values

My question is similar to this one. I have a spreadsheet with some merged cells, but the column with merged cells also has genuinely empty cells, e.g.:

Day     Sample  CD4     CD8
----------------------------
Day 1   8311    17.3    6.44
        --------------------
        8312    13.6    3.50
        --------------------
        8321    19.8    5.88
        --------------------
        8322    13.5    4.09
----------------------------
Day 2   8311    16.0    4.92
        --------------------
        8312    5.67    2.28
        --------------------
        8321    13.0    4.34
        --------------------
        8322    10.6    1.95
----------------------------
        8323    16.0    4.92
----------------------------
        8324    5.67    2.28
----------------------------
        8325    13.0    4.34

How can I parse this into a Pandas DataFrame? I understand that fillna(method='ffill') will not solve my issue, since it would also fill the cells that are actually empty with the previous value. I want to get a DataFrame like this:

Day     Sample  CD4     CD8
----------------------------
Day 1   8311    17.3    6.44
----------------------------
Day 1   8312    13.6    3.50
----------------------------
Day 1   8321    19.8    5.88
----------------------------
Day 1   8322    13.5    4.09
----------------------------
Day 2   8311    16.0    4.92
----------------------------
Day 2   8312    5.67    2.28
----------------------------
Day 2   8321    13.0    4.34
----------------------------
Day 2   8322    10.6    1.95
----------------------------
NA      8323    16.0    4.92
----------------------------
NA      8324    5.67    2.28
----------------------------
NA      8325    13.0    4.34

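Something like this should work, assuming you know the row where the data starts in your Excel file (or come up with a better way to detect it). The idea is to use openpyxl to check whether each "Day" cell belongs to a merged range, and to forward-fill only those cells: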

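import numpy as np
import openpyxl
import pandas as pd


def test():
    filepath = "C:\\Users\\me\\Desktop\\SO nonsense\\PandasMergeCellTest.xlsx"
    df = pd.read_excel(filepath)
    wb = openpyxl.load_workbook(filepath)
    sheet = wb["Sheet1"]
    # The headers are in row 1, so the first data row is Excel row 2.
    df["Row"] = np.arange(len(df)) + 2
    df["Merged"] = df.apply(lambda x: checkMerged(x, sheet), axis=1)
    # Forward-fill only inside merged ranges; cells outside a merged range
    # keep their own value (or stay NaN if they were empty).
    df["Day"] = df["Day"].ffill().where(df["Merged"], df["Day"])
    # The axis must be given by keyword; positional axis was removed in pandas 2.0.
    df = df.drop(columns=["Row", "Merged"])
    print(df)


def checkMerged(x, sheet):
    """Return True if the "Day" cell (column 1) of this row is inside a merged range."""
    cell = sheet.cell(row=int(x["Row"]), column=1)
    for mergedcell in sheet.merged_cells.ranges:
        if cell.coordinate in mergedcell:
            return True
    return False  # explicit False instead of an implicit None


test()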
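For anyone who wants to try the idea without a file on disk, below is a self-contained sketch of a variant that reads everything through openpyxl and copies the top-left value of each merged range into the other cells of that range. The helper name read_excel_fill_merged and the in-memory workbook are made up for illustration; the sketch also assumes the header is in the first row of the sheet.

```python
import io

import openpyxl
import pandas as pd


def read_excel_fill_merged(source, sheet_name=None):
    """Hypothetical helper: load a sheet and repeat merged values across their range."""
    wb = openpyxl.load_workbook(source)
    ws = wb[sheet_name] if sheet_name else wb.active

    # Map every (row, column) inside a merged range to the range's top-left value.
    merged_values = {}
    for rng in ws.merged_cells.ranges:
        min_col, min_row, max_col, max_row = rng.bounds
        top_left = ws.cell(row=min_row, column=min_col).value
        for r in range(min_row, max_row + 1):
            for c in range(min_col, max_col + 1):
                merged_values[(r, c)] = top_left

    # Cells outside any merged range keep their own value (None if empty).
    rows = [
        [merged_values.get((cell.row, cell.column), cell.value) for cell in row]
        for row in ws.iter_rows()
    ]
    # Assumption: the first row of the sheet holds the column headers.
    return pd.DataFrame(rows[1:], columns=rows[0])


# Build a small workbook in memory that mimics the sheet from the question.
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Day", "Sample", "CD4", "CD8"])
data = [
    ("Day 1", 8311, 17.3, 6.44), (None, 8312, 13.6, 3.50),
    (None, 8321, 19.8, 5.88), (None, 8322, 13.5, 4.09),
    ("Day 2", 8311, 16.0, 4.92), (None, 8312, 5.67, 2.28),
    (None, 8321, 13.0, 4.34), (None, 8322, 10.6, 1.95),
    (None, 8323, 16.0, 4.92), (None, 8324, 5.67, 2.28),
    (None, 8325, 13.0, 4.34),
]
for record in data:
    ws.append(record)
ws.merge_cells("A2:A5")  # "Day 1" block
ws.merge_cells("A6:A9")  # "Day 2" block

buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

df = read_excel_fill_merged(buffer)
print(df)
```

Compared with the approach above, this does not need to know the starting row for a "Row" helper column, but it bypasses pd.read_excel, so any type conversion that pandas would normally do on read is left to the DataFrame constructor.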
