
MultiIndex pandas dataframe and writing to Google Sheets using gspread-pandas

Starting with the following dictionary:

test_dict = {'header1_1': {'header2_1': {'header3_1': {'header4_1': ['322.5', 330.0, -0.28],
    'header4_2': ['322.5', 332.5, -0.26]},
   'header3_2': {'header4_1': ['285.0', 277.5, -0.09],
    'header4_2': ['287.5', 277.5, -0.12]}},
  'header2_2': {'header3_1': {'header4_1': ['345.0', 357.5, -0.14],
    'header4_2': ['345.0', 362.5, -0.14]},
   'header3_2': {'header4_1': ['257.5', 245.0, -0.1],
    'header4_2': ['257.5', 240.0, -0.08]}}}}

I want the headers in the index, so I reshape the dictionary into one keyed by 4-tuples:

# Flatten the 4-level dict into {(h1, h2, h3, h4): [v1, v2, v3]}
reformed_dict = {}
for outerKey, innerDict in test_dict.items():
    for innerKey, innerDict2 in innerDict.items():
        for innerKey2, innerDict3 in innerDict2.items():
            for innerKey3, values in innerDict3.items():
                reformed_dict[(outerKey, innerKey, innerKey2, innerKey3)] = values
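If the nesting depth might change, the same reshaping can be written recursively instead of with one loop per level. This is a minimal sketch, not the asker's code: `flatten` is a hypothetical helper name, and it assumes every branch reaches its leaf list at the same depth (otherwise the tuple keys would have different lengths and would not form a clean MultiIndex).

```python
def flatten(d, prefix=()):
    """Yield (key_path, leaf) pairs from an arbitrarily nested dict.

    Hypothetical helper: any value that is not a dict is treated as a leaf.
    """
    for key, value in d.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend one level, carrying the keys seen so far.
            yield from flatten(value, path)
        else:
            yield path, value


# Small stand-in for test_dict with two leaves at depth 4.
sample = {"h1": {"h2_a": {"h3": {"h4": [1, 2, 3]}},
                 "h2_b": {"h3": {"h4": [4, 5, 6]}}}}
reformed_dict = dict(flatten(sample))
print(reformed_dict)
# {('h1', 'h2_a', 'h3', 'h4'): [1, 2, 3], ('h1', 'h2_b', 'h3', 'h4'): [4, 5, 6]}
```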

Then I build a MultiIndex from the tuple keys and assign names to its levels (the headers):

import pandas as pd

keys = reformed_dict.keys()
values = reformed_dict.values()
# Name the four index levels H1..H4
index = pd.MultiIndex.from_tuples(list(keys), names=["H1", "H2", "H3", "H4"])
df = pd.DataFrame(data=list(values), index=index)

That produces a dataframe that looks like this: [screenshot of the dataframe in Jupyter]

Issue #1 [*** this has been answered by @AzharKhan, so feel free to skip ahead to Issue #2 ***]: To assign names to the data columns, I tried:

df.columns = ['col 1', 'col 2' 'col 3']

and got the error: "ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements"
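This particular length mismatch comes from the missing comma between `'col 2'` and `'col 3'`: Python joins adjacent string literals, so the list really has only two elements. A minimal check, using a small stand-in dataframe rather than the real one:

```python
import pandas as pd

df = pd.DataFrame([[322.5, 330.0, -0.28]])  # stand-in with 3 unnamed columns

broken = ['col 1', 'col 2' 'col 3']  # missing comma: 'col 2' 'col 3' -> 'col 2col 3'
print(broken)       # ['col 1', 'col 2col 3']
print(len(broken))  # 2

try:
    df.columns = broken  # raises the length-mismatch ValueError quoted above
except ValueError as err:
    print(err)

df.columns = ['col 1', 'col 2', 'col 3']  # with the comma restored, assignment works
print(list(df.columns))  # ['col 1', 'col 2', 'col 3']
```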

Then, per a suggestion, I tried:

df = df.rename(columns={'0': 'Col1', '1': 'Col2', '2': 'Col3'})

This does not generate an error, but the dataframe looks exactly the same as before, with 0, 1, 2 as the data column headers.

How can I assign names to these data columns? I assume 0, 1, 2 are column indices, not column names.

Issue #2: When I write this dataframe to Google Sheets using gspread-pandas:

# s is an existing gspread_pandas Spread instance
s.open_sheet('test')
s.df_to_sheet(df, index=True, headers=True, start='A8', replace=False)

The result is this: [screenshot of the sheet output]

What I would like is this: [screenshot of the desired sheet output]

This is how the dataframe appears in the Jupyter notebook screenshot above, so it seems that writing to the spreadsheet fills in the blank row headers, which makes the table harder to read at a glance.

How can I make the spreadsheet output omit each row header until its value changes, and so get the second spreadsheet output?

Issue #1

Your column labels are integers, not strings. You can check this with:

print(df.columns)

[Out]:
RangeIndex(start=0, stop=3, step=1)
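A quick way to confirm this, and to see why the string-keyed rename() did nothing: by default `rename()` silently ignores keys that are not present, while `errors='raise'` turns the mismatch into a `KeyError`. A sketch with a small stand-in dataframe:

```python
import pandas as pd

df = pd.DataFrame([[322.5, 330.0, -0.28]])  # default integer column labels

print(0 in df.columns)    # True: the label is the integer 0
print('0' in df.columns)  # False: the string '0' is not a label

# Unknown keys are ignored, so this returns an unchanged copy.
unchanged = df.rename(columns={'0': 'Col1'})
print(list(unchanged.columns))  # [0, 1, 2]

# errors='raise' reports the mismatch instead of hiding it.
raised = False
try:
    df.rename(columns={'0': 'Col1'}, errors='raise')
except KeyError as err:
    raised = True
    print('KeyError:', err)
```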

Use integer keys in df.rename(), as follows:

df = df.rename(columns={0: 'Col1', 1: 'Col2', 2: 'Col3'})
print(df.columns)
print(df)

[Out]:
Index(['Col1', 'Col2', 'Col3'], dtype='object')

                                          Col1   Col2  Col3
H1        H2        H3        H4                           
header1_1 header2_1 header3_1 header4_1  322.5  330.0 -0.28
                              header4_2  322.5  332.5 -0.26
                    header3_2 header4_1  285.0  277.5 -0.09
                              header4_2  287.5  277.5 -0.12
          header2_2 header3_1 header4_1  345.0  357.5 -0.14
                              header4_2  345.0  362.5 -0.14
                    header3_2 header4_1  257.5  245.0 -0.10
                              header4_2  257.5  240.0 -0.08

Or, to generalise it rather than hard-coding the names, use:

df = df.rename(columns={i:f"Col{i+1}" for i in df.columns})
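Another option is to name the columns when the dataframe is built, through the `columns` argument of `pd.DataFrame`, so no rename is needed afterwards. A minimal sketch using two entries shaped like the asker's `reformed_dict` (the Col1 to Col3 names are arbitrary):

```python
import pandas as pd

# Two entries in the same shape as the asker's reformed_dict.
reformed_dict = {
    ("header1_1", "header2_1", "header3_1", "header4_1"): ["322.5", 330.0, -0.28],
    ("header1_1", "header2_1", "header3_1", "header4_2"): ["322.5", 332.5, -0.26],
}

index = pd.MultiIndex.from_tuples(list(reformed_dict.keys()),
                                  names=["H1", "H2", "H3", "H4"])
# Passing columns= labels the three value positions directly.
df = pd.DataFrame(list(reformed_dict.values()), index=index,
                  columns=["Col1", "Col2", "Col3"])
print(df)
```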

I am not sure about your issue #2. You may want to carve it out into a separate question to get attention.

Here is a way to handle issue #1 by using pd.json_normalize():

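import pandas as pd
# One row with dotted column names, stacked into a Series of value lists
df = pd.json_normalize(test_dict, max_level=3).stack().droplevel(0)
# Split each dotted name into a 4-level tuple for the MultiIndex
idx = df.index.map(lambda x: tuple(x.split('.'))).rename(['H1', 'H2', 'H3', 'H4'])
df = pd.DataFrame(df.tolist(), index=idx, columns=['col1', 'col2', 'col3'])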
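One caveat with splitting on '.': json_normalize joins nested keys with '.' by default, so a key that itself contained a dot would split into too many levels. Its `sep` parameter lets you pick a separator that cannot appear in your keys; '/' below is an assumption about the data. Because json_normalize returns a single row here, `.iloc[0]` yields the same labels and values as `.stack().droplevel(0)`. A sketch:

```python
import pandas as pd

test_dict = {
    "header1_1": {
        "header2_1": {
            "header3_1": {"header4_1": ["322.5", 330.0, -0.28],
                          "header4_2": ["322.5", 332.5, -0.26]},
            "header3_2": {"header4_1": ["285.0", 277.5, -0.09],
                          "header4_2": ["287.5", 277.5, -0.12]},
        },
        "header2_2": {
            "header3_1": {"header4_1": ["345.0", 357.5, -0.14],
                          "header4_2": ["345.0", 362.5, -0.14]},
            "header3_2": {"header4_1": ["257.5", 245.0, -0.1],
                          "header4_2": ["257.5", 240.0, -0.08]},
        },
    }
}

# One row, one column per leaf, named like 'header1_1/header2_1/header3_1/header4_1'.
flat = pd.json_normalize(test_dict, sep="/").iloc[0]

# Split each flattened name back into a 4-level tuple.
idx = pd.MultiIndex.from_tuples([tuple(k.split("/")) for k in flat.index],
                                names=["H1", "H2", "H3", "H4"])
df = pd.DataFrame(flat.tolist(), index=idx, columns=["col1", "col2", "col3"])
print(df)
```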

Output:

                                          col1   col2  col3
H1        H2        H3        H4                           
header1_1 header2_1 header3_1 header4_1  322.5  330.0 -0.28
                              header4_2  322.5  332.5 -0.26
                    header3_2 header4_1  285.0  277.5 -0.09
                              header4_2  287.5  277.5 -0.12
          header2_2 header3_1 header4_1  345.0  357.5 -0.14
                              header4_2  345.0  362.5 -0.14
                    header3_2 header4_1  257.5  245.0 -0.10
                              header4_2  257.5  240.0 -0.08

Issue #2 is tricky because Jupyter notebook displays the index with "blank" values, but if you inspect df.index, you will see that all the labels are actually there. It's just a display choice: pandas hides repeated MultiIndex labels when rendering, controlled by the display.multi_sparse option.
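A small illustration of that point, using a hypothetical three-level index: every label is stored, and `DataFrame.to_string` only hides the repeats when `sparsify=True` (its default comes from the `display.multi_sparse` option).

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["header1_1"], ["header2_1", "header2_2"], ["header3_1", "header3_2"]],
    names=["H1", "H2", "H3"],
)
df = pd.DataFrame({"val": range(4)}, index=index)

print(df.index[1])  # ('header1_1', 'header2_1', 'header3_2'): full labels are stored

sparse = df.to_string(sparsify=True)  # repeated outer labels shown as blanks
full = df.to_string(sparsify=False)   # every label printed on every row
print(sparse)
print(full)
```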

To get the same look in the sheet, you can turn the index levels into columns, blank out values that are unchanged from the previous row, and join the result back onto the data:

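# Index levels as ordinary columns H1..H4, with a plain 0..n-1 index
idx_df = df.index.to_frame().reset_index(drop=True)
# Keep a label only where it differs from the row above in the same column (NaN otherwise)
df = idx_df.where(idx_df.ne(idx_df.shift())).join(df.reset_index(drop=True))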
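One thing to watch with the shift() comparison: it checks each level independently, whereas the Jupyter display only blanks a label when every level to its left is unchanged too. With rows (a, x, p) and (b, x, p), for example, the one-liner blanks the second x even though H1 changed. A minimal sketch of a hierarchical variant; `sparsify_index` is a hypothetical helper name, and blanks are written as empty strings rather than NaN:

```python
import pandas as pd


def sparsify_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: MultiIndex -> columns, blanking repeated labels.

    A label is kept when it, or any label in an outer (left) level,
    differs from the row above, mirroring how pandas sparsifies display.
    """
    idx_df = df.index.to_frame().reset_index(drop=True)
    # True where a level differs from the previous row; row 0 is always True
    # because comparing against the NaN produced by shift() yields True.
    changed = idx_df.ne(idx_df.shift())
    # Propagate a change rightwards: if an outer level changed, show inner ones too.
    changed = changed.astype(int).cummax(axis=1).astype(bool)
    # Keep changed labels, replace the rest with empty strings.
    return idx_df.where(changed, "").join(df.reset_index(drop=True))


index = pd.MultiIndex.from_tuples(
    [("a", "x", "p"), ("a", "x", "q"), ("b", "x", "p")], names=["H1", "H2", "H3"]
)
demo = pd.DataFrame({"val": [1, 2, 3]}, index=index)
out = sparsify_index(demo)
print(out)
```

Since the labels are now ordinary columns, the result would be written without the index, e.g. `s.df_to_sheet(sparsify_index(df), index=False, headers=True, start='A8', replace=False)`, with `s` being the asker's Spread instance.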

The creator of gspread-pandas has added the ability to merge indexes when writing a dataframe to Google Sheets. It is not yet in a general release of gspread-pandas, but can be found here: https://github.com/aiguofer/gspread-pandas/pull/92
