Reading a Quip spreadsheet with quip-api and pandas

Question

I have started exploring the Quip API.

I created a spreadsheet in Quip with the following details:

Added a title to the spreadsheet
Added the following data to the spreadsheet:

id	name
1	harry
2	hermione
3	ron

This is how I am trying to read it from Quip:

import quip
import pandas as pd
import numpy as np
import html5lib

client = quip.QuipClient(token, base_url = baseurl)
rawdictionary = client.get_thread(thread_id)

dfs=pd.read_html(rawdictionary['html'])
raw_df = dfs[0]
raw_df.drop(raw_df.columns[[0]], axis = 1, inplace = True) 
#raw_df.dropna(axis=0,inplace=True)
print(raw_df.replace(r'^\s+$', np.nan, regex=True))

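I tried dropping the rows that contain NaN, and I tried replacing whitespace-only strings with NaN. However, these empty rows and columns still show up in the DataFrame, for example: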

         A         B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
0   id      name                            
1    1    harry                            
2    2  hermione                            
3    3  ron                            
4                                         
5                                         
6                                         
7                                         
8                                         
9                                         
10                                        
11                                        
12                                        
13                                        
14                                        
15                                        
16                                        
17
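One plausible reason the whitespace regex has no effect, assuming the blank-looking cells hold a zero-width space (U+200B) rather than ordinary whitespace: in Python's regex engine \s does not match U+200B, because that character is not classified as whitespace. A minimal check on a hand-built Series (the cell values are hypothetical, not taken from a real Quip export):

```python
import re

import numpy as np
import pandas as pd

# Hypothetical cells: a plain space, a zero-width space (U+200B), and real data.
cells = pd.Series([" ", "\u200b", "harry"])

# r'^\s+$' matches whitespace-only strings, but U+200B is not whitespace
# for Python's re module or str.isspace(), so that cell is left alone.
print(re.match(r"^\s+$", " ") is not None)       # True
print(re.match(r"^\s+$", "\u200b") is not None)  # False
print("\u200b".isspace())                        # False

replaced = cells.replace(r"^\s+$", np.nan, regex=True)
print(replaced.isna().tolist())  # [True, False, False]
```

Only the ordinary space becomes NaN, so a later dropna would still keep rows made of zero-width spaces.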

Questions

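What is the best way to read a Quip spreadsheet from Python?
How can I clean up the extra rows and columns, so that the pandas DataFrame keeps only the rows with valid records and uses id and name as the column headers?
When I add raw_df.dropna(axis=0, inplace=True) and then print, I get None. Why?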
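The question does not show the exact line that prints None, so this is an assumption: if the printed value is the return value of the call (for example print(raw_df.dropna(axis=0, inplace=True))), None is the expected result. In pandas, methods called with inplace=True modify the object itself and return None; without inplace they return a new DataFrame. A small sketch on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one fully empty row.
df = pd.DataFrame({"id": [1, np.nan], "name": ["harry", np.nan]})

# With inplace=True, dropna modifies df itself and returns None,
# so printing its return value prints "None".
result = df.dropna(axis=0, inplace=True)
print(result)   # None
print(len(df))  # 1

# Without inplace, dropna returns a new DataFrame and leaves the original untouched.
df2 = pd.DataFrame({"id": [1, np.nan], "name": ["harry", np.nan]})
cleaned = df2.dropna(axis=0)
print(len(df2), len(cleaned))  # 2 1
```

Either call dropna with inplace=True on its own line and then print(raw_df), or assign the result (raw_df = raw_df.dropna(axis=0)) and print that.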

Answer 1

Quip automatically adds many extra blank columns and rows whose cells contain the \u200b (zero-width space) Unicode character.
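To confirm which invisible character a blank-looking cell actually contains, repr() and the standard-library unicodedata module are enough. The cell value below is hypothetical; in practice you would inspect a value such as raw_df.iloc[5, 0] from your own frame.

```python
import unicodedata

# Hypothetical cell value as read from the Quip export.
cell = "\u200b"

# repr() escapes the invisible character; unicodedata names and classifies it.
print(repr(cell))                  # '\u200b'
print(unicodedata.name(cell))      # ZERO WIDTH SPACE
print(unicodedata.category(cell))  # Cf (format character, not whitespace)
print(cell.strip() == cell)        # True: str.strip() does not remove it
```

Because the category is Cf rather than a whitespace category, neither str.strip() nor the \s regex class treats it as blank, so it has to be replaced explicitly.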

This is how I solved the problem:

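import quip
import pandas as pd
import numpy as np
import html5lib  # used by pd.read_html (with bs4) when lxml is not installed

# token, baseurl and thread_id are assumed to be defined elsewhere
client = quip.QuipClient(token, base_url=baseurl)
rawdictionary = client.get_thread(thread_id)

dfs = pd.read_html(rawdictionary['html'])
raw_df = dfs[0]

raw_df.columns = raw_df.iloc[0]  # Use the first row as the column header
raw_df = raw_df[1:]  # That row now duplicates the header, so drop it
attribs = ['id', 'name']  # Columns to keep (the spreadsheet's headers)
raw_df = raw_df[attribs]
cleaned_df = raw_df.replace(np.nan, 'N/A')  # Keep genuine NaN cells from being dropped below
cleaned_df = cleaned_df.replace('\u200b', np.nan)  # Treat zero-width-space cells as empty
cleaned_df.dropna(axis=0, how='any', inplace=True)  # Drop rows that still contain an empty cell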
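A variation on the same idea, not the approach above: instead of selecting known column names, strip U+200B and surrounding whitespace from every string cell, turn the resulting empty strings into NaN, and drop rows and columns that are entirely empty before promoting the header row. This sketch assumes the padding cells contain only U+200B and/or whitespace and that the header is the first non-empty row; the function name clean_quip_frame and the sample frame standing in for dfs[0] are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfs[0]: a header row, three records,
# and padding cells that contain only a zero-width space (U+200B).
Z = "\u200b"
raw_df = pd.DataFrame(
    [
        ["id", "name", Z],
        ["1", "harry", Z],
        ["2", "hermione", Z],
        ["3", "ron", Z],
        [Z, Z, Z],
        [Z, Z, Z],
    ],
    columns=["A", "B", "C"],
)


def clean_quip_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows/columns that are empty apart from whitespace or U+200B."""
    # Remove zero-width spaces and surrounding whitespace from string cells.
    stripped = df.apply(
        lambda col: col.map(
            lambda v: v.replace("\u200b", "").strip() if isinstance(v, str) else v
        )
    )
    # Empty strings become NaN so dropna can see them.
    stripped = stripped.replace("", np.nan)
    # Drop rows, then columns, in which every cell is NaN.
    stripped = stripped.dropna(axis=0, how="all").dropna(axis=1, how="all")
    # Promote the first remaining row to the header.
    header = stripped.iloc[0]
    body = stripped.iloc[1:].reset_index(drop=True)
    body.columns = header.tolist()
    return body


cleaned = clean_quip_frame(raw_df)
print(cleaned)
```

Using how="all" keeps a record whose name cell happens to be blank, whereas the how='any' call above would drop it; which behaviour is right depends on the data.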

Solution 1: score 2, accepted 2021-02-08 15:08:33