

Reading Quip Spreadsheet with quip-api and pandas

I have started exploring the Quip API.

I have created a spreadsheet in Quip with the following details:

  1. Added a title to the spreadsheet
  2. Added the following data to the spreadsheet:

     id   name
     1    harry
     2    hermione
     3    ron

Here is how I am trying to read it from Quip:

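from io import StringIO

import numpy as np
import pandas as pd
import quip

# token, baseurl and thread_id are placeholders for your own values
client = quip.QuipClient(token, base_url=baseurl)
rawdictionary = client.get_thread(thread_id)

# read_html returns one DataFrame per <table> and needs an HTML parser
# such as lxml, or BeautifulSoup4 with html5lib, installed.
# Recent pandas versions expect literal HTML to be wrapped in StringIO.
dfs = pd.read_html(StringIO(rawdictionary['html']))
raw_df = dfs[0]
raw_df.drop(raw_df.columns[[0]], axis=1, inplace=True)  # drop the first column
#raw_df.dropna(axis=0, inplace=True)
print(raw_df.replace(r'^\s+$', np.nan, regex=True))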

I tried dropping rows that contain NaN, and I also tried replacing blank strings with NaN. However, these empty rows and columns still appear in the DataFrame, for example:

         A         B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
0   id      name  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
1    1    harry  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
2    2  hermione  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
3    3  ron  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
4    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
5    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
6    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
7    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
8    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
9    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
10   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
11   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
12   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
13   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
14   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
15   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
16   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
17   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​

Questions

  1. What is the best way to read a Quip spreadsheet from Python?
  2. How can I remove the extra rows and columns, and keep only the rows with valid records, with id and name as the headers of the pandas DataFrame?
  3. After adding raw_df.dropna(axis=0, inplace=True), I get None when I run print(raw_df). Why?
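On question 3, a likely explanation (an assumption, since the exact line that produced None is not shown) is that DataFrame.dropna with inplace=True modifies the frame in place and returns None. Writing raw_df = raw_df.dropna(..., inplace=True) therefore replaces raw_df with None. The small frames below are made-up examples, not the Quip data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, np.nan], "name": ["harry", "hermione", np.nan]})

# inplace=True modifies df directly and returns None.
result = df.dropna(axis=0, inplace=True)
print(result)  # None
print(df)      # the all-NaN row has been removed from df itself

# Pitfall: assigning the return value of an inplace call replaces the frame with None.
broken = df.copy()
broken = broken.dropna(axis=0, inplace=True)
print(broken)  # None

# Alternative: omit inplace and assign the returned DataFrame.
raw = pd.DataFrame({"id": [3, np.nan], "name": ["ron", np.nan]})
fixed = raw.dropna(axis=0)  # raw itself is left unchanged
print(fixed)
```

In short, use either df.dropna(..., inplace=True) on its own line or df = df.dropna(...), but never both at once.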

Quip automatically pulls in a number of extra blank columns and rows whose cells contain the zero-width space Unicode character (\u200b).
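This also explains why the question's replace(r'^\s+$', np.nan, regex=True) had no effect. U+200B is a Unicode format character, not whitespace, so \s does not match it. The sketch below uses a hypothetical two-column frame. It shows the failure and one possible alternative pattern that also treats \u200b as blank:

```python
import re

import numpy as np
import pandas as pd

ZWSP = "\u200b"  # zero-width space found in Quip's "empty" cells

# U+200B is not whitespace, so \s cannot match it.
print(ZWSP.isspace())            # False
print(re.match(r"^\s+$", ZWSP))  # None

df = pd.DataFrame({"A": ["id", "1", ZWSP], "B": ["name", "harry", ZWSP]})

# The pattern from the question leaves the ZWSP cells untouched...
unchanged = df.replace(r"^\s+$", np.nan, regex=True)
# ...while a character class that includes \u200b turns them into NaN.
blanked = df.replace(r"^[\s\u200b]*$", np.nan, regex=True)
print(blanked)
```

Python's re module understands \uXXXX escapes in str patterns, so the raw-string pattern above matches a literal zero-width space.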

This is how I resolved it:

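from io import StringIO

import numpy as np
import pandas as pd
import quip

# token, baseurl and thread_id are placeholders for your own values
client = quip.QuipClient(token, base_url=baseurl)
rawdictionary = client.get_thread(thread_id)

dfs = pd.read_html(StringIO(rawdictionary['html']))
raw_df = dfs[0]

attribs = ['id', 'name']  # assumed: the column headers to keep

raw_df.columns = raw_df.iloc[0]  # make the first row the column header
raw_df = raw_df[1:]  # that row now duplicates the header, so drop it
raw_df = raw_df[attribs]  # keep only the wanted columns
cleaned_df = raw_df.replace(np.nan, 'N/A')  # keep genuinely empty cells from being dropped
cleaned_df = cleaned_df.replace('\u200b', np.nan)  # zero-width-space filler cells -> NaN
cleaned_df.dropna(axis=0, how='any', inplace=True)  # drop rows that contained filler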
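The same idea can be tried offline without a Quip token. The sketch below is a variant of the answer, not its exact code. It uses a hand-built stand-in for dfs[0] (hypothetical data laid out like the output shown in the question) and a hypothetical helper, clean_quip_frame. The helper blanks cells that hold only whitespace or \u200b and drops rows and columns that are entirely empty before promoting the header row. It uses how='all' instead of how='any', so partially filled rows are kept:

```python
import numpy as np
import pandas as pd

ZWSP = "\u200b"

# Hypothetical stand-in for dfs[0]: Quip-style column letters, the real
# headers in the first row, and ZWSP in every empty cell.
raw_df = pd.DataFrame(
    {
        "A": ["id", "1", "2", "3", ZWSP, ZWSP],
        "B": ["name", "harry", "hermione", "ron", ZWSP, ZWSP],
        "C": [ZWSP] * 6,
        "D": [ZWSP] * 6,
    }
)


def clean_quip_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Blank out ZWSP/whitespace cells, drop empty rows/columns, promote the header."""
    # Cells consisting only of whitespace and/or zero-width spaces become NaN.
    df = raw.replace(r"^[\s\u200b]*$", np.nan, regex=True)
    # Drop rows and columns with no data at all (done before promoting the
    # header so the filler columns never become duplicate column labels).
    df = df.dropna(axis=0, how="all").dropna(axis=1, how="all")
    # The first remaining row holds the real headers (id, name).
    df.columns = df.iloc[0]
    df = df.iloc[1:].reset_index(drop=True)
    df.columns.name = None
    return df


cleaned = clean_quip_frame(raw_df)
print(cleaned)
```

The id values stay strings here, as they would after read_html on this layout; pd.to_numeric can convert them afterwards if needed.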
