Reading Quip Spreadsheet with quip-api and pandas

I have started exploring the Quip API.

I created a spreadsheet in Quip as follows:

  1. Added a title to the spreadsheet
  2. Added the following data to the spreadsheet:
id name
1 harry
2 hermione
3 ron

And here is how I am trying to read from Quip:

import quip
import pandas as pd
import numpy as np
import html5lib

client = quip.QuipClient(token, base_url = baseurl)
rawdictionary = client.get_thread(thread_id)

dfs=pd.read_html(rawdictionary['html'])
raw_df = dfs[0]
raw_df.drop(raw_df.columns[[0]], axis = 1, inplace = True) 
#raw_df.dropna(axis=0,inplace=True)
print(raw_df.replace(r'^\s+$', np.nan, regex=True))

I tried dropping the rows that contain NaN and also tried replacing blank strings with NaN. However, the empty rows and columns still appear in the DataFrame, for example:

         A         B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
0   id      name  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
1    1    harry  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
2    2  hermione  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
3    3       ron  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
4    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
5    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
6    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
7    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
8    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
9    ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
10   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
11   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
12   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
13   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
14   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
15   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
16   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​
17   ​         ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​  ​

Questions

  1. What is the best way to read a Quip spreadsheet from Python?
  2. How can I remove the extra rows and columns so that the pandas DataFrame contains only the valid records, with id and name as the headers?
  3. After adding raw_df.dropna(axis=0,inplace=True), running print(raw_df) gives me None. Why?

Quip automatically pads the spreadsheet with a number of extra blank columns and rows whose cells contain the \u200b (zero-width space) Unicode character.
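As a small illustration of that padding, and of why the `r'^\s+$'` replacement in the question leaves it untouched, here is a sketch on a hand-built DataFrame. The frame is only an assumed stand-in for what `pd.read_html` returns for the sheet above (it is not real Quip output), and the names `ZWSP`, `blank`, `trimmed` and `clean_df` are made up for the example.

```python
import numpy as np
import pandas as pd

ZWSP = "\u200b"  # zero-width space, assumed to be the filler in empty Quip cells

# Hand-built stand-in for the parsed sheet: header in row 0,
# followed by padding columns and rows that hold only ZWSP.
raw_df = pd.DataFrame(
    [
        ["id", "name", ZWSP, ZWSP],
        ["1", "harry", ZWSP, ZWSP],
        ["2", "hermione", ZWSP, ZWSP],
        ["3", "ron", ZWSP, ZWSP],
        [ZWSP, ZWSP, ZWSP, ZWSP],
        [ZWSP, ZWSP, ZWSP, ZWSP],
    ],
    columns=["A", "B", "C", "D"],
)

# U+200B is not whitespace for Python's re module, so r'^\s+$' never matches it.
# Match cells made only of whitespace and/or zero-width spaces instead.
blank = raw_df.replace(r"^[\s\u200b]*$", np.nan, regex=True)

# Drop the rows and the columns that are entirely empty.
trimmed = blank.dropna(axis=0, how="all").dropna(axis=1, how="all")

# Promote the first remaining row to the header.
clean_df = trimmed.iloc[1:].reset_index(drop=True)
clean_df.columns = trimmed.iloc[0].tolist()
print(clean_df)
```

Dropping with `how="all"` keeps a record that has one empty cell, which may or may not be what you want for real data.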

This is how I've resolved this:

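from io import StringIO

import quip
import pandas as pd
import numpy as np
import html5lib  # parser backend that pd.read_html can use

# token, baseurl and thread_id are placeholders for your own values
client = quip.QuipClient(token, base_url=baseurl)
rawdictionary = client.get_thread(thread_id)

# Recent pandas versions deprecate passing a raw HTML string, so wrap it in StringIO
dfs = pd.read_html(StringIO(rawdictionary['html']))
raw_df = dfs[0]

raw_df.columns = raw_df.iloc[0]  # Use the first row as the column header
raw_df = raw_df[1:]  # The header row is now duplicated as data, so drop it
attribs = ['id', 'name']  # Columns to keep (assumed here from the sheet in the question)
raw_df = raw_df[attribs]
cleaned_df = raw_df.replace(np.nan, 'N/A')  # Protect real missing values from the dropna below
cleaned_df = cleaned_df.replace('\u200b', np.nan)  # Quip's filler cells become NaN
cleaned_df.dropna(axis=0, how='any', inplace=True)  # Drop the filler rows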
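On the third question: `dropna(..., inplace=True)` modifies the DataFrame and returns `None`. A likely cause of the `None` (an assumption, since the exact line that produced it is not shown) is that the return value was assigned back or printed. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, np.nan], "name": ["harry", "hermione", np.nan]})

# inplace=True mutates df and returns None, so never assign its result back.
result = df.dropna(axis=0, inplace=True)
print(result)   # None
print(len(df))  # 2: df itself now holds only the complete rows

# Without inplace, the original is left untouched and a new frame is returned.
df2 = pd.DataFrame({"id": [1, np.nan]})
kept = df2.dropna(axis=0)
```

So use either `df.dropna(inplace=True)` on its own line or `df = df.dropna()`, but not `df = df.dropna(inplace=True)`.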
