How to store bytes like b'PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00' in a dataframe or CSV in Python

Question

I am requesting a URL and getting the response back as bytes. I want to load it into a dataframe and then store it as CSV.

# Get data from the URL
import requests

url = "someURL"
req = requests.get(url)
url_content = req.content
print(type(url_content))
print(url_content)

# Write the raw response bytes to disk
with open('test.txt', 'wb') as csv_file:
    csv_file.write(url_content)

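I have tried many approaches but could not find a solution. The code above writes the output to a file, but the content comes out as shown below. My end goal is to store the data as CSV, send it to Google Cloud, and create a Google BigQuery table from it.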

Output:

<class 'bytes'>

b'PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00[Content_Types].xml... (remaining binary output truncated; later entries include _rels/.rels and docProps/app.xml)

Answer 1

The initial bytes PK\x03\x04 indicate that this is the PK Zip format. Try unzipping it first, either with unzip <filename> or with Python's built-in zipfile module.
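As a minimal sketch of the zipfile route, the bytes can be inspected in memory without writing them to disk first. The helper name list_zip_members and the stand-in archive below are assumptions made for illustration; they are not the actual download from the question.

```python
import io
import zipfile


def list_zip_members(data: bytes) -> list[str]:
    """Return the member names of a zip archive held in memory."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("data does not look like a zip archive")
    with zipfile.ZipFile(buffer) as archive:
        return archive.namelist()


# Build a tiny stand-in archive in memory (hypothetical content).
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("_rels/.rels", "<Relationships/>")
sample_bytes = sample.getvalue()

print(sample_bytes[:4])                # the same PK signature as in the question
print(list_zip_members(sample_bytes))  # names of the files inside the archive
```

With the real response, passing req.content to the helper would show which files the archive contains, which in turn hints at the actual file type.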

Answer 2

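The original URL (since edited out of the question) suggests that the downloaded file is in .xlsx format. An .xlsx file is essentially one or more XML files inside a zip archive (iBug's answer is correct in that respect).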
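When the URL does not reveal the file type, the zip structure itself can serve as a rough check. The sketch below is a heuristic only: the function name looks_like_xlsx and the helper make_zip are hypothetical, and it assumes the workbook part sits at the conventional path xl/workbook.xml.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Heuristic: a zip archive containing the parts an .xlsx workbook usually has."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        names = set(archive.namelist())
    # Assumption: the workbook is stored at the conventional location.
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names


def make_zip(member_names):
    """Build an in-memory zip with placeholder members (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, "<placeholder/>")
    return buffer.getvalue()


xlsx_like = make_zip(["[Content_Types].xml", "_rels/.rels", "xl/workbook.xml"])
plain_zip = make_zip(["notes.txt"])
csv_bytes = b"col_a,col_b\n1,2\n3,4\n5,6\n7,8\n"

print(looks_like_xlsx(xlsx_like))
print(looks_like_xlsx(plain_zip))
print(looks_like_xlsx(csv_bytes))
```

A check like this only inspects member names; it does not validate that the workbook can actually be parsed.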

So if you want the file's data in a dataframe, tell pandas to read it as an Excel file.

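import io

import pandas as pd
import requests

url = "someURL"
req = requests.get(url)
url_content = req.content

# Load into a dataframe: wrap the bytes in a file-like object
# (reading .xlsx files requires the openpyxl package)
df = pd.read_excel(io.BytesIO(url_content))

# Write to csv
df.to_csv('data.csv')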


2 answers

Answer 1
2 2021-03-27 12:21:45

Answer 2
2 Accepted 2021-03-27 12:57:40
