How can I extract data from an Excel worksheet embedded in HTML using Python and BeautifulSoup?

Question

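I want to extract the data from a table on a web page so that I can average it, visualize it, and otherwise work with it. I tried using Python with BeautifulSoup to get the data, but I ended up with strange Excel formatting code like this: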

<!--table
    {mso-displayed-decimal-separator:"\.";
    mso-displayed-thousand-separator:"\,";}
@page
    {margin:1.0in .75in 1.0in .75in;
    mso-header-margin:.51in;
    mso-footer-margin:.51in;}
.style0
    {mso-number-format:General;
    text-align:general;
    vertical-align:bottom;
    white-space:nowrap;
    mso-rotate:0;
    mso-background-source:auto;
...(more of the same)
...

-->
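That block is CSS that Excel writes into the exported page, most likely inside a <style> element with the rules wrapped in an HTML comment. Depending on the parser and the BeautifulSoup version, calling get_text() on the whole document can pull that CSS into the output. Below is a minimal sketch that removes <style> and <script> elements before extracting the text. It assumes bs4 is installed and uses a made-up, trimmed-down sample page. The helper name visible_text is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up, trimmed-down sample of an Excel "Save as Web Page" export.
SAMPLE = """<html><head>
<meta name=ProgId content=Excel.Sheet>
<style>
<!--table
    {mso-displayed-decimal-separator:"\\.";
    mso-displayed-thousand-separator:"\\,";}
-->
</style>
</head><body>
<table>
<tr><td>Name</td><td>Score</td></tr>
<tr><td>A</td><td>3</td></tr>
</table>
</body></html>"""


def visible_text(html):
    """Return the page text with <style> and <script> contents removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(["style", "script"]):
        tag.decompose()  # drop the element and everything inside it
    return soup.get_text(" ", strip=True)


print(visible_text(SAMPLE))
```

Stripping the CSS only cleans up the text dump. The cell values still come out as one flat string, so reading the table row by row is the more useful step when the goal is to calculate with the numbers.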

I looked at the page source, and it includes:

<meta name=ProgId content=Excel.Sheet>
<meta name=Generator content="Microsoft Excel 14">
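Those two meta tags mark the file as a page saved from Microsoft Excel, which is why the markup is full of mso-* styles. A script can check these tags if it needs to treat such pages specially. Below is a small sketch, assuming bs4 is installed. The function name excel_generator is hypothetical.

```python
from typing import Optional

from bs4 import BeautifulSoup


def excel_generator(html: str) -> Optional[str]:
    """Return the Generator meta value if the page declares ProgId=Excel.Sheet, else None."""
    soup = BeautifulSoup(html, "html.parser")
    prog_id = soup.find("meta", attrs={"name": "ProgId"})
    if prog_id is None or prog_id.get("content") != "Excel.Sheet":
        return None  # not an Excel export
    generator = soup.find("meta", attrs={"name": "Generator"})
    # An Excel export without a Generator tag yields an empty string.
    return generator.get("content", "") if generator is not None else ""


page = ('<meta name=ProgId content=Excel.Sheet>\n'
        '<meta name=Generator content="Microsoft Excel 14">')
print(excel_generator(page))  # Microsoft Excel 14
```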

How can I extract the data in a meaningful form that preserves its structure and lets me manipulate it? Thanks for your time.

My current script just downloads the HTML file with curl, opens it, runs BeautifulSoup's get_text on it, and saves the result to a text file.
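Instead of flattening the whole page with get_text, you can read the table cell by cell. Each row then stays a list and can be saved as CSV. The sketch below is minimal and rests on a few assumptions: bs4 is installed, the data sits in the first <table> on the page, and the input is a made-up sample. The helper names table_rows and rows_to_csv are hypothetical.

```python
import csv
import io
from typing import List

from bs4 import BeautifulSoup


def table_rows(html: str, index: int = 0) -> List[List[str]]:
    """Return the cells of the index-th <table> as a list of rows of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[index]  # assumes the page has at least index + 1 tables
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["td", "th"])]
        if any(cells):  # skip rows whose cells are all empty (spacer rows)
            rows.append(cells)
    return rows


def rows_to_csv(rows: List[List[str]]) -> str:
    """Serialize rows with the csv module (default dialect, \\r\\n line endings)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


SAMPLE = """<table>
<tr><td>Name</td><td>Score</td></tr>
<tr><td>A</td><td>3</td></tr>
<tr><td>&nbsp;</td><td></td></tr>
<tr><td>B</td><td>4.5</td></tr>
</table>"""

rows = table_rows(SAMPLE)
print(rows)
print(rows_to_csv(rows))
```

To use this on the file curl saved, read the file into a string and pass it to table_rows. Excel exports typically declare their charset in a meta tag, so check that tag and pass the matching encoding to open().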

Answer 1

Are you doing something like this?

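 from bs4 import BeautifulSoup  # BeautifulSoup 4; the original used the old BeautifulSoup 3 module

 # html: the page source as a string (e.g. read from the file curl saved)
 s = BeautifulSoup(html, "html.parser")
 table = s.find("table", {"id": "mytableid"})  # "mytableid" is a placeholder id
 if table is not None:
     rows = table.find_all("tr")
     for tr in rows:
         cols = tr.find_all("td")
         for td in cols:
             val = td.text
             print(val)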
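To get to the averaging the question asks about, the cell texts collected by a loop like the one above have to be converted to numbers. Below is a minimal sketch using only the standard library. column_mean is a hypothetical helper and the rows are made-up sample data. It assumes "," appears only as a thousands separator, which is what the mso-displayed-thousand-separator rule in the exported CSS suggests.

```python
from statistics import mean


def column_mean(rows, column, has_header=True):
    """Average the numeric values in one column, skipping blanks and non-numbers."""
    data = rows[1:] if has_header else rows
    values = []
    for row in data:
        if column >= len(row):
            continue  # short row: no cell in this column
        text = row[column].replace(",", "")  # assumes "," is only a thousands separator
        try:
            values.append(float(text))
        except ValueError:
            continue  # e.g. empty cells or text labels
    return mean(values) if values else None


rows = [["Name", "Score"], ["A", "3"], ["B", "1,200"], ["C", ""]]
print(column_mean(rows, 1))  # 601.5
```

Skipping non-numeric cells instead of raising keeps blank Excel cells and stray labels from breaking the average. The trade-off is that a mistyped number is ignored silently.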

I can't give you a better answer until you improve your question.

Answer 1: score 0, 2013-11-21 07:31:54

我如何使用python和beautifulsoup從嵌入html的Excel工作表中提取數據？

問題描述

1 個解決方案

解決方案1 0 2013-11-21 07:31:54

解決方案1
0 2013-11-21 07:31:54