How can I parse a KML file to CSV using Python / BeautifulSoup or a similar tool?

Question

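I have been trying, with limited success, to convert a Google Earth KML file to a GIS shapefile (or another GIS file format, e.g. a PostgreSQL / PostGIS table); see the related GIS.stackexchange question. Here I essentially want to convert the KML file to a CSV.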

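My problem is that the KML file contains some data stored in an HTML table, so once the KML file is parsed, the resulting data table has one field that contains HTML, like this: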

    "<br><br><br>
<table border="1" padding="0">
<tr><td>ID_INT</td><td>NGA0104001</td></tr>
<tr><td>N_sd</td><td>Igbere</td></tr>
<tr><td>Skm2</td><td>3.34</td></tr>
<tr><td>PT2010</td><td>13000</td></tr>"

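When I use the GDAL library, I end up with a CSV file in which one field contains a large chunk of HTML. I would like to use BeautifulSoup (or some similar Python library) to parse the HTML elements of the KML file into four separate fields in the CSV file. I seem to be able to pass the KML to BeautifulSoup, but I am not sure where to go from there, or whether there is another way to achieve the same thing.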

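I have read many similar questions on this topic, here and elsewhere, but I really do not know where to start with parsing this file. Has anyone had any success doing this? Many thanks in advance...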

Oh, and here is a portion of my KML file as an example:

 <?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
    <Document>
    <name>AFNGA_SWAC.kml</name>
    <open>1</open>
    <Style id="s_ylw-pushpin1">
        <IconStyle>
            <scale>1.1</scale>
            <Icon>
                <href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
            </Icon>
            <hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
        </IconStyle>
        <LineStyle>
            <color>ff00ffff</color>
            <width>3</width>
        </LineStyle>
        <PolyStyle>
            <color>3300ffff</color>
        </PolyStyle>
    </Style>
    <StyleMap id="m_ylw-pushpin1">
        <Pair>
            <key>normal</key>
            <styleUrl>#s_ylw-pushpin1</styleUrl>
        </Pair>
        <Pair>
            <key>highlight</key>
            <styleUrl>#s_ylw-pushpin_hl1</styleUrl>
        </Pair>
    </StyleMap>
    <Style id="s_ylw-pushpin_hl1">
        <IconStyle>
            <scale>1.3</scale>
            <Icon>
                <href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
            </Icon>
            <hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
        </IconStyle>
        <LineStyle>
            <color>ff00ffff</color>
            <width>3</width>
        </LineStyle>
        <PolyStyle>
            <color>3300ffff</color>
        </PolyStyle>
    </Style>
    <Folder>
        <name>AFNGA_SWAC</name>
        <open>1</open>
        <description>1027 Éléments de la couche Afnga_swac</description>
        <Placemark>
            <name>Aba</name>
            <description><![CDATA[<br><br><br>
    <table border="1" padding="0">
    <tr><td>ID_INT</td><td>NGA0101001</td></tr>
    <tr><td>N_sd</td><td>Aba</td></tr>
    <tr><td>Skm2</td><td>384.07</td></tr>
    <tr><td>PT2010</td><td>1010000</td></tr>]]></description>
            <styleUrl>#m_ylw-pushpin1</styleUrl>
            <Polygon>
                <extrude>1</extrude>
                <tessellate>1</tessellate>
                <outerBoundaryIs>
                    <LinearRing>
                        <coordinates>
                            7.294567000000001,5.00267,0 7.294408999999999,5.002552,0 7.294211,5.002394,0

Answer 1

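Beautiful Soup is usually very good at doing exactly this kind of thing (assuming you can easily identify a pattern in the XML / HTML that contains the data you are looking for). I don't know how you want to format the output, but if you are after the data in the <description> tags, it is actually quite easy (the following examples are in Python 3):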

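from bs4 import BeautifulSoup

inputfile = "whateveryourfileiscalled.xml"
# The sample KML declares UTF-8, so open it with that encoding.
with open(inputfile, 'r', encoding='utf-8') as f:
  # Naming a parser avoids bs4's "no parser was explicitly specified" warning.
  # "lxml" is an assumption here (it needs the lxml package installed); how the
  # CDATA inside <description> is treated depends on the parser you choose.
  soup = BeautifulSoup(f, 'lxml')

  # After you have a soup object, you can access tags very easily.
  # For instance, you can iterate over and get <description> like so:

  for node in soup.select('description'):
      print(node)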

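On its own that is usually not very useful, so let's drill down a little deeper: we can also reach the elements inside each <description> node we find. If we want, we can also isolate just the text (using the "string" attribute):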

  for node in soup.select('description'):
     for item in node.select('td'):
         print(item.string)

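I always print to check that I am getting what I want. If there is nothing there, you will get a lot of None values. Anyway, this should get you closer, and obviously you can do whatever you like with the output other than printing it (store it in some container, write it out to a CSV, etc.). This will probably work on the block you pasted in the comment, but may not work on the one in the initial question, because there are multiple description tags there.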
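As one way of storing the cells in a container instead of printing them, here is a minimal sketch that pairs the two cells of each table row into a dictionary. The fragment is copied from the question; the helper name `table_to_dict` and the choice of the built-in `html.parser` are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# HTML fragment as it appears in the question's description field.
html = """<br><br><br>
<table border="1" padding="0">
<tr><td>ID_INT</td><td>NGA0104001</td></tr>
<tr><td>N_sd</td><td>Igbere</td></tr>
<tr><td>Skm2</td><td>3.34</td></tr>
<tr><td>PT2010</td><td>13000</td></tr>"""


def table_to_dict(fragment):
    """Map the first cell of each two-cell row to the second cell.

    Hypothetical helper: assumes every useful row is a key/value pair.
    """
    soup = BeautifulSoup(fragment, "html.parser")
    record = {}
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) == 2:  # ignore rows that are not key/value pairs
            record[cells[0]] = cells[1]
    return record


record = table_to_dict(html)
print(record)
```

Each key in `record` then corresponds to one of the four columns the question asks for.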

In your question you have multiple <description> tags, and not all of them contain <td> nodes; in that case you need to use find_all instead of select:

  for node in soup.find_all('description'):
      for item in node.find_all('td'):
          print(item.string)
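To get from the loops above to the CSV the question asks for, the following is a rough sketch rather than a definitive implementation. It swaps in the standard library's `xml.etree.ElementTree` to read the KML (so only each Placemark's own description is used and the Folder-level one is skipped) and keeps Beautiful Soup for the HTML table. `SAMPLE_KML` is a trimmed stand-in assembled from the fragments in the question, and the names `placemark_rows`, `rows_to_csv` and the `name` column are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Trimmed stand-in for the question's file (styles and geometry omitted).
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder>
<name>AFNGA_SWAC</name>
<description>1027 Éléments de la couche Afnga_swac</description>
<Placemark>
<name>Aba</name>
<description><![CDATA[<br><br><br>
<table border="1" padding="0">
<tr><td>ID_INT</td><td>NGA0101001</td></tr>
<tr><td>N_sd</td><td>Aba</td></tr>
<tr><td>Skm2</td><td>384.07</td></tr>
<tr><td>PT2010</td><td>1010000</td></tr>]]></description>
</Placemark>
<Placemark>
<name>Igbere</name>
<description><![CDATA[<br><br><br>
<table border="1" padding="0">
<tr><td>ID_INT</td><td>NGA0104001</td></tr>
<tr><td>N_sd</td><td>Igbere</td></tr>
<tr><td>Skm2</td><td>3.34</td></tr>
<tr><td>PT2010</td><td>13000</td></tr>]]></description>
</Placemark>
</Folder>
</Document>
</kml>"""


def placemark_rows(kml_text):
    """Yield one dict per Placemark: its name plus the table's key/value cells."""
    root = ET.fromstring(kml_text)
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        row = {"name": placemark.findtext("kml:name", default="", namespaces=KML_NS)}
        # The XML parser hands back the CDATA content as plain text (an HTML string).
        description = placemark.findtext(
            "kml:description", default="", namespaces=KML_NS
        )
        table = BeautifulSoup(description, "html.parser")
        for tr in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if len(cells) == 2:  # keep only key/value rows
                row[cells[0]] = cells[1]
        yield row


def rows_to_csv(rows, out):
    """Write the rows to a file-like object, using the union of keys as header."""
    rows = list(rows)
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
rows_to_csv(placemark_rows(SAMPLE_KML), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

For a real file, the same two functions could be fed the text read from disk and an output file opened with `newline=""`; geometry columns are deliberately left out of this sketch.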

Solution 1
1 Accepted 2013-09-15 21:45:08
