簡體   English   中英

抓取XML節點+ Nokogiri和xpath的文本值

[英]Grabbing the text value of an XML node + Nokogiri and xpath

我已經構建了一個rake文件,將我抓取的所有信息插入到我的數據庫中。 這一切都正常,但我的鍵的值沒有填充任何數據。 我可能錯誤地使我的at_xpath調用? 我將在下面發布一個例子 -

information = {
            "street_address" => property.at_xpath("/Address/AddressLine1/text()"),
            "city" => property.at_xpath("/Address/City/text()"),
            "zipcode" => property.at_xpath("/Address/PostalCode/text()"),
            "short_description" => property.at_xpath("/Information/ShortDescription/text()"),
            "long_description" => property.at_xpath("Information/LongDescription/text()"),
            "rent" => property.at_xpath("/Information/Rents/StandardRent/text()"),
            "application_fee" => property.at_xpath("/Fee/ApplicationFee/text()"),
            "bedrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bedroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bathroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/ILS_Unit/Availability/VacancyClass/text()")
        }

我知道除了將數據放入上面列出的哈希中的實際值空間之外,一切都完美無缺。 我也知道nokogiri和xpath工作正常,因為我將s的數量從33,000+縮小到1,068。

任何指導都將非常感激! 謝謝 :)

更新======================== ====

我認為看到整個循環可能有助於增加清晰度 -

doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").each do |property|

        # GATHER EACH PROPERTY'S INFORMATION
        information = {
            "street_address" => property.at_xpath("/Address/AddressLine1/text()"),
            "city" => property.at_xpath("/Address/City/text()"),
            "zipcode" => property.at_xpath("/Address/PostalCode/text()"),
            "short_description" => property.at_xpath("/Information/ShortDescription/text()"),
            "long_description" => property.at_xpath("Information/LongDescription/text()"),
            "rent" => property.at_xpath("/Information/Rents/StandardRent/text()"),
            "application_fee" => property.at_xpath("/Fee/ApplicationFee/text()"),
            "bedrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bedroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bathroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/ILS_Unit/Availability/VacancyClass/text()")
        }


        # CREATE NEW PROPERTY WITH INFORMATION HASH CREATED ABOVE
        if Property.create!(information)
            puts "yay!"
        else
            puts "oh no! this sucks!"
        end

    end # ENDS XPATH EACH LOOP

============================另一個更新==================== ======

所以我嘗試用“/ inner_text()”交換每個at_xpath路徑末尾的“/ text()”並收到以下錯誤 -

耙子流產了! 表達式無效:/ Address / AddressLine1 / inner_text()

然后我嘗試將我的“at_xpath”調用切換為“at_css”調用並執行類似的操作 -

"street_address" => property.at_css(".AddressLine1").text

但收到以下錯誤 -

耙子流產了! nil的未定義方法`text':NilClass

=============================更新顯示XML ================= ==========

<Property IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
  <PropertyID>
    <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/>
    <Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>
    <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
    <WebSite>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</WebSite>
    <Address AddressType="property">
      <Description>Address of Available Listing</Description>
      <AddressLine1>1689 N 4th St </AddressLine1>
      <City>Columbus</City>
      <State>OH</State>
      <PostalCode>43201</PostalCode>
      <Country>US</Country>
    </Address>
    <Phone PhoneType="office">
      <PhoneNumber>(614) 299-4110</PhoneNumber>
    </Phone>
    <Email>northsteppe.nsr@gmail.com</Email>
  </PropertyID>
  <ILS_Identification ILS_IdentificationType="Apartment" RentalType="Market Rate">
    <Latitude>39.997694</Latitude>
    <Longitude>-82.99903</Longitude>
    <LastUpdate Month="11" Day="11" Year="2013"/>
  </ILS_Identification>
  <Information>
    <StructureType>Standard</StructureType>
    <UnitCount>1</UnitCount>
    <ShortDescription>Spacious House Central Campus OSU, available fall</ShortDescription>
    <LongDescription>One of our favorites! This great house is perfect for students or a single family. With huge living and sleeping rooms, there is plenty of space. The kitchen is totally modernized with new appliances, and the bathroom has been updated. Natural woodwork and brick accents are seen within the house, and the decorative mantles. Ceiling fans and mini-blinds are included, as well as a FREE stack washer and dryer. The front and side deck. On site parking available.</LongDescription>
    <Rents>
      <StandardRent>2000.00</StandardRent>
    </Rents>
    <PropertyAvailabilityURL>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</PropertyAvailabilityURL>
  </Information>
  <Fee>
    <ProrateType>Standard</ProrateType>
    <LateType>Standard</LateType>
    <LatePercent>0</LatePercent>
    <LateMinFee>0</LateMinFee>
    <LateFeePerDay>0</LateFeePerDay>
    <NonRefundableHoldFee>0</NonRefundableHoldFee>
    <AdminFee>0</AdminFee>
    <ApplicationFee>30.00</ApplicationFee>
    <BrokerFee>0</BrokerFee>
  </Fee>
  <Deposit DepositType="Security Deposit">
    <Amount AmountType="Actual">
      <ValueRange Exact="2000.00" Currency="USD"/>
    </Amount>
  </Deposit>
  <Policy>
    <Pet Allowed="false"/>
  </Policy>
  <Phase IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <Description/>
    <UnitCount>1</UnitCount>
    <RentableUnits>1</RentableUnits>
    <TotalSquareFeet>0</TotalSquareFeet>
    <RentableSquareFeet>0</RentableSquareFeet>
  </Phase>
  <Building IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <Description/>
    <UnitCount>1</UnitCount>
    <SquareFeet>0</SquareFeet>
  </Building>
  <Floorplan IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <UnitCount>1</UnitCount>
    <Room RoomType="Bedroom">
      <Count>4</Count>
      <Comment/>
    </Room>
    <Room RoomType="Bathroom">
      <Count>1</Count>
      <Comment/>
    </Room>
    <SquareFeet Min="0" Max="0"/>
    <MarketRent Min="2000" Max="2000"/>
    <EffectiveRent Min="2000" Max="2000"/>
  </Floorplan>
  <ILS_Unit IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Units>
      <Unit>
        <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="UL Portfolio"/>
        <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
        <UnitBedrooms>4</UnitBedrooms>
        <UnitBathrooms>1.0</UnitBathrooms>
        <MinSquareFeet>0</MinSquareFeet>
        <MaxSquareFeet>0</MaxSquareFeet>
        <SquareFootType>internal</SquareFootType>
        <UnitRent>2000.00</UnitRent>
        <MarketRent>2000.00</MarketRent>
        <Address AddressType="property">
          <AddressLine1>1689 N 4th St </AddressLine1>
          <City>Columbus</City>
          <PostalCode>43201</PostalCode>
          <Country>US</Country>
        </Address>
      </Unit>
    </Units>
    <Availability>
      <VacateDate Month="7" Day="23" Year="2014"/>
      <VacancyClass>Unoccupied</VacancyClass>
      <MadeReadyDate Month="7" Day="23" Year="2014"/>
    </Availability>
    <Amenity AmenityType="Other">
      <Description>All new stainless steel appliances!  Refinished hardwood floors</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Ceramic tile</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Ceiling fans</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Wrap-around porch</Description>
    </Amenity>
    <Amenity AmenityType="Dryer">
      <Description>Free Washer and Dryer</Description>
    </Amenity>
    <Amenity AmenityType="Washer">
      <Description>Free Washer and Dryer</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>off-street parking available</Description>
    </Amenity>
  </ILS_Unit>
  <File Active="true" FileID="820982141">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/31077069-6e81-4373-8a89-508c57585543/medium.jpg</Src>
    <Width>360</Width>
    <Height>300</Height>
    <Rank>1</Rank>
  </File>
  <File Active="true" FileID="820982145">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/84e1be40-96fd-4717-b75d-09b39231a762/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>2</Rank>
  </File>
  <File Active="true" FileID="820982149">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/cd419635-c37f-4676-a43e-c72671a2a748/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>3</Rank>
  </File>
  <File Active="true" FileID="820982152">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/6b68dbd5-2cde-477c-99d7-3ca33f03cce8/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>4</Rank>
  </File>
  <File Active="true" FileID="820982155">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/17b6c7c0-686c-4e46-865b-11d80744354a/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>5</Rank>
  </File>
  <File Active="true" FileID="820982157">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/3545ac8b-471f-404a-94b2-fcd00dd16e25/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>6</Rank>
  </File>
  <File Active="true" FileID="820982160">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/02471172-2183-4bf1-a3d7-33415f902c1c/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>7</Rank>
  </File>
</Property>

在你的循環中你做:

doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").each do |property|

然后,為了你的價值觀,你可以做以下事情:

property.at_xpath("/Address/AddressLine1/text()")

您不能使用/Address/AddressLine1/text()相對於XPath property

引入nokogiri將搜索/Address/AddressLine1/text()這意味着,在絕對路徑,這會從文檔的頂部開始啟動/ ,找到Address節點立即在它下面,找到AddressLine1其下的節點.. ..

而是使用:

Address/AddressLine1/text()

這意味着搜索對於property並導致完整的XPath:

//Property/PropertyID/Identification[@OrganizationName='northsteppe']/Address/AddressLine1/text()

查看您添加的XML ...

您想要的路徑不存在。 在PRY中看着它:

[16] (pry) main: 0> puts doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").to_xml
<Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/><Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>

兩個property節點都沒有子節點。 只存在property節點,因此您要查找的所有值(子節點)都不存在。

相反,看起來您想要找到Property節點並向下工作:

你的第一個XPath太深了。 它返回一個需要PropertyID的標識。 嘗試這個:

doc.xpath("//Property/PropertyID[ Identification/@OrganizationName = 'northsteppe' ]").each do |property|
    # GATHER EACH PROPERTY'S INFORMATION
    information = {
        "street_address" => property.at_xpath("Address/AddressLine1/text()").to_s,
        "city" => property.at_xpath("Address/City/text()").to_s,
        "zipcode" => property.at_xpath("Address/PostalCode/text()").to_s
        }
    p information
end

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM