
Python 2.7: scraping tables from a website

I am probably doing my scraping incorrectly, given that I know little programming, but I would like to know how to scrape data from an HTML table in Python and associate it with its own class. I don't really know what I'm doing, so here is an example:

<div class="example">
    <a href="/example/thisexample">
      <span class="name">Product name</span>
    </a>
      <table>
        <tbody>
          <tr class="odd"> Some data </tr>
          <tr class="even"> Some data </tr>
          <tr class="odd"> Some data </tr>
          <tr class="even"> Some data </tr>
          <tr class="odd"> More data</tr>
        </tbody>
      </table>
</div>

So far I'm able to collect the data using lxml and place it in a list. However, the webpage contains many classes (like example), and each has a different table with more or fewer rows than the one above. I would like the data from each table to be associated with its class, i.e. the product name here. Sorry if this makes little sense; I am new to this and haven't touched Python except for an intro class a couple of years ago.

You said you store the data in lists, but you wanted them to be associated with the classes you get from the HTML? If I am understanding correctly, store them as a dictionary:

stuff = {}

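# one key per class, holding the list of data pulled from that class's table
stuff['class name #1'] = ['data thing #1 from table in class', 'data thing #2 from table in class', 'data thing #3 from table in class']
# ... and so on for every other class, up to:
# stuff['class name #n'] = [...]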

This way your "stuff" dictionary stores things in a relational way: each key (the class or product name) maps to the list of data from that class's table, so you can always tell which data belongs to which class.

Does that make sense? Is that what you are asking?

more about dictionaries here
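
To make the dictionary idea concrete, here is a rough sketch of filling it from markup shaped like the example in the question. Everything in it is an assumption for illustration: the sample page, the product names, the cell text, and the helper name tables_by_product are all made up. It uses the standard library's xml.etree.ElementTree, which only works on well-formed markup. lxml elements follow the same ElementTree API (findall, findtext, itertext), so the same loop should carry over to a tree parsed with lxml.html.fromstring from a real page.

```python
import xml.etree.ElementTree as ET

# Sample markup modeled on the question; product names and cell text are made up.
PAGE = """
<html><body>
<div class="example">
  <a href="/example/first"><span class="name">First product</span></a>
  <table><tbody>
    <tr class="odd"><td>row 1</td></tr>
    <tr class="even"><td>row 2</td></tr>
  </tbody></table>
</div>
<div class="example">
  <a href="/example/second"><span class="name">Second product</span></a>
  <table><tbody>
    <tr class="odd"><td>row A</td></tr>
    <tr class="even"><td>row B</td></tr>
    <tr class="odd"><td>row C</td></tr>
  </tbody></table>
</div>
</body></html>
"""


def tables_by_product(markup):
    """Return a dict mapping each product name to the list of its table rows."""
    root = ET.fromstring(markup)
    stuff = {}
    # Each <div class="example"> holds one product name and one table.
    for block in root.findall(".//div[@class='example']"):
        name = block.findtext(".//span[@class='name']")
        # Join all the text inside each row, whatever cells it contains.
        rows = ["".join(tr.itertext()).strip() for tr in block.findall(".//tr")]
        stuff[name] = rows
    return stuff


stuff = tables_by_product(PAGE)
```

Since every table is looked up inside its own div, it does not matter that the tables have different numbers of rows. stuff['Second product'] then gives you just the rows that belong to that product.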
