
How to loop through scraped items and add them to a dictionary or pandas dataframe?

For a project I'm scraping player data from futbin, and I would like to store the scraped data in a dict or a pandas DataFrame. I've been stuck for a couple of hours and would appreciate some help. My code so far is below. It only prints the data, and I don't know how to go from there.

Code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
urls = ['https://www.futbin.com/21/player/87/pele', 'https://www.futbin.com/21/player/27751/robert-lewandowski']

for url in urls:
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    # the player details live in the div with id "info_content"
    info = soup.find('div', id='info_content')
    rows = info.find_all('td')
    # use a separate loop variable so that `info` is not overwritten
    for cell in rows:
        print(cell.text.strip())

I would do it with open() and write():

file = open("filename.txt", "w")

The "w" argument is the file mode:

"w" - Write - Opens a file for writing, creates the file if it does not exist, and overwrites it if it does

And then:

file.write(text_to_save)
file.close()

Note that open() is a built-in, so it needs no import. Import os.path only if you want to build or check the file path:

import os.path
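
As a minimal sketch of the file-based approach, the snippet below writes a list of strings to a file, one per line, inside a with block so that the file is closed automatically. The helper name save_lines, the sample strings and the file name are made up for illustration; in the scraper the strings would be the stripped cell texts.

```python
import os
import tempfile


def save_lines(lines, path):
    """Write each string in `lines` to `path`, one per line (hypothetical helper)."""
    # "w" creates the file if needed and overwrites any existing content.
    with open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")


# Sample values standing in for the scraped cell texts (assumed, not real scraper output).
sample = ["Robert Lewandowski", "FC Bayern", "Poland"]
target = os.path.join(tempfile.gettempdir(), "players_demo.txt")
save_lines(sample, target)

# Read the file back to check what was written.
with open(target, encoding="utf-8") as handle:
    saved = handle.read().splitlines()
```

A text file keeps the values but loses the link between each value and its attribute name, so it suits logging better than later analysis.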

The work you have already done to identify the table you want is good.

  • use pd.read_html() to convert the table to a DataFrame
  • apply basic transforms to turn the key/value pairs into columns
  • use a list comprehension to get the details of all wanted footballers

from io import StringIO

import requests
from bs4 import BeautifulSoup
import pandas as pd
urls = ['https://www.futbin.com/21/player/87/pele', 'https://www.futbin.com/21/player/27751/robert-lewandowski']

def myhtml(url):
    # use BS4 to get the table that holds the required data
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    html = str(soup.find('div', id='info_content').find('table'))
    # read_html() returns a list of DataFrames; take the first one. Its first column
    # holds the attribute names, so make it the index and transpose to get one row per player.
    # StringIO avoids the deprecation warning that newer pandas versions emit for literal HTML strings.
    return pd.read_html(StringIO(html))[0].set_index(0).T

df = pd.concat([myhtml(u) for u in urls])

Output (shown transposed for readability, one column per player; the Age values are cut off in the original output):

Attribute                                 1                          1
Name                                      Edson Arantes Nascimento   Robert Lewandowski
Club                                      FUT 21 ICONS               FC Bayern
Nation                                    Brazil                     Poland
League                                    Icons                      Bundesliga
Skills                                    5                          4
Weak Foot                                 4                          4
Intl.Rep                                  5                          4
Foot                                      Right                      Right
Height                                    173cm 5'8"                 184cm 6'0"
Weight                                    70                         80
Revision                                  Icon                       TOTY
Def. WR                                   Med                        Med
Att. WR                                   High                       High
Added on                                  2020-09-10                 2021-01-22
Origin                                    Prime                      TOTY
R.Face                                    nan                        nan
B.Type                                    Unique                     Unique
DOB                                       23-10-1940                 nan
Robert Lewandowski FIFA 21 Career Mode    nan                        Robert Lewandowski FIFA 21 Career Mode
Age
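
If a plain dictionary per player is preferred over read_html(), one option is to pair each row's label with its value and collect the dictionaries in a list. This is only a sketch: it assumes every table row holds one th label and one td value (which is what the two-column read_html() result suggests), and it runs on a small inline HTML sample rather than the live site. The helper name row_dict and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one player's page; the real markup may differ.
SAMPLE_HTML = """
<div id="info_content">
  <table>
    <tr><th>Name</th><td> Robert Lewandowski </td></tr>
    <tr><th>Club</th><td>FC Bayern</td></tr>
    <tr><th>Nation</th><td>Poland</td></tr>
  </table>
</div>
"""


def row_dict(html):
    """Return {label: value} for the table inside div#info_content."""
    soup = BeautifulSoup(html, "html.parser")
    info = soup.find("div", id="info_content")
    record = {}
    for tr in info.find_all("tr"):
        label, value = tr.find("th"), tr.find("td")
        if label is None or value is None:
            continue  # skip rows that do not match the assumed layout
        record[label.get_text(strip=True)] = value.get_text(strip=True)
    return record


# One dictionary per page; in the scraper each page would be requests.get(url).content.
records = [row_dict(page) for page in [SAMPLE_HTML]]
print(records)
```

A list of dictionaries like records can be passed straight to pd.DataFrame(records) to get one row per player; attributes that a player lacks show up as NaN.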
