
How can I scrape values that have no tag of their own with Beautiful Soup?

I am trying to scrape some data, basically the black values from the picture (241, 86, 89...).

[image]

I am using Beautiful Soup in Python. I have got as far as displaying the contents of tags like "a" or "td", for example with:

featured_challenges = soup.find_all('a')
print(featured_challenges)

As a newbie, I am not sure how to find the black values, since they don't seem to belong to any tag or attribute. Can somebody help?

Thanks in advance,

Miguel

Try:

import io

import pandas as pd
import requests

r = requests.get('https://sofifa.com/teams?type=club&lg%5B0%5D=16&lg%5B1%5D=19&lg%5B2%5D=31&lg%5B3%5D=53&showCol%5B0%5D=ti&showCol%5B1%5D=oa&showCol%5B2%5D=at&showCol%5B3%5D=md&showCol%5B4%5D=df&showCol%5B5%5D=tb&showCol%5B6%5D=bs&showCol%5B7%5D=bd&showCol%5B8%5D=bp&showCol%5B9%5D=bps&showCol%5B10%5D=cc&showCol%5B11%5D=cp&showCol%5B12%5D=cs&showCol%5B13%5D=cps&showCol%5B14%5D=da&showCol%5B15%5D=dm&showCol%5B16%5D=dw&showCol%5B17%5D=dd&showCol%5B18%5D=dp&showCol%5B19%5D=ip&showCol%5B20%5D=ps&showCol%5B21%5D=sa&showCol%5B22%5D=ta&r=200001&set=true')

# read_html returns one DataFrame per <table> found in the page.
# Passing a file-like object avoids the warning that recent pandas
# versions emit for literal HTML input.
dfs = pd.read_html(io.StringIO(r.text))
df = pd.concat(dfs)
print(df.to_string())

# Save the combined table, as described below.
df.to_csv('data.csv', index=False)

This prints:

    Unnamed: 0                                                Name      ID  OVA  ATT  MID  DEF Transfer Budget Speed Dribbling Passing Positioning Crossing Passing.1 Shooting Positioning.1 Aggression Pressure Team Width Defender Line  DP  IP  Players    SAA    TAA  Hits
0          NaN             FC Barcelona Spain Primera Division (1)     241   86   89   85   85         €187.9M  Slow    Little   Short   Organised   Little      Safe   Little     Organised    Contain     Deep     Narrow         Cover  10  10       33  28.45  24.06   868
1          NaN              Real Madrid Spain Primera Division (1)     243   86   85   87   86         €188.5M  Slow    Little   Short   Organised   Little      Safe   Little     Organised    Contain     Deep     Narrow         Cover  10  10       33  28.18  24.91   882
2          NaN          FC Bayern München German 1. Bundesliga (1)      21   85   85   85   84           €100M  Slow    Little   Short   Organised   Little      Safe   Little     Organised    Contain     Deep     Narrow         Cover  10   9       23  25.91  25.13   655
3          NaN                        Juventus Italian Serie A (1)      45   85   87   83   84            €90M  Slow    Little   Short   Organised   Little      Safe   Little     Organised    Contain     Deep     Narrow         Cover  10  10       33  29.00  27.00   633
4          NaN              Paris Saint-Germain French Ligue 1 (1)      73   84   87   84   82         €184.4M  Slow    Little   Short   Organised   Little      Safe   Little     Organised    Contain     Deep     Narrow         Cover  10   9       33  27.09  23.97   659
5          NaN          Borussia Dortmund German 1. Bundesliga (1)      22   83   82   83   83            €60M  Slow    Little   Short   Organised   Little      Safe   Little     Organised    Contain     Deep     Narrow         Cover   9   7       31  26.73  23.81   695

... and so on.

It also saves the data to data.csv.
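Once the table is in a DataFrame, the numbers from the question (241, 86, 89, ...) are ordinary columns, so they can be selected by name. The sketch below is only an illustration: it rebuilds two rows by hand from the printed output above instead of downloading the page, and the variable names `ratings` and `rows` are my own.

```python
import pandas as pd

# Two rows copied from the printed output above. In the real script,
# `df` would come from pd.concat(pd.read_html(...)) instead.
df = pd.DataFrame(
    {
        "Name": [
            "FC Barcelona Spain Primera Division (1)",
            "Real Madrid Spain Primera Division (1)",
        ],
        "ID": [241, 243],
        "OVA": [86, 86],
        "ATT": [89, 85],
        "MID": [85, 87],
        "DEF": [85, 86],
    }
)

# Keep only the numeric columns the question asks about.
ratings = df[["ID", "OVA", "ATT", "MID", "DEF"]]

# One plain Python list per team.
rows = ratings.values.tolist()
print(rows)
```

The same column selection should work on the full DataFrame, provided the column names match those in the printed header.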

A sample from data.csv:

[image]

If you don't want the column Unnamed: 0, you can add del df['Unnamed: 0'] after the df = pd.concat(dfs) line.
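If you would rather stay with Beautiful Soup, as in the question, the numbers are most likely plain text inside "td" cells, which is why find_all('a') does not return them. Here is a minimal sketch of reading cell text row by row. The HTML is a hypothetical, heavily simplified stand-in for the real page (including the made-up href values), so the tag layout on sofifa.com may need different selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup. The real page has more columns and
# attributes, so treat this structure as an assumption.
html = """
<table>
  <tbody>
    <tr><td><a href="/team/241">FC Barcelona</a></td><td>241</td><td>86</td><td>89</td></tr>
    <tr><td><a href="/team/243">Real Madrid</a></td><td>243</td><td>86</td><td>85</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

teams = []
for tr in soup.find_all("tr"):
    # get_text() returns the text inside each cell, whether or not the
    # cell wraps it in a further tag such as <a>.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # skip rows without <td> cells, e.g. header rows
        teams.append(cells)

print(teams)
```

Each entry in teams is a list of strings, so numeric cells would still need converting with int() before doing arithmetic on them.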
