
Why does my Python "for loop" only work on the last table when I want to scrape tables from a list of URLs and convert them to a DataFrame?

I've got some issues converting tables from a list of URLs into one large DataFrame that contains all the rows from the different URLs. My code seems to run fine, but when I export a new CSV it only contains the last 10 rows, from the last URL, instead of the rows from every URL. Does anyone know why?

PS: I tried to find the answer by browsing Stack Overflow, but I did not find it.

import pandas as pd
from bs4 import BeautifulSoup
import requests
#Pandas/numpy for data manipulation
import numpy as np

# URL 0 - 10 SCRAPE


BASE_URL = [
    'https://datan.fr/groupes/legislature-16/re',
    'https://datan.fr/groupes/legislature-16/rn',
    'https://datan.fr/groupes/legislature-16/lfi-nupes',
    'https://datan.fr/groupes/legislature-16/lr',
    'https://datan.fr/groupes/legislature-16/dem',
    'https://datan.fr/groupes/legislature-16/soc',
    'https://datan.fr/groupes/legislature-16/hor',
    'https://datan.fr/groupes/legislature-16/ecolo',
    'https://datan.fr/groupes/legislature-16/gdr-nupes',
    'https://datan.fr/groupes/legislature-16/liot',
]

Tous_les_groupes = []
b=0
for b in BASE_URL:

    html = requests.get(b).text
    soup = BeautifulSoup(html, "html.parser")
    #identify table we want to scrape
    Tableau_groupe = soup.find('table', {"class" : "table"})
    print(Tableau_groupe)


try:

    for row in Tableau_groupe.find_all('tr'):
        cols = row.find_all('td')
        print(cols)

        if len(cols) == 4:
            Tous_les_groupes.append((b, cols[0].text.strip(), cols[1].text.strip(), cols[2].text.strip(), cols[3].text.strip()))
            #print(Tous_les_groupes)
except:
    pass
Groupes_DF = np.asarray(Tous_les_groupes)
#print(Groupes_DF)
#print(len(Groupes_DF))

df = pd.DataFrame(Groupes_DF)
df.columns = ['url','G', 'Tx', 'note ','Number']
#print(df.head(10))

df.to_csv('output.csv')

Thanks for your help, and have a great day, everyone.

In the first loop you assign the result of soup.find to Tableau_groupe, but each iteration overwrites the previous value, so only the last one is kept. The try block runs only once, after the loop has finished, so it sees only the table (and the value of b) from the last URL.
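The effect can be reproduced without any network access. The sketch below is illustrative only: the URLs and rows in `pages` are made-up placeholders standing in for the scraped tables, not real datan.fr data. It contrasts a loop that only reassigns a variable (with the row-collecting loop placed after it) against one that collects rows inside the loop.

```python
# Hypothetical stand-in for the scraped pages: each URL maps to a list of rows.
pages = {
    "url-a": [("a1",), ("a2",)],
    "url-b": [("b1",), ("b2",)],
}

# Buggy pattern: the first loop only reassigns `table`; the row-collecting
# loop runs once, after it, so it sees only the last page (and the last url).
rows_buggy = []
for url in pages:
    table = pages[url]
for row in table:
    rows_buggy.append((url, *row))

# Fixed pattern: rows are collected inside the loop, once per page.
rows_fixed = []
for url in pages:
    table = pages[url]
    for row in table:
        rows_fixed.append((url, *row))

print(rows_buggy)  # [('url-b', 'b1'), ('url-b', 'b2')]
print(rows_fixed)  # [('url-a', 'a1'), ('url-a', 'a2'), ('url-b', 'b1'), ('url-b', 'b2')]
```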

Try moving the second for loop inside the first one, so that the rows of each table are collected before the next URL overwrites Tableau_groupe:

for b in BASE_URL:

    html = requests.get(b).text
    soup = BeautifulSoup(html, "html.parser")
    #identify table we want to scrape
    Tableau_groupe = soup.find('table', {"class" : "table"})
    print(Tableau_groupe)


    try:

        for row in Tableau_groupe.find_all('tr'):
            cols = row.find_all('td')
            print(cols)

            if len(cols) == 4:
                Tous_les_groupes.append((b, cols[0].text.strip(), cols[1].text.strip(), cols[2].text.strip(), cols[3].text.strip()))

    except:
        pass
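For completeness, here is one possible end-to-end version, offered as a sketch rather than a drop-in replacement. It assumes the pages still expose a `<table class="table">` whose data rows have exactly four `<td>` cells, as in the question. The helper names `rows_from_html`, `build_dataframe` and `fetch_pages` are hypothetical, and the column name `'note '` from the question is written without its trailing space. Parsing is kept separate from fetching so the accumulation logic can be checked on inline HTML, and a missing table is handled explicitly instead of with a bare `except: pass`. The `np.asarray` step is dropped because `pd.DataFrame` accepts a list of tuples directly.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def rows_from_html(url, html):
    """Return (url, col0, col1, col2, col3) tuples for each 4-cell row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "table"})
    if table is None:  # no matching table on this page: contribute no rows
        return []
    rows = []
    for tr in table.find_all("tr"):
        cols = [td.text.strip() for td in tr.find_all("td")]
        if len(cols) == 4:  # header rows (<th> only) and short rows are skipped
            rows.append((url, *cols))
    return rows


def build_dataframe(pages):
    """pages: iterable of (url, html) pairs; rows from every page are accumulated."""
    all_rows = []
    for url, html in pages:
        # Extending inside the loop is the fix: every page contributes its rows.
        all_rows.extend(rows_from_html(url, html))
    return pd.DataFrame(all_rows, columns=["url", "G", "Tx", "note", "Number"])


def fetch_pages(urls):
    """Yield (url, html) for each URL, raising on HTTP errors."""
    for url in urls:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        yield url, resp.text
```

With the question's `BASE_URL` list, usage would be `build_dataframe(fetch_pages(BASE_URL)).to_csv('output.csv', index=False)`.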
