
Loop through the lists of lists using Python and append to dataframe

I am trying to loop through the lists of lists, scrape all the links, and append them to a dataframe as one table, but to no avail. Help would be much appreciated.

import pandas as pd
import requests
from bs4 import BeautifulSoup

page = requests.get('https://money.rediff.com/companies/groups/A')
soup = BeautifulSoup(page.content, 'html.parser')

company_name = []
company_link = []
company_link_edit = []

company_A_subpg1 = soup.find_all(class_='dataTable')

def convert(url):
    if not url.startswith('http://'):
        return 'http:' + url
    return url

data_df = pd.DataFrame()

for sub_tab in company_A_subpg1:
    for tab in sub_tab:
        sub_table_1 = tab.find_all('a', href=True)
        company_name = [name.text.strip() for name in sub_table_1]
        company_link = [name.get('href') for name in sub_table_1]
        company_link_edit = [convert(name) for name in company_link]

df = pd.DataFrame(
    {'Name': company_name,
     'Link': company_link_edit})
data_df = pd.concat([data_df, df], sort=False)

data_df.to_csv('results_3.csv')

import pandas as pd
import requests
from bs4 import BeautifulSoup

page = requests.get('https://money.rediff.com/companies/groups/A')
soup = BeautifulSoup(page.content, 'html.parser')

company_name = []
company_link_edit = []

# Each group of companies on the page is rendered as a table with the 'dataTable' class.
company_A_subpg1 = soup.find_all(class_='dataTable')

def convert(url):
    # Prepend 'http:' to any href that does not already carry the scheme
    # (the scraped links are protocol-relative, e.g. '//money.rediff.com/...').
    if not url.startswith('http://'):
        return 'http:' + url
    return url

# Walk each table body row by row, take the company link in that row, and
# append to the lists instead of overwriting them on every iteration.
for sub_tab in company_A_subpg1:
    temp = sub_tab.find('tbody')
    all_rows = temp.find_all('tr')
    for val in all_rows:
        a_tag = val.find('a', href=True)
        company_name.append(a_tag.text.strip())
        company_link_edit.append(convert(a_tag.get('href')))

print(len(company_name), len(company_link_edit))

# Build the DataFrame once, after all rows have been collected.
data_df = pd.DataFrame()
df = pd.DataFrame(
    {'Name': company_name,
     'Link': company_link_edit})
data_df = pd.concat([data_df, df], sort=False)

print(df.shape)

data_df.to_csv('results_3.csv')

You can check the values in the CSV file; I got all 200 names and links listed on the page.
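If you would rather verify the output programmatically than open the file by hand, a minimal sketch along these lines should do it (it only assumes the results_3.csv written by the script above, with its Name and Link columns):

import pandas as pd

# Reload the CSV produced above and confirm the scrape looks complete.
check_df = pd.read_csv('results_3.csv', index_col=0)
print(check_df.shape)        # expected: about 200 rows and 2 columns for group A
print(check_df.head())
# every stored link should now carry an explicit http scheme
print(check_df['Link'].str.startswith('http').all())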

