Pandas appending to series

I am trying to write some code to scrape a website for a list of links, which I will then process further. I found some code here that I am trying to adapt so that, instead of printing the links, it adds them to a Series. The code I have is as follows:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
user_agent = {'User-agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0'}

linksList = pd.Series()

def process(url):
    r = requests.get(url, headers=user_agent)
    soup = BeautifulSoup(r.text, "lxml")

    for tag in soup.findAll('a', href=True):
        tag['href'] = urljoin(url, tag['href'])
        linksList.append(tag['href'])

When I pass a URL, I get the following error:

cannot concatenate a non-NDFrame object

Any idea where I am going wrong?

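The .append() method of a Series object expects another Series object as its argument. In other words, it is used to concatenate Series together, so passing a plain string such as tag['href'] raises the error you see. It also returns a new Series rather than modifying the original in place, so the result would be lost even if the call succeeded.

In your case, you can just collect the href values into a list and initialize a Series from it: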
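To illustrate what "concatenating Series" means here, the following is a minimal sketch with made-up example.com URLs (they are placeholders, not from the question). It uses pd.concat rather than Series.append, because Series.append was deprecated in pandas 1.4 and removed in pandas 2.0; on older versions the two behave similarly for this case.

```python
import pandas as pd

# Placeholder data: an existing Series of links and a second Series to add to it.
existing = pd.Series(["https://example.com/a"], dtype="object")
new_links = pd.Series(["https://example.com/b", "https://example.com/c"], dtype="object")

# pd.concat returns a NEW Series; neither input is modified in place.
# ignore_index=True renumbers the result 0..n-1 instead of keeping 0, 0, 1.
combined = pd.concat([existing, new_links], ignore_index=True)

print(combined.tolist())
```

Note that the result has to be assigned to a name; calling the function and discarding the return value leaves the original Series unchanged.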

def process(url):
    r = requests.get(url, headers=user_agent)
    soup = BeautifulSoup(r.text, "lxml")

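    # Resolve every href against the page URL and return them as a list
    return [urljoin(url, tag['href']) for tag in soup.select('a[href]')]

# Build the Series once from the complete list; url is the page to scrape
links_list = pd.Series(process(url))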
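The same idea can be tried without any network access. The sketch below is an assumption-laden variant, not the answer's exact code: extract_links is a hypothetical helper name, the HTML snippet and URLs are invented, and the built-in "html.parser" is used in place of "lxml" so that no extra parser needs to be installed.

```python
import pandas as pd
from bs4 import BeautifulSoup
from urllib.parse import urljoin


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag in html that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    # 'a[href]' skips anchors without an href attribute
    return [urljoin(base_url, tag["href"]) for tag in soup.select("a[href]")]


# Invented sample page: one relative link, one anchor without href, one absolute link
sample_html = (
    '<a href="/about">About</a>'
    '<a name="top">No href</a>'
    '<a href="https://example.org/">External</a>'
)

# dtype="object" keeps the Series well-defined even if no links are found
links = pd.Series(
    extract_links(sample_html, "https://example.com/index.html"),
    dtype="object",
)
print(links.tolist())
```

Building the list first and creating the Series once at the end avoids repeated concatenation, which would copy the data on every iteration.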
