I am trying to extract data from a website in Python

Question

def convert():
    for url in url_list:
        news=Article(url)
        news.download()
        while news.download_state != 2:
            time.sleep(1)
        news.parse()
        l.append(
            {'Title':news.title, 'Text': news.text.replace('\n',' '), 'Date':news.publish_date, 'Author':news.authors}
        )

convert()
df = pd.DataFrame.from_dict(l)
df.to_csv('Amazon_try2'+'.csv',encoding='utf-8', index=False)

The function convert() goes through a list of URLs and processes each of them. Each URL is a link to an article. I am fetching the important attributes of each article, such as author and text, and storing them in a data frame. After that, I convert the data frame to a CSV file. The script ran for about 5 hours, as there were 589 URLs in url_list, but I still didn't get the CSV file. Can somebody spot where I am going wrong?

Answer 1

Assuming this is your whole program, you need to return l from convert.

def convert():
    l = []  # create the result list here so the function does not depend on a global
    for url in url_list:
        news=Article(url)
        news.download()
        while news.download_state != 2:
            time.sleep(1)
        news.parse()
        l.append(
            {'Title':news.title, 'Text': news.text.replace('\n',' '), 'Date':news.publish_date, 'Author':news.authors}
        )
    return l 

l = convert()
df = pd.DataFrame.from_dict(l)
df.to_csv('Amazon_try2'+'.csv',encoding='utf-8', index=False)

Answer 2

Your function probably hangs here:

    while news.download_state != 2:
        time.sleep(1)

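It is waiting for the download state to change, but that never happens. As far as I can tell, download() only returns once the request has finished, so if the download failed the state will never become 2 and the loop sleeps forever. Your function should also return the list.

Something like this should work: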

def convert():
    l = []  # create the result list here so the function does not depend on a global
    for url in url_list:
        news=Article(url)
        news.download()

        news.parse()
        l.append(
            {'Title':news.title, 'Text': news.text.replace('\n',' '), 'Date':news.publish_date, 'Author':news.authors}
        )
    return l 

l = convert()
df = pd.DataFrame.from_dict(l)
df.to_csv('Amazon_try2'+'.csv',encoding='utf-8', index=False)
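
One caveat with simply dropping the wait loop: to my knowledge, parse() raises an exception when the download did not succeed, so a single dead link among the 589 could abort the whole run. Below is a minimal sketch that skips failing URLs and records them instead. The names fetch_with_newspaper, fetch_article and failed are made up for illustration, and the assumption that parse() raises on a failed download is mine, not something stated in the question.

```python
def fetch_with_newspaper(url):
    """Download and parse one article (needs the third-party newspaper package)."""
    from newspaper import Article  # imported lazily so convert() can be used without it

    news = Article(url)
    news.download()
    news.parse()  # assumption: raises if the download did not succeed
    return {
        'Title': news.title,
        'Text': news.text.replace('\n', ' '),
        'Date': news.publish_date,
        'Author': news.authors,
    }


def convert(url_list, fetch_article=fetch_with_newspaper):
    """Return (rows, failed): parsed articles and the URLs that could not be fetched."""
    rows = []
    failed = []
    for url in url_list:
        try:
            rows.append(fetch_article(url))
        except Exception as exc:  # broad on purpose: one bad URL should not stop the run
            failed.append((url, str(exc)))
    return rows, failed
```

Calling rows, failed = convert(url_list) gives you the rows to pass to pd.DataFrame(rows).to_csv(...) as before, plus a list of the links that need a retry or a closer look. Passing the fetch function as a parameter is only there so the loop can be tried out with a stub before running it against all the URLs.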
