I'm trying to extract data from a website using Python

Question

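import time

import pandas as pd
from newspaper import Article

# url_list holds the 589 article URLs; l is the list the records are
# appended to (neither definition is shown in the question)

def convert():
    for url in url_list:
        news = Article(url)
        news.download()
        # poll until download_state becomes 2
        while news.download_state != 2:
            time.sleep(1)
        news.parse()
        l.append(
            {'Title': news.title, 'Text': news.text.replace('\n', ' '),
             'Date': news.publish_date, 'Author': news.authors}
        )

convert()
df = pd.DataFrame.from_dict(l)
df.to_csv('Amazon_try2.csv', encoding='utf-8', index=False)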

The function convert() goes through a list of URLs and processes each of them. Each URL is a link to an article. I am fetching the important attributes of each article, such as author and text, and then storing them in a data frame. After that, I convert the data frame to a CSV file. The script ran for about 5 hours, as there were 589 URLs in url_list, but I still didn't get the CSV file. Can somebody spot where I am going wrong?
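For the last step described above (records to CSV), here is a sketch using the standard library's csv.DictWriter instead of pandas. This is a swapped-in technique, not the question's code, and write_records and FIELDS are hypothetical names. Because it accepts any iterable, it can be fed a generator that yields one article at a time, so each row reaches the file as soon as it is produced instead of only after all 589 URLs have finished.

```python
import csv
import io

FIELDS = ["Title", "Text", "Date", "Author"]  # same keys the question's dicts use


def write_records(records, out):
    """Write each record (a dict with the FIELDS keys) to `out` as CSV.

    `records` can be any iterable, including a generator that yields one
    article at a time. Returns the number of rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)  # None becomes an empty field, lists use str()
        out.flush()  # push the row out now so a later hang does not lose it
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample record; real ones would come from newspaper's Article.
    sample = [{"Title": "Example", "Text": "Some text", "Date": None,
               "Author": ["A. Writer"]}]
    buf = io.StringIO()
    write_records(sample, buf)
    print(buf.getvalue())
```

For a real file, open it with open('Amazon_try2.csv', 'w', newline='', encoding='utf-8') (the csv module documentation recommends newline='') and pass that handle as out.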

Answer 1

Assuming this is your whole program, you need to return l from convert():

import time

import pandas as pd
from newspaper import Article

def convert():
    l = []  # local list, so convert() does not rely on a global being defined
    for url in url_list:
        news = Article(url)
        news.download()
        while news.download_state != 2:
            time.sleep(1)
        news.parse()
        l.append(
            {'Title': news.title, 'Text': news.text.replace('\n', ' '),
             'Date': news.publish_date, 'Author': news.authors}
        )
    return l

l = convert()
df = pd.DataFrame.from_dict(l)
df.to_csv('Amazon_try2.csv', encoding='utf-8', index=False)
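The build-locally-and-return pattern can also be sketched so that it runs without network access, by passing the per-URL work in as a function. This is a minimal sketch: collect_records and fetch are hypothetical names, not part of newspaper, and fetch stands in for the Article download/parse/dict-building steps.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def collect_records(urls: Iterable[str], fetch: Callable[[str], Record]) -> List[Record]:
    """Build a fresh list of records for `urls` and return it.

    `fetch` is a hypothetical stand-in for the per-URL work
    (Article(url).download(), .parse(), building the dict).
    """
    records: List[Record] = []  # local: nothing outside has to define it first
    for url in urls:
        records.append(fetch(url))
    return records


def stub_fetch(url: str) -> Record:
    # Offline stub for illustration; a real version would use newspaper.
    return {"Title": url.upper()}


if __name__ == "__main__":
    print(collect_records(["a", "b"], stub_fetch))
```

Used as l = collect_records(url_list, fetch_article) (with fetch_article a hypothetical wrapper around the Article calls), it feeds straight into pd.DataFrame.from_dict(l). Since the list is created inside each call, running it twice does not accumulate duplicate rows the way a global l can.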

Answer 2

Probably your function gets stuck here:

    while news.download_state != 2:
        time.sleep(1)

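It is waiting for the download state to change, but the change never happens. In newspaper3k, Article.download() only returns once the request has finished, so after that download_state no longer changes: it is 2 on success and a different value (such as 1) when the download fails, and a single failed URL keeps the loop spinning forever. Your function should also return a list.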
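If you do need to wait for some condition, a safer pattern is to bound the wait with a timeout, so a stuck item is skipped instead of hanging the whole run. A minimal sketch; wait_until is a hypothetical helper, not part of newspaper:

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 30.0,
               interval: float = 1.0) -> bool:
    """Poll predicate() until it returns True or `timeout` seconds pass.

    Returns True if the condition was met and False on timeout, so the
    caller can skip the item instead of looping forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the loop it could replace the unbounded while, e.g. if not wait_until(lambda: news.download_state == 2, timeout=10): continue.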

Something like this should work:

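import pandas as pd
from newspaper import Article

def convert():
    l = []  # local list that is returned at the end
    for url in url_list:
        news = Article(url)
        news.download()  # returns once the request has finished
        if news.download_state != 2:  # 2 means success; skip failed downloads
            continue
        news.parse()
        l.append(
            {'Title': news.title, 'Text': news.text.replace('\n', ' '),
             'Date': news.publish_date, 'Author': news.authors}
        )
    return l

l = convert()
df = pd.DataFrame.from_dict(l)
df.to_csv('Amazon_try2.csv', encoding='utf-8', index=False)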
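Even without the wait loop, one problematic URL can raise an exception partway through and end the run before the CSV is written, losing everything collected so far. One way to guard against that is to wrap the per-URL work in try/except and keep a list of the URLs that failed. A minimal sketch; collect_skipping_failures and fetch are hypothetical names, with fetch standing in for the Article calls:

```python
import logging
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def collect_skipping_failures(
    urls: List[str], fetch: Callable[[str], Record]
) -> Tuple[List[Record], List[str]]:
    """Call fetch(url) for each URL, keeping successes and recording failures.

    Any exception raised for one URL is logged and that URL is skipped
    instead of aborting the loop.
    """
    records: List[Record] = []
    failures: List[str] = []
    for url in urls:
        try:
            records.append(fetch(url))
        except Exception as exc:  # broad on purpose: one bad page must not end the run
            logging.warning("skipping %s: %s", url, exc)
            failures.append(url)
    return records, failures
```

Used as records, failed = collect_skipping_failures(url_list, fetch_article), the records go to pd.DataFrame.from_dict as before, and failed tells you which URLs to retry or inspect.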

2 answers

Answer 1: 0 votes, 2018-06-12 09:50:52

Answer 2: 0 votes, 2018-06-12 09:54:14
