
How to write the web scraped data to csv?

I wrote the following code to extract the table data using BeautifulSoup:

import requests
from bs4 import BeautifulSoup

website = requests.get('https://memeburn.com/2010/09/the-100-most-influential-news-media-twitter-accounts/').text

soup = BeautifulSoup(website, 'lxml')

table = soup.find('table')
table_rows = table.findAll('tr')

for tr in table_rows:
    td = tr.findAll('td')
    rows = [i.text for i in td]
    print(rows)

This is my output:

['Number', '@name', 'Name', 'Followers', 'Influence Rank']
[]
['1', '@mashable', 'Pete Cashmore', '2037840', '59']
[]
['2', '@cnnbrk', 'CNN Breaking News', '3224475', '71']
[]
['3', '@big_picture', 'The Big Picture', '23666', '92']
[]
['4', '@theonion', 'The Onion', '2289939', '116']
[]
['5', '@time', 'TIME.com', '2111832', '143']
[]
['6', '@breakingnews', 'Breaking News', '1795976', '147']
[]
['7', '@bbcbreaking', 'BBC Breaking News', '509756', '168']
[]
['8', '@espn', 'ESPN', '572577', '187']
[]

Please help me write this data to a .csv file (I am new to this kind of task).

Use the csv writer and write each row to the csv file.

import requests
import csv
from bs4 import BeautifulSoup

website = requests.get('https://memeburn.com/2010/09/the-100-most-influential-news-media-twitter-accounts/').text

soup = BeautifulSoup(website, 'lxml')

table = soup.find('table')
table_rows = table.findAll('tr')

csvfile = 'twitterusers2.csv'

# Python 2: open(csvfile, 'wb')
# Python 3: newline='' keeps the csv module from writing blank lines between rows
with open(csvfile, 'w', newline='', encoding='utf-8') as outfile:
    wr = csv.writer(outfile)

    for tr in table_rows:
        td = tr.findAll('td')
        # In Python 2, .encode("utf8") is sometimes mandatory when playing with
        # Twitter data; in Python 3 the str values can be written directly
        rows = [i.text for i in td]
        # ignore the empty rows and any row whose td count is not 5
        if len(rows) == 5:
            print(rows)
            wr.writerow(rows)
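As a variant (not in the original answer), the header row can be split off and the data rows written with csv.DictWriter, which makes the column order explicit. A minimal sketch, using the first two rows printed by the scraper above and a hypothetical output file name twitterusers2_dict.csv:

```python
import csv

# Header and sample rows exactly as the scraper above prints them
header = ['Number', '@name', 'Name', 'Followers', 'Influence Rank']
data = [
    ['1', '@mashable', 'Pete Cashmore', '2037840', '59'],
    ['2', '@cnnbrk', 'CNN Breaking News', '3224475', '71'],
]

# 'twitterusers2_dict.csv' is a hypothetical file name for this sketch
with open('twitterusers2_dict.csv', 'w', newline='', encoding='utf-8') as f:
    wr = csv.DictWriter(f, fieldnames=header)
    wr.writeheader()
    for row in data:
        # pair each cell with its column name before writing
        wr.writerow(dict(zip(header, row)))
```

DictWriter raises an error if a row has unexpected keys, which catches malformed rows earlier than counting cells by hand.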

A better solution is to use pandas, since it is faster than the alternatives. Here is the whole code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

website = requests.get('https://memeburn.com/2010/09/the-100-most-influential-news-media-twitter-accounts/').text

soup = BeautifulSoup(website, 'lxml')

table = soup.find('table')
table_rows = table.findAll('tr')

first = True        # the first non-empty row holds the column headers
details_dict = {}   # column name -> list of cell values
count = 0

for tr in table_rows:
    td = tr.findAll('td')
    rows = [i.text for i in td]
    #print(rows)

    for i in rows:
        if first:
            # header row: create an empty column for each heading
            details_dict[i] = []
        else:
            # data row: append each cell to its column, in order
            key = list(details_dict.keys())[count]
            details_dict[key].append(i)
            count += 1
    count = 0
    first = False
    #print(details_dict)

df = pd.DataFrame(details_dict)
df.to_csv('D:\\Output.csv', index=False)

Screenshot of the output: [image]

Hope this helps!
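The column-oriented dict that the loop above builds can also be produced in one line with zip, which transposes rows into columns. A sketch (not from the original answer), using the header and the first two data rows from the question and a hypothetical output file output_zip.csv:

```python
import pandas as pd

header = ['Number', '@name', 'Name', 'Followers', 'Influence Rank']
data_rows = [
    ['1', '@mashable', 'Pete Cashmore', '2037840', '59'],
    ['2', '@cnnbrk', 'CNN Breaking News', '3224475', '71'],
]

# zip(*data_rows) transposes the rows into columns, so each column name
# pairs with its list of cell values -- the same shape as details_dict above
details_dict = {col: list(vals) for col, vals in zip(header, zip(*data_rows))}

df = pd.DataFrame(details_dict)
df.to_csv('output_zip.csv', index=False)  # 'output_zip.csv' is a hypothetical name
```

This avoids the first/count bookkeeping entirely while producing the same DataFrame.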

The easiest way is to use pandas:

# pip install pandas lxml beautifulsoup4

import pandas as pd

URI = 'https://memeburn.com/2010/09/the-100-most-influential-news-media-twitter-accounts/'

# read the first table on the page, use its first row as the header,
# and drop empty rows
data = pd.read_html(URI, flavor='lxml', skiprows=0, header=0)[0].dropna()

# save to a csv file called data.csv
data.to_csv('data.csv', index=False, encoding='utf-8')
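As a quick sanity check (not part of the original answer), writing with index=False lets the file round-trip cleanly through read_csv. A minimal sketch on a small frame shaped like the scraped table, using a hypothetical file name data_check.csv:

```python
import pandas as pd

# small frame shaped like the scraped table
df = pd.DataFrame({'Number': [1, 2], '@name': ['@mashable', '@cnnbrk']})

# index=False keeps the row index out of the file, so reading it back
# reproduces the original frame ('data_check.csv' is a hypothetical name)
df.to_csv('data_check.csv', index=False)
back = pd.read_csv('data_check.csv')

print(back.equals(df))  # -> True
```

Without index=False, the saved index would come back as an extra unnamed column and the comparison would fail.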


Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license; if you need to repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM