
Can't write to a CSV file in Python

I am trying to write scraped data to a CSV file using a pandas DataFrame, but the CSV is empty even after the program finishes. The headers are written first, but they are also overwritten once the DataFrame is written. Here is the code:

from bs4 import BeautifulSoup
import requests
import re as resju
import csv
import pandas as pd
re = requests.get('https://www.farfeshplus.com/Video.asp?ZoneID=297')

soup = BeautifulSoup(re.content, 'html.parser')

links = soup.findAll('a', {'class': 'opacityit'})
links_with_text = [a['href'] for a in links]

headers = ['Name', 'LINK']
# this is output file, u can change the path as you desire, default is the working directory
file = open('data123.csv', 'w', encoding="utf-8")
writer = csv.writer(file)
writer.writerow(headers)

for i in links_with_text:
    new_re = requests.get(i)
    new_soup = BeautifulSoup(new_re.content, 'html.parser')
    m = new_soup.select_one('h1 div')
    Name = m.text

    print(Name)

    n = new_soup.select_one('iframe')
    ni = n['src']

    iframe = requests.get(ni)
    i_soup = BeautifulSoup(iframe.content, 'html.parser')

    d_script = i_soup.select_one('body > script')
    d_link = d_script.text

    mp4 = resju.compile(r"(?<=mp4:\s\[\')(.*)\'\]")
    final_link = mp4.findall(d_link)[0]
    print(final_link)

    df = pd.DataFrame(zip(Name, final_link))

    df.to_csv(file, header=None, index=False)

file.close()

df.head() returns:

 0  1
0  ل  h
1  ي  t
2  ل  t
3  ى  p
4     s
   0  1
0  ل  h
1  ي  t
2  ل  t
3  ى  p
4     s

Any suggestion?

It seems you are mixing two libraries to write the CSV. pandas handles all of this on its own, so there is no need to also use Python's built-in csv module.

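I've modified your code below. Instead of creating a new DataFrame on every loop iteration, it collects the names and links in two lists, builds a single DataFrame from them after the loop, and writes that out as a CSV.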

Also, passing header=None to to_csv means no column names are written, and because the DataFrame was created without column names, its columns are only labelled by position (0 and 1).
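As for why df.head() showed one character per row: Name and final_link are both strings, so zip(Name, final_link) pairs them character by character. A minimal sketch of that effect, using made-up sample values (name and final_link here are placeholders, not the scraped data):

```python
import pandas as pd

# Hypothetical sample values standing in for one scraped page.
name = "abc"
final_link = "http"

# zip() over two strings pairs them character by character and
# stops at the end of the shorter string.
per_char = pd.DataFrame(list(zip(name, final_link)))
print(per_char.shape)  # (3, 2): one row per character pair

# Wrapping each value in a list gives a single row instead,
# and naming the columns replaces the default 0 and 1 labels.
one_row = pd.DataFrame(list(zip([name], [final_link])), columns=["Name", "LINK"])
print(one_row.shape)  # (1, 2)
```

This is why the version below appends each value to a list and only zips the lists at the end.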

from bs4 import BeautifulSoup
import requests
import re as resju
#import csv
import pandas as pd
re = requests.get('https://www.farfeshplus.com/Video.asp?ZoneID=297')

soup = BeautifulSoup(re.content, 'html.parser')

links = soup.findAll('a', {'class': 'opacityit'})
links_with_text = [a['href'] for a in links]

names_ = [] # global list to hold all iterable variables from your loops
final_links_ = []

for i in links_with_text:
    new_re = requests.get(i)
    new_soup = BeautifulSoup(new_re.content, 'html.parser')
    m = new_soup.select_one('h1 div')
    Name = m.text
    names_.append(Name)  # append to the list defined above the loop


    print(Name)

    n = new_soup.select_one('iframe')
    ni = n['src']

    iframe = requests.get(ni)
    i_soup = BeautifulSoup(iframe.content, 'html.parser')

    d_script = i_soup.select_one('body > script')
    d_link = d_script.text

    mp4 = resju.compile(r"(?<=mp4:\s\[\')(.*)\'\]")
    final_link = mp4.findall(d_link)[0]
    print(final_link)
    final_links_.append(final_link) # append to global list.


df = pd.DataFrame(zip(names_, final_links_)) # use global lists.
df.columns = ['Name', 'LINK']

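# pass the output path directly; pandas opens and closes the file itself
df.to_csv('data123.csv', index=False, encoding='utf-8')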
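If you want to check the collect-then-write pattern without hitting the site, here is a small sketch that runs offline. The two rows are invented placeholders, and it writes to an in-memory buffer rather than a file, which is only a convenience for inspecting the result:

```python
import io

import pandas as pd

# Hypothetical rows standing in for the scraped (name, link) pairs.
names_ = ["video one", "video two"]
final_links_ = ["https://example.com/1.mp4", "https://example.com/2.mp4"]

# Build the whole table once, after all values have been collected.
df = pd.DataFrame({"Name": names_, "LINK": final_links_})

# to_csv accepts a path or a file-like object; a StringIO buffer lets us
# look at the text that would otherwise end up in the file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines[0])  # the header row
```

The header row is written once, followed by one line per scraped item.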

