
Export data from Python to an Excel sheet

After a lot of research, I'm still unable to export my scraped data to an Excel sheet correctly.

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys 
from selenium.webdriver import ActionChains
import time 
from time import sleep
import pandas as pd
import csv
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options


PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes are not read as escapes
driver = webdriver.Chrome(PATH)

driver.get('https://www.xxx')
time.sleep(16)

boutons = driver.find_elements_by_class_name('fiche-detail')
for x in boutons:
    # click every detail button except the first one
    if x != boutons[0]:
        x.click()
        time.sleep(0.10)

cabinets = driver.find_elements_by_class_name("fiche")
for cabinet in cabinets:
    print(cabinet.text)

codePostaux = driver.find_elements_by_class_name("fiche-info ville-codepostal")
for codePostal in codePostaux:
    print(codePostal.text)

mails = driver.find_elements_by_class_name("fiche-info email")
for mail in mails:
    print(mail.text)

rues = driver.find_elements_by_class_name("fiche-info rue") 
for rue in rues:
    print(rue.text)

df.to_excel('EXTRACTION.xlsx', index = False)

I would like the data organised into columns ['Cabinets'], ['mails'], ['codePostaux'], with the results scraped from those class names placed under the matching column (so I can send direct mail afterwards).

EDIT: After a few attempts and after reading the documentation on setting up a DataFrame, I tried it to see what would happen and got this:

"ValueError: Shape of passed values is (3, 5), indices imply (4, 5)"

I forgot to mention that I've scraped thousands of values (firm names, emails, street names, city names, and so on), so I can't type out each value that needs to go into a column. I wrote "First value", "Second value" as in the example, but I think that's wrong.

For example, my cmd.exe output contains this:

Cabinet des toubibs rue de la chèvre qui danse 75015 PARIS contact@restaurantdelarto.com

I want "Cabinet des toubibs" to go into the column [cabinets], "rue de la chèvre qui danse" into the column [rueCabinet], and so on.

Here "Cabinet des toubibs" is the output of print(cabinet.text) and "rue de la chèvre qui danse" is the output of print(rue.text).

columns = {'cabinets':  ['First value', 'Second value',...],
         'mails': ['First value', 'Second value',...],
         'codePostaux' : ['First value', 'Second value'],
         'villeCabinet' : ['First value', 'Second value'],
         'rueCabinet' : ['First value', 'Second value']
        }

df = pd.DataFrame(columns, columns = ['cabinets','mails','codePostaux','villeCabinet','rueCabinet'], index=['Cab_1','Cab_2','Cab_3','Cab_4'])

df.to_excel('EXTRACTION.xlsx', index = False)

Thanks in advance for your time ;)

PS: Sorry for any mistakes, English is not my native language.

It sounds like you want to use pandas. Read up on how to create a DataFrame, add columns holding the data you saved, and once the DataFrame is organised you can write it out with:

nameofdataframe.to_csv("nameforoutput.csv")
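
As a rough illustration of that idea, here is a minimal sketch that builds a DataFrame from plain Python lists and writes it out as CSV. The sample values and the second firm are made up for the example; in the question's script the lists would instead be filled from the `.text` of the scraped elements. It assumes every list has the same length, since pandas raises an error otherwise.

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped `.text` strings.
cabinets = ["Cabinet des toubibs", "Cabinet Exemple"]
mails = ["contact@restaurantdelarto.com", "info@example.com"]
code_postaux = ["75015 PARIS", "69001 LYON"]

# One dictionary key per column; all lists must have the same length.
df = pd.DataFrame({
    "cabinets": cabinets,
    "mails": mails,
    "codePostaux": code_postaux,
})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name, as in the line above, writes the same content to disk. `df.to_excel('EXTRACTION.xlsx', index=False)` works the same way but needs an Excel writer package such as openpyxl installed.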

Use code like this:

cols = ['Cabinets', 'Mails', 'codePostaux']
df = df[cols]

This will rearrange your columns.

Take care

Edit: I just saw that you didn't set up a DataFrame. See this guide on how to create a DataFrame with pandas: https://datatofish.com/create-pandas-dataframe/
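
One possible way to set that DataFrame up when some entries lack a field (which may be what leads to shape errors like the one in the question) is to collect one dictionary per scraped card and let pandas fill the gaps. This is only a sketch: `build_frame` and the second sample card are hypothetical, and in the real script each dictionary would be filled from the elements found inside one "fiche" card.

```python
import pandas as pd

COLUMNS = ["cabinets", "mails", "codePostaux", "villeCabinet", "rueCabinet"]

def build_frame(cards):
    """Turn a list of per-card dicts into a DataFrame.

    Keys missing from a card become NaN, so rows never get misaligned.
    """
    return pd.DataFrame(cards, columns=COLUMNS)

# Hypothetical data: the second card has no email and no street.
cards = [
    {
        "cabinets": "Cabinet des toubibs",
        "rueCabinet": "rue de la chèvre qui danse",
        "codePostaux": "75015",
        "villeCabinet": "PARIS",
        "mails": "contact@restaurantdelarto.com",
    },
    {"cabinets": "Cabinet Exemple", "codePostaux": "69001", "villeCabinet": "LYON"},
]

df = build_frame(cards)
print(df)
```

No explicit `index=` is passed here, so pandas numbers the rows itself and the number of index labels cannot disagree with the number of rows.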
