
Writing DataFrames to Different Excel Sheets

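I am scraping some data from the web and writing it into about 6 DataFrames. I then want to write each of these DataFrames to a separate sheet in an Excel file. I have looked online and tried two different approaches, but neither gives the result I want. With the following code, only the last DataFrame is written to Excel, and everything else is overwritten: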

book = "Sample.xlsx"
rb = openpyxl.load_workbook(book)
rb.create_sheet(pitches[x] + ' Data')
activeSheet = pitches[x] + ' Data'
writer = pd.ExcelWriter(book, engine='xlsxwriter')
combinedDF.to_excel(writer, sheet_name=activeSheet,  index=False)
writer.save()

With the following code, each individual sheet is created, but the DataFrame data is not written to the Excel file:

book = "Sample.xlsx"
rb = openpyxl.load_workbook(book)
rb.create_sheet(pitches[x] + ' Data')
activeSheet = pitches[x] + ' Data'
combinedDF.to_excel(book, sheet_name=activeSheet,  index=False)
rb.save(book)

Here is the full code:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup
import requests
import pandas as pd
import openpyxl
book = "Baseball Savant Data.xlsx"
rb = openpyxl.load_workbook(book)
pitches = ['Fastball', '2 Seam Fastball', 'Cut Fastball', 'Split-Finger Fastball',
           'Sinker', 'Slider', 'Changeup', 'Curveball']


beginningTime = time.time()
browser = webdriver.Chrome()
browser.get('http://www.baseballsavant.com')
browser.maximize_window()
linkPage = browser.find_element_by_link_text('Statcast Search')
linkPage.click()
time.sleep(2)
myMinimumPitchCount = browser.find_element_by_xpath("""//*[@id="min_pitches"]/option[@value='500']""").click()

myMinimumResultCount= browser.find_element_by_xpath("""//*[@id="min_results"]/option[@value='50']""").click()

pitchCode = ['FF','FT','FC','FS','SI','SL','CH','CU']
time.sleep(2)
x = 0
y = 0


while x < len(pitchCode):
    if x == 0:
        current = ('chk_PT_' + pitchCode[x])
        pitchSelection = browser.find_element_by_class_name("mock-pulldown-container")
        pitchSelection.click()
        currentPitch = browser.find_element_by_id(current).click()
        searchButton = browser.find_element_by_xpath("""//*[@id="pfx_form"]/div[2]/div/input[1]""").click()
        time.sleep(3)

        while y < 2:
            if y == 0:
                currentURL = browser.current_url
                r = requests.get(currentURL)
                soup=BeautifulSoup(r.text, "html.parser")
                table_headers_data = soup.find("table", {"id" : "search_results"})
                statistics = soup.findAll("tr", {"class" : "search_row"})

                table_headers = [th.text.strip() for th in table_headers_data.findAll('th')[0:5]]
                data_rows = statistics[:]
                player_data = [[td.text.strip() for td in data_rows[i].findAll('td')[0:5]]
                    for i in range(len(data_rows))]

                dfPitchCount = pd.DataFrame(player_data, index=None, columns=table_headers)
                print('Y = ' + str(y))
                y+=1


            elif y != 0:
                wOBAAllowed = browser.find_element_by_xpath("""//*[@id="sort_col"]/option[@value='woba']""").click()
                searchButton = browser.find_element_by_xpath("""//*[@id="pfx_form"]/div[2]/div/input[1]""").click()
                time.sleep(2)
                currentURL = browser.current_url
                r = requests.get(currentURL)
                soup=BeautifulSoup(r.text, "html.parser")
                table_headers_data = soup.find("table", {"id" : "search_results"})
                statistics = soup.findAll("tr", {"class" : "search_row"})


                table_headers = [th.text.strip() for th in table_headers_data.findAll('th')[0:4]]

                data_rows = statistics[:]
                player_data = [[td.text.strip() for td in data_rows[i].findAll('td')[0:4]]
                   for i in range(len(data_rows))]

                dfwOBA = pd.DataFrame(player_data, index=None, columns=table_headers)
                combinedDF = pd.merge(dfPitchCount, dfwOBA, how='left', on="Player", sort=False, indicator = "True")
                print(rb.get_sheet_names())

                rb.create_sheet(pitches[x] + ' Data')
                activeSheet = pitches[x] + ' Data'
                writer = pd.ExcelWriter(book, engine='xlsxwriter')
                combinedDF.to_excel(writer, sheet_name=activeSheet, index=False)
                writer.save()
                pitchSort = browser.find_element_by_xpath("""//*[@id="sort_col"]/option[@value='pitches']""").click()
                print('Y = ' + str(y))
                y+=1
                print('this is ' + str(x))
                x+=1


    elif x != 0:
        y=0
        print('y boogers = ' + str(y))
        pitchSelection = browser.find_element_by_class_name("mock-pulldown-container")
        pitchSelection.click()
        time.sleep(5)
        current = ('chk_PT_' + pitchCode[x])
        previous = ('chk_PT_' + pitchCode[x-1])
        previousPitch = browser.find_element_by_id(previous)
        previousPitch.click()
        time.sleep(1)
        print(current)        
        pitchSelection.click()
        currentPitch = browser.find_element_by_id(current)
        currentPitch.click()
        time.sleep(1)
        print(previous)
        pitchSort = browser.find_element_by_xpath("""//*[@id="sort_col"]/option[@value='pitches']""").click()
        searchButton = browser.find_element_by_xpath("""//*[@id="pfx_form"]/div[2]/div/input[1]""").click()

        while y < 2:
            if y == 0:
                currentURL = browser.current_url
                r = requests.get(currentURL)
                soup=BeautifulSoup(r.text, "html.parser")
                table_headers_data = soup.find("table", {"id" : "search_results"})
                statistics = soup.findAll("tr", {"class" : "search_row"})

                table_headers = [th.text.strip() for th in table_headers_data.findAll('th')[0:5]]
                data_rows = statistics[:]
                player_data = [[td.text.strip() for td in data_rows[i].findAll('td')[0:5]]
                    for i in range(len(data_rows))]

                dfPitchCount = pd.DataFrame(player_data, index=None, columns=table_headers)

                y+=1

            elif y != 0:
                wOBAAllowed = browser.find_element_by_xpath("""//*[@id="sort_col"]/option[@value='woba']""").click()
                searchButton = browser.find_element_by_xpath("""//*[@id="pfx_form"]/div[2]/div/input[1]""").click()
                time.sleep(2)
                currentURL = browser.current_url
                r = requests.get(currentURL)
                soup=BeautifulSoup(r.text, "html.parser")
                table_headers_data = soup.find("table", {"id" : "search_results"})
                statistics = soup.findAll("tr", {"class" : "search_row"})


                table_headers = [th.text.strip() for th in table_headers_data.findAll('th')[0:4]]

                data_rows = statistics[:]
                player_data = [[td.text.strip() for td in data_rows[i].findAll('td')[0:4]]
                   for i in range(len(data_rows))]

                dfwOBA = pd.DataFrame(player_data, index=None, columns=table_headers)
                combinedDF = pd.merge(dfPitchCount, dfwOBA, how='left', on="Player", sort=False, indicator = "True")
                print(combinedDF)
                print(rb.get_sheet_names())

                rb.create_sheet(pitches[x] + ' Data')
                activeSheet = pitches[x] + ' Data'
                writer = pd.ExcelWriter(book, engine='xlsxwriter')

                combinedDF.to_excel(writer, sheet_name=activeSheet, index=False)
                writer.save()
                pitchSort = browser.find_element_by_xpath("""//*[@id="sort_col"]/option[@value='pitches']""").click()

                y+=1

                x+=1

It looks like you missed the most important source: the pandas documentation for to_excel.

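So, move writer = pd.ExcelWriter(book, engine='xlsxwriter') and writer.save() outside the loop: the first goes before the x loop starts, the second after it ends. Each new xlsxwriter-based writer starts a fresh file, so recreating it for every sheet discards the sheets written earlier. You should open and save the Excel file only once, not once per sheet.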
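A minimal sketch of that restructuring, with the scraping left out. The frames dictionary, the shortened pitches list, and the temporary output path are hypothetical stand-ins, not the asker's data. It calls writer.close() rather than writer.save(), because save() was removed in pandas 2.0, and it leaves the engine at the pandas default, which assumes xlsxwriter or openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the scraped per-pitch DataFrames.
pitches = ['Fastball', 'Sinker', 'Slider']
frames = {
    pitch: pd.DataFrame({'Player': ['A', 'B'], 'Pitches': [i, i + 1]})
    for i, pitch in enumerate(pitches)
}

# Write to a temporary directory so the sketch does not touch real files.
book = os.path.join(tempfile.mkdtemp(), 'Sample.xlsx')

writer = pd.ExcelWriter(book)  # opened once, before the loop
try:
    for pitch in pitches:
        # Each DataFrame goes to its own sheet of the same open writer.
        frames[pitch].to_excel(writer, sheet_name=pitch + ' Data', index=False)
finally:
    writer.close()  # the file is written to disk once, after the loop

# Read every sheet back (sheet_name=None returns a dict keyed by sheet name).
sheets = pd.read_excel(book, sheet_name=None)
print(list(sheets))
```

Reading the file back at the end is only a check that all sheets survived; it needs openpyxl for .xlsx files.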

The pandas documentation for ExcelWriter advises that "the writer should be used as a context manager."

Here is an example of that usage, writing multiple DataFrame objects to an Excel file:

import pandas as pd
dfList = [pd.DataFrame([[i + 1, i + 2, i + 3],['a', 'b', 'c']], columns=['col1', 'col2', 'col3']) for i in range(5)]
with pd.ExcelWriter('Sample.xlsx') as writer:
    for i, df in enumerate(dfList):
        df.to_excel(writer, sheet_name=f'Sheet {i}', index=False)
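The question also starts from a workbook that already exists. If the goal is to add sheets to such a file rather than recreate it, one possible approach (an assumption here, not something either answer states) is to open the writer in append mode, which pandas supports only with the openpyxl engine. The file name, sheet names, and data below are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook in a temporary directory.
book = os.path.join(tempfile.mkdtemp(), 'Sample.xlsx')

# Step 1: create a workbook that already holds one sheet.
pd.DataFrame({'a': [1]}).to_excel(book, sheet_name='Existing', index=False)

# Step 2: reopen it in append mode so the existing sheet is kept.
# mode='a' requires engine='openpyxl'.
with pd.ExcelWriter(book, engine='openpyxl', mode='a') as writer:
    pd.DataFrame({'b': [2]}).to_excel(writer, sheet_name='Added', index=False)

# Read all sheets back to confirm both are present.
sheets = pd.read_excel(book, sheet_name=None)
print(list(sheets))
```

In append mode, writing to a sheet name that already exists raises an error by default; the if_sheet_exists parameter of ExcelWriter (pandas 1.3 and later) changes that behaviour.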
