
Python pandas: renaming a DataFrame column

The purpose of this code is to scrape a number of data tables of different lengths (a different number of rows per table), turn them into pandas DataFrames, remove some unnecessary columns and fix the date.

All of the above works, but when I tried to rename a column I got an error.

Here is a data sample:

   Date                Actual
0  Oct 15, 2018 21:30
1  Sep 09, 2018 21:30  0.7%
2  Aug 08, 2018 21:30  0.3%
3  Jul 09, 2018 21:30  -0.1%
4  Jun 08, 2018 21:30  -0.2%
5  May 09, 2018 21:30  -0.2%
6  Apr 10, 2018 21:30  -1.1%
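To experiment without Selenium, the sample can be rebuilt as a DataFrame by hand. This is a minimal sketch under a few assumptions: the rows are copied from the sample, the missing Actual value in row 0 is assumed to arrive as an empty string, and the .str[:12] slice only mirrors the question's date[:12] (it is harmless if the cell holds just the date). The format string is the one used in the question. Note that pd.to_datetime returns a new Series, so its result has to be assigned back to take effect.

```python
import pandas as pd

# Rows copied from the data sample; row 0 has no Actual value yet
# (assumed here to be an empty string, as an empty table cell would give).
rows = [
    ["Oct 15, 2018", "21:30", ""],
    ["Sep 09, 2018", "21:30", "0.7%"],
    ["Aug 08, 2018", "21:30", "0.3%"],
    ["Jul 09, 2018", "21:30", "-0.1%"],
    ["Jun 08, 2018", "21:30", "-0.2%"],
    ["May 09, 2018", "21:30", "-0.2%"],
    ["Apr 10, 2018", "21:30", "-1.1%"],
]
df = pd.DataFrame(rows, columns=["Release Date", "Time", "Actual"])

# Build "Oct 15, 2018 21:30"-style strings and parse them. The result of
# pd.to_datetime is assigned back; calling it on its own changes nothing.
combined = df["Release Date"].str[:12] + " " + df["Time"]
df["Date"] = pd.to_datetime(combined, format="%b %d, %Y %H:%M")

# Keep only the columns shown in the sample.
df = df[["Date", "Actual"]]
print(df)
```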

Here is the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd


class DataEngine:
    def __init__(self):
        self.urls = open(r"C:\Users\Sayed\Desktop\script\sample.txt").readlines()
        self.driver = webdriver.Chrome(r"D:\Projects\Tutorial\Driver\chromedriver.exe")
        self.wait = WebDriverWait(self.driver, 10)

    def title(self):
        names = []
        for url in self.urls:
            self.driver.get(url)
            title = self.driver.find_element_by_xpath('//*[@id="leftColumn"]/h1').text
            if title not in names:
                names.append(title)
        return names

    def table(self):
        DataFrames = []
        for url in self.urls:
            self.driver.get(url)
            while True:
                try:
                    item = self.wait.until(
                        ec.visibility_of_element_located((By.XPATH, '//*[contains(@id,"showMoreHistory")]/a')))
                    self.driver.execute_script("arguments[0].click();", item)
                except Exception:
                    break

            df = pd.DataFrame(columns=['Release Date', 'Time', 'Actual', 'Forecast', 'Previous'])
            pos = 0
            for table in self.wait.until(
                    ec.visibility_of_all_elements_located((By.XPATH, '//*[contains(@id,"eventHistoryTable")]//tr'))):
                data = [item.text for item in table.find_elements_by_xpath(".//*[self::td]")]
                if data:
                    df.loc[pos] = data[0:5]
                    pos += 1

            df["Date"] = df["Release Date"].apply(lambda date: date[:12]) + " " + df["Time"]
            df.astype('unicode')
            df = df[['Date', 'Actual', 'Forecast', 'Previous', 'Release Date', 'Time']]
            pd.to_datetime(df['Date'], format='%b %d, %Y %H:%M')

            df.drop(df.columns[-1], axis=1, inplace=True)
            df.drop(df.columns[-1], axis=1, inplace=True)
            df.drop(df.columns[-1], axis=1, inplace=True)
            df.drop(df.columns[-1], axis=1, inplace=True)
            df = df.reset_index()
            if df not in DataFrames:
                DataFrames.append(df)
        return DataFrames

    def rename(self):
        tabels = self.table()
        names = self.title()
        for tabel, name in zip(tabels, names):
            tabel.rename({'Actual': name})



x = DataEngine()
x.rename()

Here is the error:

Traceback (most recent call last):
  File "D:/Projects/Tutorial/database.py", line 67, in <module>
    x.rename()
  File "D:/Projects/Tutorial/database.py", line 59, in rename
    tabels = self.table()
  File "D:/Projects/Tutorial/database.py", line 54, in table
    if df not in DataFrames:
  File "C:\Users\Sayed\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1613, in f
    raise ValueError('Can only compare identically-labeled '
ValueError: Can only compare identically-labeled DataFrame objects
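The error can be reproduced without Selenium. In this minimal sketch the values are made up; the two frames share the same columns but, like two scraped tables of different lengths, have different numbers of rows and therefore different index labels. The exact wording of the message varies slightly between pandas versions.

```python
import pandas as pd

# Same columns, different row counts -> index labels 0..1 vs 0..2.
a = pd.DataFrame({"Date": ["d1", "d2"], "Actual": ["0.7%", "0.3%"]})
b = pd.DataFrame({"Date": ["d1", "d2", "d3"], "Actual": ["0.7%", "0.3%", "-0.1%"]})

frames = [a]
message = ""
try:
    # List membership compares b with each element using ==.
    if b not in frames:
        frames.append(b)
except ValueError as exc:
    message = str(exc)

print(message)  # the "Can only compare identically-labeled ..." error
```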

First of all, your repeated calls to df.drop are unnecessary and make the code harder to read. Change:

df.drop(df.columns[-1], axis=1, inplace=True)
df.drop(df.columns[-1], axis=1, inplace=True)
df.drop(df.columns[-1], axis=1, inplace=True)
df.drop(df.columns[-1], axis=1, inplace=True)
df = df.reset_index()

To:

df = df.drop(df.columns[-4:], axis=1).reset_index(drop=True)

I've added drop=True because I doubt you still need the old index; without it, reset_index() inserts the old index as a new 'index' column.
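To check the one-liner against the original four drop calls, here is a small sketch on a frame with the same six columns the question's code ends up with (the values are placeholders). It also shows the difference drop=True makes to the resulting columns.

```python
import pandas as pd

# Placeholder frame with the six columns from the question's code.
cols = ["Date", "Actual", "Forecast", "Previous", "Release Date", "Time"]
df = pd.DataFrame([["x"] * 6, ["y"] * 6], columns=cols)

# Original approach: drop the last column four times, then reset the index.
old = df.copy()
for _ in range(4):
    old.drop(old.columns[-1], axis=1, inplace=True)
old = old.reset_index()

# Suggested one-liner: drop the last four columns at once.
new = df.drop(df.columns[-4:], axis=1).reset_index(drop=True)

print(list(old.columns))  # ['index', 'Date', 'Actual']
print(list(new.columns))  # ['Date', 'Actual']
```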

Second, to answer your question: the problem occurs on the line if df not in DataFrames. For a list, Python evaluates df not in DataFrames as:

not any(df is df_i or df == df_i for df_i in DataFrames)

The df == df_i part fails whenever the two DataFrames do not have identical index and column labels. Because your tables have different numbers of rows, their indexes differ, so the check raises as soon as a table is compared with one of a different length.
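Note that getting the labels to match would not be enough on its own: when the labels are identical, df == df_i returns an element-wise DataFrame of booleans, and pandas refuses to reduce that to the single True/False that `in` needs, raising a different ValueError ("The truth value of a DataFrame is ambiguous..."). If the goal is to skip tables that are exact duplicates, one option (not what this answer proposes) is DataFrame.equals, which returns a plain bool. A minimal sketch; append_if_new is a hypothetical helper name and the data is made up:

```python
import pandas as pd


def append_if_new(frames, df):
    """Hypothetical helper: append df unless an equal frame is already in frames.

    DataFrame.equals returns a plain bool (False for different shapes),
    so it avoids both ValueErrors raised by `df not in frames`.
    """
    if not any(df.equals(existing) for existing in frames):
        frames.append(df)


a = pd.DataFrame({"Date": ["d1", "d2"], "Actual": ["0.7%", "0.3%"]})
b = pd.DataFrame({"Date": ["d1", "d2", "d3"], "Actual": ["0.7%", "0.3%", "-0.1%"]})

# Even with identical labels, `in` fails: a.copy() == a is a DataFrame of
# booleans, and pandas refuses to reduce it to a single True/False.
ambiguous = ""
try:
    _ = a.copy() in [a]
except ValueError as exc:
    ambiguous = str(exc)
print(ambiguous)  # "The truth value of a DataFrame is ambiguous..."

frames = []
for df in (a, b, a.copy()):
    append_if_new(frames, df)
print(len(frames))  # 2: the copy of a is detected as a duplicate
```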

Try putting in something like the following:

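try:
    if df not in DataFrames:
        DataFrames.append(df)
except ValueError:
    # Compare labels with .equals(): using == on two Index objects of
    # different lengths can itself raise ValueError.
    for df_i in DataFrames:
        print(df.columns, df_i.columns)
        print(df.columns.equals(df_i.columns))
        print(df.index.equals(df_i.index))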

Likely you'll find discrepancies.
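Finally, regarding the rename in the title: even once the error above is fixed, rename() would not change anything. DataFrame.rename called with a bare dict maps index labels by default, not column labels, and it returns a new DataFrame rather than modifying the one in the loop (unless inplace=True is passed). A minimal sketch, with a made-up table and a hypothetical title string standing in for what table() and title() return:

```python
import pandas as pd

# Hypothetical stand-ins for the lists returned by table() and title().
tables = [pd.DataFrame({"Date": ["Sep 09, 2018 21:30"], "Actual": ["0.7%"]})]
names = ["Example Indicator"]

# A bare dict is applied to the index labels; no row is labelled "Actual",
# so the columns are unchanged (missing labels are ignored by default).
unchanged = tables[0].rename({"Actual": names[0]})
print(list(unchanged.columns))  # ['Date', 'Actual']

# columns= targets the column labels; keep the returned frames.
renamed = [t.rename(columns={"Actual": n}) for t, n in zip(tables, names)]
print(list(renamed[0].columns))  # ['Date', 'Example Indicator']
```

Keep in mind that zip pairs tables and titles purely by position, so both lists must be in the same order.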
