
Use pd.concat instead of df.append

I have some code I would like to improve, firstly because it's pretty slow and secondly because append is going to be deprecated. I would like to use concat instead of append for those reasons, but after checking several similar questions on Stack Overflow I haven't figured out how to apply it to my own code. I am sure it has a simple solution, but I just can't find it. I would appreciate any help a lot. Thanks in advance!

import time
from time import sleep
# Import the Excel library and system modules
import os
import csv
import pandas as pd
import pandas
import openpyxl
import warnings

with warnings.catch_warnings(record=True):
    warnings.simplefilter("always")
# Library for iterating over folders
from pathlib import Path

# For each Excel file in the directory, drop the first 13 rows (labels 0-12); files with 12 rows or fewer are deleted
INPUT_DIR = Path.cwd() / r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Descargas"
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = pd.read_excel(file)
    if len(df. index) >12:
        df = df.drop([0,1,2,3,4,5,6,7,8,9,10,11,12], axis = 0)
        df.to_excel(file, engine="openpyxl", header = False, index = False)
    else:
        os.remove(file)

df = pd.DataFrame() 
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = df.append(pd.read_excel(file), ignore_index=True)
    df.to_excel(r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Combinados\Final Sin Etiquetas\EXCEL DEFINITIVO TOTAL.xlsx", engine="openpyxl", index = False)

Since your question refers to a specific part of the code, I'll focus on replacing append() with concat(). I see you are writing an Excel file that gets overwritten on every iteration; this is (probably) a mistake, and it is very inefficient as well. This part of the code:

df = pd.DataFrame() 
for file in list(INPUT_DIR.rglob("*.xls*")):
    df = df.append(pd.read_excel(file), ignore_index=True)
    df.to_excel(r"C:\Users\param\OneDrive\Documents\Automat Consumos\Excels Combinados\Final Sin Etiquetas\EXCEL DEFINITIVO TOTAL.xlsx", engine="openpyxl", index = False)

can be replaced with:

# ignore_index belongs to pd.concat; pd.read_excel has no such parameter
output = pd.concat([pd.read_excel(x) for x in INPUT_DIR.rglob("*.xls*")], ignore_index=True)
output.to_excel("path",engine="openpyxl",index=False)
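
For reference, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the list-then-concat pattern is the usual replacement. Below is a minimal sketch of that pattern using small in-memory frames in place of the Excel files. The helper name combine_frames and the columns "meter" and "kwh" are made up for illustration and are not from the original code. The empty-list guard is an extra assumption on my part, since pd.concat raises a ValueError when it is given nothing to concatenate (for example, when the folder contains no matching files).

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate an iterable of DataFrames in a single call, renumbering the index."""
    frames = list(frames)
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame instead
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Hypothetical stand-ins for the frames that pd.read_excel(file) would return
parts = [
    pd.DataFrame({"meter": ["A", "B"], "kwh": [10, 20]}),
    pd.DataFrame({"meter": ["C"], "kwh": [30]}),
]

combined = combine_frames(parts)
print(combined)
```

In the real script, the same helper could probably be fed a generator such as (pd.read_excel(f) for f in INPUT_DIR.rglob("*.xls*")), with a single to_excel call afterwards. Building the list first and concatenating once avoids copying the growing DataFrame on every iteration, which is what makes the append loop slow.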
