Iterate and split Excel filenames and save as a dataframe in Pandas

Say I have a folder, folder1, containing Excel files whose filenames share the same structure: city, building name and id. I want to save these parts in a dataframe and then write it to an Excel file. Please note that I also need to append the Excel filenames from other folders to the result.

bj-LG center-101012.xlsx
sh-ABC tower-1010686.xlsx
bj-Jinzhou tower-101018.xlsx
gz-Zijin building-101012.xls
...

The first method I tried:

import os
import pandas as pd
from pandas import DataFrame, ExcelWriter

path = os.getcwd()
file = [".".join(f.split(".")[:-1]) for f in os.listdir() if os.path.isfile(f)] #exclude files' extension

city = file.split('-')[0]
projectName = file.split('-')[1]
projectID = file.split('-')[2]
    #print(city)        
df = pd.DataFrame(columns = ['city', 'building name', 'id'])
df['city'] = city
df['building name'] = projectName
df['id'] = projectID    

writer = pd.ExcelWriter("C:/Users/User/Desktop/test.xlsx", engine='xlsxwriter')
df.to_excel(writer, index = False)
writer.save()

Problem:

Traceback (most recent call last):

  File "<ipython-input-203-c09878296e72>", line 9, in <module>
    city = file.split('-')[0]

AttributeError: 'list' object has no attribute 'split'

My second method:

for root, directories, files in os.walk(path):
    #print(root)
    for file in files:
        if file.endswith('.xlsx') or file.endswith('.xls'):
            #print(file)            
            city = file.split('-')[0]
            projectName = file.split('-')[1]
            projectID = file.split('-')[2]
            #print(city)        
    df = pd.DataFrame(columns = ['city', 'building name', 'id'])
    df['city'] = city
    df['building name'] = projectName
    df['id'] = projectID    

    writer = pd.ExcelWriter("C:/Users/User/Desktop/test.xlsx", engine='xlsxwriter')
    df.to_excel(writer, index = False)
    writer.save()

I got an empty test.xlsx file. How can I make it work? Thanks.

Method 2 is close.

You need to create the dataframe before the for loops. After your variable assignments, make a dictionary of the variables and append it to the dataframe. There is probably also a better way to build your file list using glob, but I will just work with what you have already done.

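import os
import pandas as pd

path = os.getcwd()
df = pd.DataFrame()
for root, directories, files in os.walk(path):
    for file in files:
        if file.endswith(('.xlsx', '.xls')):
            # strip the extension first so it does not end up in the id
            name = os.path.splitext(file)[0]
            city, projectName, projectID = name.split('-', 2)
            # append one row per file inside the inner loop
            d = {'city': city, 'building name': projectName, 'id': projectID}
            # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
            df = pd.concat([df, pd.DataFrame([d])], ignore_index=True)

# the context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter("C:/Users/User/Desktop/test.xlsx", engine='xlsxwriter') as writer:
    df.to_excel(writer, index=False)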
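
As a side note on why the second method in the question wrote an empty file: assigning a scalar to a column of a dataframe that has no rows does not create a row, so each loop pass left the frame empty. The sketch below illustrates this with values taken from the example filenames; the variable names are only for illustration.

```python
import pandas as pd

# An empty frame has no index, so a scalar assignment has no rows to fill.
empty = pd.DataFrame(columns=['city', 'building name', 'id'])
empty['city'] = 'bj'
n_empty = len(empty)  # the frame still has zero rows

# Building the frame from a list of dicts creates one row per dict.
rows = [
    {'city': 'bj', 'building name': 'LG center', 'id': '101012'},
    {'city': 'sh', 'building name': 'ABC tower', 'id': '1010686'},
]
filled = pd.DataFrame(rows)
```

Collecting the dictionaries in a list and building the dataframe once at the end is usually also faster than growing a dataframe row by row.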

This splits off the file extension, then unpacks the split into the variables. It creates a dictionary and then appends the dictionary to the dataframe.

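import os
import pandas as pd

files = [
    "bj-LG center-101012.xlsx",
    "sh-ABC tower-1010686.xlsx",
    "bj-Jinzhou tower-101018.xlsx",
    "gz-Zijin building-101012.xls"]

df = pd.DataFrame()
for file in files:
    # os.path.splitext removes only the final extension
    filename = os.path.splitext(file)[0]
    city, projectName, projectID = filename.split("-")
    d = {'city': city, 'projectID': projectID, 'projectName': projectName}
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, pd.DataFrame([d])], ignore_index=True)

df.to_excel('summary.xlsx')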
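
If a building name could itself contain a hyphen, unpacking a plain split on "-" into three variables would raise an error. A minimal sketch of a more defensive parser follows; parse_filename is a hypothetical helper, and it assumes that the city and the id never contain a hyphen.

```python
import os

def parse_filename(filename):
    """Hypothetical helper: split 'city-building name-id.ext' into three parts."""
    stem = os.path.splitext(filename)[0]     # drops only the final extension
    city, rest = stem.split('-', 1)          # city: everything before the first hyphen
    name, project_id = rest.rsplit('-', 1)   # id: everything after the last hyphen
    return city, name, project_id

parsed = parse_filename("bj-LG center-101012.xlsx")
```

A made-up name such as "sh-A-B tower v1.2-1010686.xls" would then keep "A-B tower v1.2" intact as the building name.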

This should work, thanks to the hint to use glob from @Dan Wisner:

import os
from glob import glob
import pandas as pd

# "or" would ignore the .xls files whenever any .xlsx file exists, so join both lists
fileNames = [os.path.splitext(val)[0] for val in glob('*.xlsx') + glob('*.xls')]

df = pd.DataFrame({'fileNames': fileNames})
df[['city', 'name', 'id']] = df['fileNames'].str.split('-', n=2, expand=True)

del df['fileNames']

# the context manager saves on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter("C:/Users/User/Desktop/test1.xlsx", engine='xlsxwriter') as writer:
    df.to_excel(writer, index=False)
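
The question also asks for filenames from other folders to be appended to the result. One possible way to do that, sketched here with pathlib, is to loop over a list of folders and search each one recursively. collect_excel_names is a hypothetical helper, and the choice to skip files that do not match the city-name-id pattern is an assumption.

```python
from pathlib import Path
import pandas as pd

COLUMNS = ['city', 'building name', 'id']

def collect_excel_names(folders):
    """Hypothetical helper: one row per .xlsx/.xls file found under the folders."""
    rows = []
    for folder in folders:
        # rglob('*') walks the folder and all of its subfolders
        for p in sorted(Path(folder).rglob('*')):
            if p.is_file() and p.suffix.lower() in {'.xlsx', '.xls'}:
                parts = p.stem.split('-', 2)  # stem is the name without extension
                if len(parts) == 3:           # assumption: skip non-matching names
                    rows.append(dict(zip(COLUMNS, parts)))
    return pd.DataFrame(rows, columns=COLUMNS)
```

Something like collect_excel_names(['folder1', 'folder2']).to_excel('summary.xlsx', index=False) would then write the combined result, where 'folder2' and 'summary.xlsx' are placeholder names.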
