Text to Columns in Excel using Python using openpyxl

I am trying to reproduce Excel's "Text to Columns" feature from Python using openpyxl. The file I have is currently saved as .xlsx. I cannot use the split() feature because my data is numbers, not words. I have tried pandas, but it does not work: I ran into the problem of having to install xlrd, which I cannot do because I am using an older version of Python due to an SDK I need. Is there a way to do text-to-columns in Python using openpyxl when the data is numbers? I am using Python 2.7.18.

I would like to open the file from my directory, grab the column that holds all the data (not by name but by cell, e.g. column A), split the number data on semicolons, then save the file.

Here is the Data file: Excel Data

Here is the code I have:

text-to-column Code picture

text-to-column Code doc

Thank you!

I am not entirely sure what you are trying to do. But, based on the code and the heading, my understanding is that you want to read ALL files in a particular directory (every file in that folder should be an Excel file with a single column of ;-separated data in column A), split that text into columns, and then write the result back to the same file.

So, the code below will do this:

  1. Go to a specified directory ('C:\Users\potomis1\PycharmProjects\MUSE2022')
  2. Read the names of all Excel files there into the excelFiles list
  3. Read each file into a dataframe and split the single column into multiple columns on ;
  4. Write back to the same file
  5. Steps 3 and 4 are repeated for each file in the directory
import os
import pandas as pd

# Main
if __name__ == "__main__":
    # This is the FOLDER where all your Excel files are.
    # Backslashes are doubled because a backslash is the escape character.
    filePath = 'C:\\Users\\potomis1\\PycharmProjects\\MUSE2022\\'

    # Go inside the folder
    os.chdir(filePath)

    # Get the list of Excel files inside the folder (anything else is skipped)
    excelFiles = [f for f in os.listdir('.') if f.lower().endswith('.xlsx')]
    # For each Excel file
    for excelFile in excelFiles:
        # Read the sheet without treating the first row as a header
        df = pd.read_excel(excelFile, header=None)
        # Split column A on ';' into one column per field
        df = df[0].str.split(';', expand=True)
        # Write the result back to the same file
        df.to_excel(excelFile, header=False, index=False)
        print(excelFile + ' sorted.')

Update using openpyxl instead of read_excel

import os

import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd


# Main
if __name__ == "__main__":
    # Backslashes are doubled because a backslash is the escape character.
    filePath = 'C:\\Users\\potomis1\\PycharmProjects\\MUSE2022\\'

    # Go inside the folder
    os.chdir(filePath)

    # Get the list of Excel files inside the folder (anything else is skipped)
    excelFiles = [f for f in os.listdir('.') if f.lower().endswith('.xlsx')]
    print('ExcelFiles ', excelFiles)
    # For each Excel file
    for excelFile in excelFiles:
        # Load the workbook with openpyxl and copy the active sheet into a dataframe
        Work_Book = openpyxl.load_workbook(filename=excelFile)
        Work_Sheet = Work_Book.active
        df = pd.DataFrame(list(Work_Sheet.values))
        # Split column A on ';' into one column per field
        df = df[0].str.split(';', expand=True)

        # Write the split values back into the sheet, starting at cell A1
        rows = dataframe_to_rows(df, index=False, header=False)

        for r_idx, row in enumerate(rows, 1):
            for c_idx, value in enumerate(row, 1):
                Work_Sheet.cell(row=r_idx, column=c_idx, value=value)
        # Save the workbook
        Work_Book.save(filename=excelFile)
        print(excelFile + ' sorted.')
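
Since the question says pandas is hard to get working on that Python 2.7 setup, the same split can probably be done with openpyxl alone. Below is a minimal sketch, not a tested drop-in: the names parse_field, text_to_columns and convert_file are made up for illustration, and converting each piece to int or float is my assumption about what "the data is numbers" should mean in the output. Cells in column A that are empty or already numeric are left untouched.

```python
def parse_field(text):
    """Return text as an int or float when it parses as one, else the stripped string."""
    text = text.strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def text_to_columns(ws, delimiter=';'):
    """Split every text cell in column A of worksheet ws across columns A, B, C, ..."""
    for (cell,) in ws.iter_rows(min_col=1, max_col=1):
        try:
            fields = cell.value.split(delimiter)
        except AttributeError:
            # Empty cell, or a value that is already a number or date: nothing to split
            continue
        for offset, field in enumerate(fields):
            ws.cell(row=cell.row, column=1 + offset, value=parse_field(field))


def convert_file(path, delimiter=';'):
    """Apply text_to_columns to the active sheet of one .xlsx file and save it in place."""
    import openpyxl  # third-party; only needed for loading and saving the workbook
    wb = openpyxl.load_workbook(path)
    text_to_columns(wb.active, delimiter)
    wb.save(path)
```

Inside the loop above you would then call convert_file(excelFile) instead of building a dataframe. Note that this overwrites column A with the first field of each row, just as the pandas version does.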
