Combine Second Column of an Excel Spreadsheet

I currently have about 100 Excel files, each with two columns. The first column contains the "headers" and the second column has the values. They each look something like this:

[Image: Excel spreadsheet]

I want to combine these so that there is one final Excel file that contains the data from ALL of them, so that it will look something like this (just with way more columns):

[Image: Combined (how it should look, but with all 100 columns)]

The other problem is that not all of the files have those headers in the same order. If you look at that "Combined" picture, for example, you'll see that those two items had common headers. However, in some of the other files, the header order may be switched. For instance, "GPU Variant" may come before "GPU Name", etc.

So essentially, here's what I need to do: find a way to combine the second column of all of these spreadsheets, and then find a way to sort them so that they match up with the first column.

If there's a way to program a macro to do this, can someone guide me as to how to do it? Are there external programs that are already designed to do this? Excel VBA, maybe? This is the code that I have right now, but I don't think it addresses the problem properly:

import xlwt
import xlrd
import os
import csv


current_file = xlwt.Workbook()
write_table = current_file.add_sheet('sheet1', cell_overwrite_ok=True)

key_list = [u'GPU Name:', u'GPU Variant:', u'Architecture:', u'Process Size:', u'Transistors:', u'Die Size:', u'Released:']
for title_index, text in enumerate(key_list):
    write_table.write(0, title_index, text)


file_list = ['2874.csv', '2875.csv']

i = 1
for name in file_list:
    data = xlrd.open_workbook(name)

table = data.sheets()[0]
nrows = table.nrows
for row in range(nrows):
    if row == 0:
        continue
    for index, context in enumerate(table.row_values(row)):
        write_table.write(i, index, context)
    i += 1


current_file.save(os.getcwd() + '/result.csv')

Comment: I keep on getting "MISSING" in the third, fourth, fifth, etc. columns.

Add the following print(...) and edit your question to show the output:

            for values in csv_reader:
                # Init Header Order
                header_keys.append(values['header'])
                ws.append((values['header'], values['data']))
            print('header_keys:{}'.format(header_keys))
        else:
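
Once the header keys are printed, they can be compared against the keys read from the other files. The following is only a sketch of that comparison, not part of the original answer: the helper names `normalize` and `compare_headers` are hypothetical, and the guess that "MISSING" comes from small differences such as trailing whitespace, letter case, or a missing colon is an assumption that the printed output would have to confirm.

```python
def normalize(key):
    """Collapse whitespace, drop a trailing colon, and lowercase the key."""
    return " ".join(key.split()).rstrip(":").strip().lower()


def compare_headers(reference_keys, other_keys):
    """Return (missing, extra, near_matches) for two lists of header keys.

    missing      -- keys in reference_keys that are not in other_keys
    extra        -- keys in other_keys that are not in reference_keys
    near_matches -- (missing_key, extra_key) pairs that are equal after normalize()
    """
    missing = [k for k in reference_keys if k not in other_keys]
    extra = [k for k in other_keys if k not in reference_keys]
    by_normalized = {normalize(k): k for k in extra}
    near_matches = [
        (k, by_normalized[normalize(k)])
        for k in missing
        if normalize(k) in by_normalized
    ]
    return missing, extra, near_matches


# Hypothetical example: the second file has a trailing space after one header.
reference = ['GPU Name:', 'GPU Variant:']
other = ['GPU Variant:', 'GPU Name: ']
missing, extra, near_matches = compare_headers(reference, other)
print('missing:', [repr(k) for k in missing])
print('near matches:', near_matches)
```

Printing the keys with `repr` makes invisible differences such as a trailing space or a tab visible. If every key of a later file shows up as missing, the delimiter passed to the reader is another thing worth checking.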

Question: ... combine the second column of all ... sort them so that they match up the 1st column

The following is a csv/openpyxl solution:
It reads n CSV files and aggregates the second column of each, sorted in the header order of the first CSV file.

from openpyxl import Workbook
import csv

wb = Workbook()
ws = wb.worksheets[0]

header_keys = []
for n, fName in enumerate(['2874.csv', '2875.csv']):
    with open(fName) as fh:
        csv_reader = csv.DictReader(fh, fieldnames=['header', 'data'], delimiter='\t')
        if n == 0:
            for values in csv_reader:
                # Init Header Order
                header_keys.append(values['header'])
                ws.append((values['header'], values['data']))
        else:
            # Read all Data to Dict 
            data = {}
            for values in csv_reader:
                data[values['header']] = values['data']

            # Write all Data in header_keys Order
            column = n + 2
            for row, key in enumerate(header_keys, 1):
                try:
                    ws.cell(row=row, column=column).value = data[key]
                except KeyError:
                    print('FAIL: Key "{}" not in Dict data'.format(key))
                    ws.cell(row=row, column=column).value = 'MISSING'

wb.save('result.xlsx')

Tested with Python: 3.4.2 - openpyxl: 2.4.1 - LibreOffice: 4.3.3.2
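
Note that the code above takes the header order from the first file only, so a header that appears only in a later file is never written. If that matters, one possible variation is to collect the union of all headers first. This is a minimal sketch using only plain Python data structures; the function name `merge_second_columns`, its `fill` parameter, and the sample values are hypothetical and not part of the tested answer.

```python
def merge_second_columns(tables, fill='MISSING'):
    """Align the values of several two-column tables by their header.

    tables -- one list of (header, value) pairs per input file
    fill   -- placeholder used when a file lacks a header
    Returns a list of rows: [header, value_from_file_1, value_from_file_2, ...]
    """
    # Collect every header once, in the order it is first seen.
    header_order = []
    seen = set()
    for pairs in tables:
        for header, _ in pairs:
            if header not in seen:
                seen.add(header)
                header_order.append(header)

    # One lookup dict per file, so the row order inside a file no longer matters.
    lookups = [dict(pairs) for pairs in tables]

    return [
        [header] + [lookup.get(header, fill) for lookup in lookups]
        for header in header_order
    ]


# Hypothetical placeholder data: the second file swaps the header order
# and has one header that the first file lacks.
file_a = [('GPU Name:', 'name-A'), ('GPU Variant:', 'variant-A')]
file_b = [('GPU Variant:', 'variant-B'), ('GPU Name:', 'name-B'), ('Released:', 'date-B')]

rows = merge_second_columns([file_a, file_b])
for row in rows:
    print(row)
```

Each returned row could then be passed to `ws.append(row)` in place of the per-cell writes, assuming the same `ws` worksheet object as in the answer.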

