
Python: read .dat file rows and write them as columns in Excel (csv/numpy/openpyxl)

I ran into a problem using csv/numpy/openpyxl. I have a .dat file in this format:

a,a,a,a
b,b,b,b
c,c,c,c

I want to take each row of the .dat file and put it into its own column in Excel, meaning:

Excel file:

a b c
a b c
a b c

Here is what I have so far:

import csv
import openpyxl
import numpy as np


wb = openpyxl.Workbook()
ws = wb.active

with open('Shari10.dat') as f:
    dat_reader = csv.reader(f, delimiter = ",")

    for header in csv.reader(f):
        break

    for dat_line in f:
        line = dat_line.split(",")

        data = np.vstack(line[1:8])

        for row in data:
            ws.append(row)
            print(row)
        #wb.save("coffee.xlsx")

Here is the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-a07e6ac6842f> in <module>
     20         print(data)
     21         for row in data:
---> 22             ws.append(row)
     23         #wb.save("coffee.xlsx")

~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in append(self, iterable)
    665 
    666         else:
--> 667             self._invalid_row(iterable)
    668 
    669         self._current_row = row_idx

~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in _invalid_row(self, iterable)
    792     def _invalid_row(self, iterable):
    793         raise TypeError('Value must be a list, tuple, range or generator, or a dict. Supplied value is {0}'.format(
--> 794             type(iterable))
    795                         )
    796 

TypeError: Value must be a list, tuple, range or generator, or a dict. Supplied value is <class 'str'>

For reference, I was trying to do this:

data = [
         ['A', 100, 1.0],
         ['B', 200, 2.0],
         ['C', 300, 3.0],    
         ['D', 400, 4.0],        
 ]
for row in data:
    ws.append(row)

Also, I just started learning Python, so please forgive my messy code structure. As for the syntax, I am trying to write it as explicitly as possible rather than make the code shorter.

It looks like the problem is that the rows you pass to ws.append are NumPy arrays, not lists, and openpyxl only accepts a list, tuple, range, generator, or dict. You can fix that with NumPy's tolist() method by changing this

for row in data:
    ws.append(row)
    print(row)

to this:

for row in data:
    ws.append(row.tolist())
    print(row.tolist())

Changing just those lines makes the code run without errors, but it does not produce the output you want. Running the code with the input file

a,a,a,a
b,b,b,b
c,c,c,c

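results in a spreadsheet like the one below. The first line is consumed as a header by the csv.reader loop, and line[1:8] starts at index 1, so the first field of every remaining line is dropped. np.vstack then turns each row array into a column array, and the columns are stacked on top of each other (ws.append adds rows to the bottom of your worksheet). Because each line is split by hand, the last field also keeps its trailing newline:

b
b
b\n
c
c
c\n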
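To see which values the question's loop actually appends, here is a stdlib-only sketch that replays it on the sample input. It is an approximation under a few assumptions: io.StringIO stands in for Shari10.dat, and instead of calling np.vstack each value is wrapped in a one-element list, which is what np.vstack(line[1:8]).tolist() gives for a list of strings. The helper name replay_question_loop is hypothetical.

```python
import csv
import io


def replay_question_loop(text):
    """Return the rows the question's loop would pass to ws.append (hypothetical helper)."""
    f = io.StringIO(text)  # stands in for open('Shari10.dat')
    appended = []

    # The question reads one line with csv.reader and breaks,
    # so the first line is consumed as the "header".
    for header in csv.reader(f):
        break

    # The remaining lines are split by hand, so "\n" stays on the last field.
    for dat_line in f:
        line = dat_line.split(",")
        # line[1:8] starts at index 1, so the first field of each line is skipped.
        for value in line[1:8]:
            # Each row of np.vstack(...) holds a single value: a one-cell row.
            appended.append([value])
    return appended


rows = replay_question_loop("a,a,a,a\nb,b,b,b\nc,c,c,c\n")
for row in rows:
    print(row)
```

Every appended row has exactly one cell, which is why the data ends up in a single column of the worksheet instead of being spread across columns.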

If you want the entire CSV (including the header) transposed, a simple way to do that is with NumPy's transpose function. It swaps the rows and columns of the whole array for you, and then you can iterate through the rows and write each one to the worksheet. This also simplifies how you read in the CSV file, as shown below. Keep in mind that np.transpose needs a rectangular array (every row the same length), so I've added a bit of code to pad any jagged rows.

import csv
import openpyxl
import numpy as np

# Create a new workbook and get its active worksheet
wb = openpyxl.Workbook()
ws = wb.active

with open('input.dat', newline='') as f:
    # Read in all the data as a list of rows (lists of strings)
    data = list(csv.reader(f))

    ## If your CSV isn't rectangular, you need to pad it first
    # Get the length of the longest row
    longest = len(max(data, key=len))
    # Pad every row to the longest row length
    for row in data:
        row.extend((longest - len(row)) * [''])

    ## Once every row has the same length, continue as normal
    # Transpose the array (rows become columns)
    data = np.transpose(data)

    # Write all rows to worksheet
    for row in data:
        ws.append(row.tolist())

# Save worksheet
wb.save('test.xlsx')
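If you would rather avoid NumPy, the padding and the transpose can be done in one step with the standard library's itertools.zip_longest, which runs to the end of the longest row and fills missing cells with fillvalue. This is a swapped-in technique, not the approach above; the function name transpose_rows is hypothetical, and io.StringIO stands in for input.dat. Each resulting row can be passed straight to ws.append.

```python
import csv
import io
from itertools import zip_longest


def transpose_rows(rows, fill=""):
    """Turn rows into columns, padding short rows with `fill` (hypothetical helper)."""
    # zip_longest(*rows) groups the i-th item of every row together;
    # rows that are too short contribute `fill` instead.
    return [list(column) for column in zip_longest(*rows, fillvalue=fill)]


# io.StringIO stands in for open('input.dat', newline='') here.
sample = io.StringIO("a,a,a,a\nb,b,b\nc,c,c,c\n")
data = list(csv.reader(sample))
columns = transpose_rows(data)
for row in columns:
    print(row)  # with openpyxl, this would be ws.append(row)
```

Unlike len(max(data, key=len)), which raises ValueError on an empty file, transpose_rows([]) simply returns an empty list.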

Let's say we have a file example.dat with the following contents:

a1,a2,a3,a4
b1,b2,b3,b4
c1,c2,c3,c4

This is better done with pandas. First load the data as a DataFrame, then take the transpose and save the resulting DataFrame to an Excel file like this:

import pandas as pd

df_in = pd.read_csv("example.dat", header = None) # header=None since the data has no header row.

data_out = df_in.transpose()

data_out.to_excel("example.xlsx", index = False, header = False) # index and header False since you don't want row or column indices written to the excel file.

Output:

a1  b1  c1
a2  b2  c2
a3  b3  c3
a4  b4  c4

Pros: simple and clean. Cons: this implementation needs openpyxl (or another Excel writer engine such as XlsxWriter) installed, because pandas uses it to write the .xlsx file.

Install it with: pip install openpyxl
