Python--Read dat file rows, rewrite to columns in Excel. csv/numpy/openpyxl
I ran into a problem using csv/numpy/openpyxl. I have a .dat file that looks like this:
a,a,a,a
b,b,b,b
c,c,c,c
I want to take each row of the .dat file and write it as one column in Excel, so that the Excel file looks like this:
a b c
a b c
a b c
Here is what I have so far:
import csv
import openpyxl
import numpy as np

wb = openpyxl.Workbook()
ws = wb.active

with open('Shari10.dat') as f:
    dat_reader = csv.reader(f, delimiter = ",")
    for header in csv.reader(f):
        break
    for dat_line in f:
        line = dat_line.split(",")
        data = np.vstack(line[1:8])
        for row in data:
            ws.append(row)
            print(row)
#wb.save("coffee.xlsx")
Here is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-a07e6ac6842f> in <module>
20 print(data)
21 for row in data:
---> 22 ws.append(row)
23 #wb.save("coffee.xlsx")
~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in append(self, iterable)
665
666 else:
--> 667 self._invalid_row(iterable)
668
669 self._current_row = row_idx
~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in _invalid_row(self, iterable)
792 def _invalid_row(self, iterable):
793 raise TypeError('Value must be a list, tuple, range or generator, or a dict. Supplied value is {0}'.format(
--> 794 type(iterable))
795 )
796
TypeError: Value must be a list, tuple, range or generator, or a dict. Supplied value is <class 'str'>
For reference, I was trying to do something like this:
data = [
    ['A', 100, 1.0],
    ['B', 200, 2.0],
    ['C', 300, 3.0],
    ['D', 400, 4.0],
]
for row in data:
    ws.append(row)
Also, I have only just started learning Python, so please forgive my messy code structure. As for the syntax, I am trying to write the code as explicitly as possible rather than as concisely as possible.
It looks like the problem is that a numpy array is not a list: ws.append accepts a list, tuple, range, generator, or dict, and a numpy array is none of these. You can fix that with numpy's tolist() method by changing this

    for row in data:
        ws.append(row)
        print(row)

to this

    for row in data:
        ws.append(row.tolist())
        print(row.tolist())
Changing just those lines makes the code run successfully, but it does not produce the output you want. Running the code with the input file
a,a,a,a
b,b,b,b
c,c,c,c
results in a spreadsheet that looks like this, because each row is turned into a column array and the columns are then stacked on top of each other (ws.append adds rows to the bottom of the worksheet):
b
b
b
b\n
c
c
c
c\n
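The trailing \n in the last cell of each group comes from splitting the raw line by hand, which keeps the line terminator on the final field. As a minimal sketch (the sample string below is an illustrative stand-in for one line of the file, not taken from the asker's data), csv.reader avoids this because it strips the terminator while parsing:

```python
import csv
import io

# One raw line as it would be read from the file, newline included.
raw_line = "b,b,b,b\n"

# Manual splitting keeps the newline on the last field.
manual = raw_line.split(",")

# csv.reader parses the same text and drops the line terminator.
parsed = next(csv.reader(io.StringIO(raw_line)))

print(manual)
print(parsed)
```

If the manual split is kept, calling raw_line.rstrip("\n") before splitting has the same effect for simple data without quoted fields.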
If you want the entire CSV (including the header) to be transposed, a simple way to do that is with numpy's transpose function. It swaps the rows and columns of the whole array, and then you can iterate over the transposed rows and write each of them to the worksheet. This also simplifies how you read in the CSV file, as shown below. Keep in mind that transpose only gives the intended result on rectangular arrays, where every row has the same length, so I've added a bit of code to pad any jagged rows first.
import csv
import openpyxl
import numpy as np

# Create the workbook and get the active worksheet
wb = openpyxl.Workbook()
ws = wb.active

with open('input.dat') as f:
    # Read in all the data
    data = list(csv.reader(f))

## If your CSV isn't rectangular, you need to pad it first
# Get the length of the longest row
longest = len(max(data, key=len))
# Pad every row to the longest row length
for row in data:
    row.extend((longest - len(row)) * [''])

## Once every row has the same length, continue as normal
# Transpose the array
data = np.transpose(data)
# Write all rows to the worksheet
for row in data:
    ws.append(row.tolist())
# Save the workbook
wb.save('test.xlsx')
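As an alternative to the numpy approach above, the padding and the transpose can also be done in one step with the standard library's itertools.zip_longest. This is only a sketch: the helper name transpose_rows and the sample rows are made up for illustration and are not part of the original answer.

```python
from itertools import zip_longest


def transpose_rows(rows, fill=""):
    """Turn each input row into an output column, padding short rows with `fill`."""
    return [list(column) for column in zip_longest(*rows, fillvalue=fill)]


# Jagged sample data standing in for the parsed .dat file (hypothetical).
rows = [["a", "a", "a", "a"], ["b", "b"], ["c", "c", "c"]]
columns = transpose_rows(rows)

for column in columns:
    print(column)
```

Each element of columns is already a plain list, so it could be passed to ws.append directly without calling tolist().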
Let's say we have a file example.dat with the following contents:
a1,a2,a3,a4
b1,b2,b3,b4
c1,c2,c3,c4
This is better done with pandas. First load the data as a DataFrame, then transpose it and save the resulting DataFrame to an Excel file like this:
import pandas as pd

df_in = pd.read_csv("example.dat", header=None)  # header=None since the data has no header row.
data_out = df_in.transpose()
# index and header are False because the row and column labels should not be written to the Excel file.
data_out.to_excel("example.xlsx", index=False, header=False)
Output:
a1 b1 c1
a2 b2 c2
a3 b3 c3
a4 b4 c4
Pros: Simple and clean.
Cons: This implementation needs openpyxl, which pandas uses to write .xlsx files. Install it with:
pip install openpyxl
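Before writing the Excel file, the transpose step of the pandas approach can be checked in memory. The sketch below assumes pandas is installed and uses io.StringIO as a stand-in for example.dat, so no files are read or written; the variable names are illustrative only.

```python
import io

import pandas as pd

# In-memory stand-in for example.dat (same contents as the sample file).
sample = "a1,a2,a3,a4\nb1,b2,b3,b4\nc1,c2,c3,c4\n"

# header=None tells pandas that the first line is data, not column names.
df_in = pd.read_csv(io.StringIO(sample), header=None)

# Each original row becomes a column after the transpose.
transposed = df_in.transpose()
rows_out = transposed.values.tolist()

for row in rows_out:
    print(row)
```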