
Python: parsing a string to CSV format

I have a file containing a line in the following format:

aaa=A;bbb=B;ccc=C

I want to convert it to CSV so that the names on either side of each equals sign become the two columns and each semicolon ends a row. I tried something like this:

import csv

f = open("aaa.txt", "r")
with open("ccc.csv", 'w') as csvFile:
    writer = csv.writer(csvFile)
    rows = []
    if f.mode == 'r':
        single = f.readline()
        lns = single.split(";")
        for item in lns:
            rows.append(item.replace("=", ","))
        writer.writerows(rows)
        f.close()
        csvFile.close()

but each letter ends up in its own column, so the result looks like:

a,a,a,",",A
b,b,b,",",B
c,c,c,",",C,"

The expected result is:

aaa,A
bbb,B
ccc,C

The parameter to writer.writerows() must be an iterable of rows, each of which must in turn be an iterable of strings or numbers. Since you pass it a list of strings, the characters of each string are treated as separate fields. You can obtain the proper list of rows by splitting the line first on ';', then on '=':

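import csv

# newline='' lets the csv module control line endings itself
with open('in.txt') as in_file, open('out.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    # Read the first line only and drop its trailing newline
    line = next(in_file).rstrip('\n')
    # Split into 'key=value' items, then split each item into [key, value]
    rows = [item.split('=') for item in line.split(';')]
    writer.writerows(rows)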
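If the input is less tidy than the sample (spaces around the separators, a trailing semicolon), the same two-step split can be wrapped in a small helper. This is only a sketch: parse_pairs is a hypothetical name, and the whitespace stripping and the skipping of empty items are assumptions about what the input may contain, not something the question states.

```python
def parse_pairs(line):
    """Turn 'k1=v1;k2=v2' into [['k1', 'v1'], ['k2', 'v2']]."""
    rows = []
    for item in line.strip().split(';'):
        if not item.strip():
            continue  # assumed: ignore empty pieces, e.g. after a trailing ';'
        # partition splits on the first '=' only, so values may contain '='
        key, _, value = item.partition('=')
        rows.append([key.strip(), value.strip()])
    return rows


print(parse_pairs("aaa=A;bbb=B;ccc=C\n"))
# [['aaa', 'A'], ['bbb', 'B'], ['ccc', 'C']]
```

The returned list of lists can be passed straight to writer.writerows().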

Just write the strings to the target file line by line:

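with open("aaa.txt", "r") as f, open("ccc.csv", "w") as csvFile:
    # Drop the trailing newline so the last value stays clean
    single = f.readline().rstrip("\n")
    lns = single.split(";")
    for item in lns:
        # In text mode "\n" is already translated to the platform line ending
        csvFile.write(item.replace("=", ",") + "\n")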

The output would be:

aaa,A
bbb,B
ccc,C
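One caveat with writing the text by hand: nothing gets quoted, so a value that itself contains a comma silently becomes an extra column. The comparison below is a small illustration; the 'ddd' / 'x,y' pair is made-up sample data, not taken from the question.

```python
import csv
import io

key, value = 'ddd', 'x,y'  # hypothetical pair whose value contains a comma

# Manual approach: plain string concatenation, no quoting
manual = key + ',' + value + '\n'

# csv.writer quotes the field that contains the delimiter
buf = io.StringIO()
csv.writer(buf).writerow([key, value])
quoted = buf.getvalue()

print(repr(manual))  # 'ddd,x,y\n'
print(repr(quoted))  # 'ddd,"x,y"\r\n'
```

For data like the sample in the question this makes no difference, so the manual approach is fine there.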

It helps to execute the commands interactively and print the values, or to add debug prints to the code (to be removed or commented out once everything works). Here you would have seen that rows is ['aaa,A', 'bbb,B', 'ccc,C'], that is, three strings where there should be three sequences.

Because a string is itself a (read-only) sequence of characters, writerows uses each character as a field.
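This kind of check can be done without touching any file by letting the writer fill an in-memory buffer. A minimal sketch, where render is a hypothetical helper name and the two lists are shortened versions of the data above:

```python
import csv
import io


def render(rows):
    """Return the CSV text that csv.writer produces for rows."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


strings = ['aaa,A', 'bbb,B']          # each row is a single string
lists = [['aaa', 'A'], ['bbb', 'B']]  # each row is a list of fields

print(repr(render(strings)))  # 'a,a,a,",",A\r\nb,b,b,",",B\r\n'
print(repr(render(lists)))    # 'aaa,A\r\nbbb,B\r\n'
```

The first result shows one field per character, with the comma itself quoted because it is the delimiter.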

So you do not want to replace the = with a comma (,), but to split on the equals sign:

        ...
        for item in lns:
            rows.append(item.split("=", 1))
        ...

But for proper operation the csv module requires the output file to be opened with newline=''.

So you should have:

with open("ccc.csv", 'w', newline='') as csvFile:
    ...
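To see what newline='' buys you, the file can be written and then read back as raw bytes. The sketch below uses a temporary directory so it does not overwrite anything; the path is illustrative only. The writer emits '\r\n' itself by default, and with newline='' those bytes reach the file unchanged on every platform. Without it, text mode on Windows would turn each '\n' into '\r\n' and leave '\r\r\n' in the file.

```python
import csv
import os
import tempfile

rows = [['aaa', 'A'], ['bbb', 'B'], ['ccc', 'C']]

# Illustrative location only: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'ccc.csv')

with open(path, 'w', newline='') as csv_file:
    csv.writer(csv_file).writerows(rows)

# Read the raw bytes to inspect the exact line endings
with open(path, 'rb') as raw:
    data = raw.read()

print(data)  # b'aaa,A\r\nbbb,B\r\nccc,C\r\n'
```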

The following one-line change worked for me:

rows.append(item.split('='))

instead of the existing code

rows.append(item.replace("=", ","))

That way, I was able to build a list of lists that the writer can consume directly, so rows looks like [['aaa', 'A'], ['bbb', 'B'], ['ccc', 'C']] instead of ['aaa,A', 'bbb,B', 'ccc,C'].
