Python CSV formatting issue when writing specific columns to output file then opening in Excel

The Problem

I have a CSV file that contains a large number of items.

The first column can contain either an IP address or random garbage. The only other column I care about is the fourth one.

I have written the below snippet of code in an attempt to check if the first column is an IP address and, if so, write that and the contents of the fourth column to another CSV file side by side.

import csv
import re

with open('results.csv','r') as csvresults:
    filecontent = csv.reader(csvresults)
    output = open('formatted_results.csv','w')
    processedcontent = csv.writer(output)

    for row in filecontent:
        first = str(row[0])
        fourth = str(row[3])
        if re.match(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', first) is not None:
            processedcontent.writerow(["{},{}".format(first,fourth)])
        else:
            continue
    output.close()

This works to an extent. However, when viewing in Excel, both items are placed in a single cell rather than two adjacent ones. If I open it in Notepad I can see that each line is wrapped in quotation marks. If these are removed, Excel will display the columns properly.

Example Input

1.2.3.4,rubbish1,rubbish2,reallyimportantdata

Desired Output

1.2.3.4    reallyimportantdata - two separate columns

Actual Output

"1.2.3.4,reallyimportantdata" - single column

The Question

Is there any way to fudge the format part so that it does not write out the quotation marks? Alternatively, what would be the best way to achieve what I'm trying to do?

I've tried writing out to another file and stripping the lines but, despite not throwing any errors, the result was the same...

writerow() takes a list of elements and writes each of those into its own column. Since you are feeding it a list with only one element (the already-joined string), everything is placed into one column. The quotation marks appear because that single field contains a comma, so the writer quotes it to keep it from being read as two fields.

Instead, feed writerow() a list with one element per column:

processedcontent.writerow([first,fourth])
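
To make the difference concrete, here is a minimal sketch of the whole loop using in-memory text instead of real files. The sample rows, the `filter_rows` helper and the trailing `$` on the pattern are assumptions added for illustration; they are not part of the original code.

```python
import csv
import io
import re

# Hypothetical sample standing in for results.csv
sample = "1.2.3.4,rubbish1,rubbish2,reallyimportantdata\nnot-an-ip,a,b,c\n"

# Same pattern as in the question, anchored at the end (an added assumption)
ip_pattern = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")


def filter_rows(source, dest):
    """Write columns 1 and 4 of each row whose first column looks like an IP."""
    writer = csv.writer(dest)
    for row in csv.reader(source):
        # Skip short rows so row[3] cannot raise IndexError
        if len(row) >= 4 and ip_pattern.match(row[0]):
            # One list element per output column
            writer.writerow([row[0], row[3]])


out = io.StringIO()
filter_rows(io.StringIO(sample), out)
result = out.getvalue()

# For contrast: a single pre-joined string is one field, so it gets quoted
quoted = io.StringIO()
csv.writer(quoted).writerow(["1.2.3.4,reallyimportantdata"])

print(repr(result))
print(repr(quoted.getvalue()))
```

With real files, the csv module documentation recommends opening both with newline='' so that the writer controls the line endings itself.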

Have you considered using Pandas?

import re
import pandas as pd

df = pd.read_csv("myFile.csv", header=0, low_memory=False, index_col=None)
with open("outputp.csv", "w") as fid:
    for index, row in df.iterrows():
        # str() guards against non-string cells such as NaN
        aa = re.match(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", str(row['IP']))
        if aa:
            # End each record with a newline so rows do not run together
            tline = '{0},{1}\n'.format(row['IP'], row['fourth column'])
            fid.write(tline)

There may be an error or two, and I got the regex from elsewhere. This assumes the first row of the CSV has titles which can be referenced. If it does not, you can pass header=None and reference the columns by position with iloc.

Come to think of it, you could probably run the regex on the DataFrame, copy the first and fourth columns to a new DataFrame and use the to_csv method in pandas.
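
A rough sketch of that vectorised idea, assuming a file without a header row (so the columns are numbered 0 to 3) and using in-memory sample text rather than a real file. The sample data and variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the input file (no header row)
sample = io.StringIO(
    "1.2.3.4,rubbish1,rubbish2,reallyimportantdata\n"
    "garbage,a,b,c\n"
)

# header=None gives integer column labels; dtype=str keeps every cell as text
df = pd.read_csv(sample, header=None, dtype=str)

# Boolean mask: True where the first column looks like an IP address
is_ip = df[0].str.match(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", na=False)

# Keep only the matching rows, and only the first and fourth columns
selected = df.loc[is_ip, [0, 3]]

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = selected.to_csv(index=False, header=False)
print(csv_text)
```

Passing a filename as the first argument to to_csv would write the same rows to disk.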
