Python CSV formatting issue when writing specific columns to output file then opening in Excel
The Problem
I have a CSV file that contains a large number of items. The first column can contain either an IP address or random garbage. The only other column I care about is the fourth one.
I have written the snippet below in an attempt to check whether the first column is an IP address and, if so, write it and the contents of the fourth column side by side to another CSV file.
    import csv
    import re

    with open('results.csv','r') as csvresults:
        filecontent = csv.reader(csvresults)
        output = open('formatted_results.csv','w')
        processedcontent = csv.writer(output)
        for row in filecontent:
            first = str(row[0])
            fourth = str(row[3])
            if re.match('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', first) != None:
                processedcontent.writerow(["{},{}".format(first,fourth)])
            else:
                continue
        output.close()
This works to an extent. However, when viewed in Excel, both items are placed in a single cell rather than two adjacent ones. If I open the file in Notepad I can see that each line is wrapped in quotation marks. If these are removed, Excel displays the columns properly.
Example Input
1.2.3.4,rubbish1,rubbish2,reallyimportantdata
Desired Output
1.2.3.4 reallyimportantdata - two separate columns
Actual Output
"1.2.3.4,reallyimportantdata" - single column
The Question
Is there any way to fudge the format part so that it does not write out with quotation marks? Alternatively, what would be the best way to achieve what I'm trying to do?
I've tried writing out to another file and stripping the lines but, despite not throwing any errors, the result was the same...
writerow() takes a list of elements and writes each of them into its own column. Since you are feeding it a list with only one element, everything is placed into one column, and because that single string contains a comma, the writer wraps it in quotation marks.
Instead, feed writerow() a list with one element per column:
processedcontent.writerow([first,fourth])
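As a rough, self-contained sketch of the whole fix, the snippet below runs the same filter against in-memory data instead of real files. The helper name copy_ip_rows, the sample rows, and the io.StringIO buffers are illustrative assumptions, not part of the original question; the regex is the loose one from the question and is not a strict IP validator.

```python
import csv
import io
import re

# Loose dotted-quad check, same pattern as in the question (not a strict validator).
IP_PATTERN = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def copy_ip_rows(src, dst):
    """Write columns 1 and 4 of every row whose first column looks like an IP."""
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # Skip short rows so row[3] cannot raise IndexError.
        if len(row) >= 4 and IP_PATTERN.match(row[0]):
            writer.writerow([row[0], row[3]])  # two elements -> two columns


# Hypothetical sample data standing in for results.csv.
sample = io.StringIO(
    "1.2.3.4,rubbish1,rubbish2,reallyimportantdata\n"
    "garbage,rubbish1,rubbish2,ignored\n"
)
result = io.StringIO()
copy_ip_rows(sample, result)
print(repr(result.getvalue()))

# For contrast: one pre-joined string contains a comma, so the writer quotes it.
quoted = io.StringIO()
csv.writer(quoted).writerow(["1.2.3.4,reallyimportantdata"])
print(repr(quoted.getvalue()))
```

With real files on Python 3, the csv module documentation recommends opening them with newline='' (for example open('formatted_results.csv', 'w', newline='')), which also avoids blank rows appearing between lines on Windows.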
Have you considered using Pandas?
    import re
    import pandas as pd

    df = pd.read_csv("myFile.csv", header=0, low_memory=False, index_col=None)
    with open("outputp.csv", "w") as fid:
        for index, row in df.iterrows():
            # str() guards against non-string cells such as NaN
            aa = re.match(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", str(row['IP']))
            if aa:
                # end each record with a newline so rows do not run together
                tline = '{0},{1}\n'.format(row['IP'], row['fourth column'])
                fid.write(tline)
There may be an error or two, and I got the regex from here. This assumes the first row of the CSV has titles that can be referenced. If it does not, you can use header=None and reference the columns with iloc.
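A small sketch of the header-less case, assuming the file has no title row: with header=None pandas numbers the columns itself, and iloc selects them by position. The sample rows and the io.StringIO buffer are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical header-less data standing in for the real CSV file.
sample = io.StringIO(
    "1.2.3.4,rubbish1,rubbish2,reallyimportantdata\n"
    "garbage,rubbish1,rubbish2,ignored\n"
)

# header=None tells pandas that the first line is data, not column titles.
df = pd.read_csv(sample, header=None)

# iloc is purely positional: column 0 is the first column, column 3 the fourth.
first = df.iloc[:, 0]
fourth = df.iloc[:, 3]

print(first.tolist())
print(fourth.tolist())
```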
Come to think of it, you could probably run the regex on the DataFrame, copy the first and fourth columns to a new DataFrame and use the to_csv method in pandas.
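One way that idea might look, as a sketch rather than a tested solution for the real file: Series.str.match applies the regex to the whole column at once, a boolean mask keeps the matching rows, and to_csv writes the two remaining columns. The sample data, the in-memory buffers, and the assumption that the file has no header row are all illustrative.

```python
import io

import pandas as pd

# Same loose dotted-quad pattern as above; str.match anchors at the start
# of each string (like re.match), and the trailing $ anchors the end.
IP_PATTERN = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$"

# Hypothetical header-less data standing in for the real CSV file.
sample = io.StringIO(
    "1.2.3.4,rubbish1,rubbish2,reallyimportantdata\n"
    "garbage,rubbish1,rubbish2,ignored\n"
)
df = pd.read_csv(sample, header=None, dtype=str)

# Boolean mask: True where the first column looks like an IP address.
# na=False treats empty cells as non-matching.
is_ip = df.iloc[:, 0].str.match(IP_PATTERN, na=False)

# Keep only the first and fourth columns of the matching rows.
selected = df[is_ip].iloc[:, [0, 3]]

# Write without the index or a header row so Excel sees two plain columns.
out = io.StringIO()
selected.to_csv(out, index=False, header=False)
print(out.getvalue())
```

To write to disk instead, pass a file path such as "outputp.csv" to to_csv in place of the buffer.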