Two columns in CSV files getting read as a single column (Python 2.7)
I have 2 CSV files. I want each element in list A to get matched with every element in list B. List A acts as the training set, and list B has errors, which get fixed after matching using edit distance.
The problem is that there are two columns in B: the first column has unique numbers and the second column has the string to be fixed.
I'm getting the output as:
628227teitARMTEteke : iQIARMTEMAC
628226iQIARMTEMAC 9 : iQIARMTEMAC
628229iQIAConfigCH : iQIAConfigCH
627701iQIAConfigCH : iQIAConfigCH
But I want my output to be:
628227 : teitARMTEteke : iQIARMTEMAC
628226 : iQIARMTEMAC 9 : iQIARMTEMAC
628229 : iQIAConfigCH : iQIAConfigCH
627701 : iQIAConfigCH : iQIAConfigCH
Code:
import csv
from nltk.metrics import distance

# find_min_edit(extracted, candidates) is my own helper (not shown here)

with open("all_correct_promo.csv","rb") as file1:
    reader1 = csv.reader(file1)
    correctPromoList = [''.join(i) for i in reader1]
    # print correctPromoList

with open("all_extracted_promo3.csv","rb") as file2:
    reader2 = csv.reader(file2)
    extractedPromoList = [''.join(i) for i in reader2]
    #print extractedPromoList

incorrectPromo = {}
count = 0
for extracted in extractedPromoList:
    #print 'Computing %dth promo code...' % count
    incorrectPromo[extracted] = find_min_edit(extracted,correctPromoList) # get comma separated str of real promo codes nearest to extracted
    count+=1
#print incorrectPromo

for key, value in incorrectPromo.iteritems():
    print key ,':', value
Right now the unique numbers are getting read together with the strings, which will affect the way the strings get corrected. I want the numbers to be displayed with their strings, but without affecting the way each string is matched against the strings in list A.
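The output above (number and promo code fused with no space between them) is consistent with csv.reader actually returning two fields per row, which `''.join(i)` then glues together. That is an inference from the output, since the raw file isn't shown. A minimal sketch of the difference, using a hypothetical comma-separated list of lines in place of the real file (csv.reader accepts any iterable of strings):

```python
import csv

# Hypothetical two-column rows standing in for all_extracted_promo3.csv;
# a plain list of lines is used instead of an open file.
lines = ["628227,teitARMTEteke", "628226,iQIARMTEMAC 9"]

rows = list(csv.reader(lines))

# What the question's code does: join the fields, losing the column boundary.
merged = [''.join(row) for row in rows]

# Keeping the columns apart: unpack each row into (number, promo text).
pairs = [(number, text) for number, text in rows]

for number, text in pairs:
    print(number + " : " + text)
```

With the columns unpacked, only `text` needs to be handed to the edit-distance matcher, while `number` is kept for display.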
Sample from all_extracted_promo3.csv:
628229 iQIABundUPGR
628229 iQIAPortUPGR
628229 iQIAConfigCH
628229 iQIARMTEMAC 9
Sample from all_correct_promo.csv:
iQ BundleUPGR
IQ MANAGED
IQ04 BRP
IQ1MOBILSUP
IQ2MOBILSUP
iQBundIeUPGR
iQBundle 1
iQBundle 2
Leaving aside the strange way of getting the data that you use (to say the least), I'll answer strictly about csv.reader.
For csv.reader to distinguish columns, you need to set up its dialect in accordance with your .csv. As its docs say, it accepts all individual dialect formatting parameters as keyword arguments. Here, you're probably interested in delimiter:
csv.reader(<file>, delimiter=<whatever>)
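A minimal illustration of the delimiter keyword, using made-up semicolon-separated lines (a list of strings stands in for the file). It also shows that the keyword must be spelled exactly `delimiter`: an unknown keyword such as `delimeter` is rejected with a TypeError rather than silently ignored.

```python
import csv

# Hypothetical semicolon-separated lines; csv.reader defaults to ',' so,
# without a matching delimiter, each line comes back as a single field.
lines = ["628229;iQIAConfigCH", "627701;iQIAConfigCH"]

default_rows = list(csv.reader(lines))
semicolon_rows = list(csv.reader(lines, delimiter=';'))

# The dialect keyword is "delimiter"; a misspelled keyword raises TypeError.
try:
    csv.reader(lines, delimeter=';')
    misspelling_rejected = False
except TypeError:
    misspelling_rejected = True

print(default_rows)    # one field per row
print(semicolon_rows)  # two fields per row
```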
Judging by the excerpts, your all_extracted_promo3.csv uses two spaces as the delimiter, and all_correct_promo.csv uses a single space. csv.reader only supports single-character delimiters, though:
>>> [i for i in csv.reader(open("all_extracted_promo3.csv","rb"),delimiter=' ')]
[['628229', '', 'iQIABundUPGR'],
['628229', '', 'iQIAPortUPGR'],
['628229', '', 'iQIAConfigCH'],
['628229', '', 'iQIARMTEMAC', '9']]
So you'll have to either work around that (by ignoring the empty 2nd element), change the software that produces the file (e.g. to use the standard comma as the delimiter), or use some other facility to parse the file.
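A sketch of the first and third options, plus the matching step the question asks for. Assumptions: the extracted file separates the number from the promo code with two spaces, as the excerpt suggests; `find_min_edit` below is a hypothetical stand-in for the asker's own helper (which isn't shown), using a plain Levenshtein distance instead of nltk; `split_row_csv` and `split_row_plain` are illustrative names. Only the promo text is passed to the matcher; the number is carried alongside it.

```python
import csv

def levenshtein(a, b):
    """Plain dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def find_min_edit(text, candidates):
    """Hypothetical stand-in: comma-separated string of the candidates
    with the smallest edit distance to text."""
    dists = [(levenshtein(text, c), c) for c in candidates]
    best = min(d for d, _ in dists)
    return ','.join(c for d, c in dists if d == best)

def split_row_csv(line):
    """Option 1: split on single spaces, ignore the empty 2nd element produced
    by the double space, and re-join the rest (the code may contain spaces)."""
    row = next(csv.reader([line], delimiter=' '))
    return row[0], ' '.join(row[2:])

def split_row_plain(line):
    """Option 3: skip csv and split once on the first run of whitespace."""
    number, text = line.split(None, 1)
    return number, text.strip()

# Hypothetical data modelled on the excerpts and the desired output.
extracted_lines = ["628229  iQIAConfigCH", "628229  iQIARMTEMAC 9"]
correct_codes = ["IQ MANAGED", "iQBundle 1", "iQIAConfigCH", "iQIARMTEMAC"]

results = []
for line in extracted_lines:
    number, text = split_row_plain(line)
    # Only the promo text takes part in matching; the number rides along.
    results.append((number, text, find_min_edit(text, correct_codes)))

for number, text, fix in results:
    print(number + " : " + text + " : " + fix)
```

Results are kept in a list of tuples rather than a dict keyed by the extracted string, because the same number (628229) and the same promo text (iQIAConfigCH) occur on more than one row, and a dict would silently collapse those entries.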