
Two columns in a CSV file are read as a single column (Python 2.7)

I have two CSV files. I want each element in list A to be matched against every element in list B. List A acts as the training set, and list B contains errors that get fixed after matching by edit distance.

The problem is that B has two columns: the first holds unique numbers and the second holds the string to be fixed.

I'm getting this output:

628227teitARMTEteke : iQIARMTEMAC
628226iQIARMTEMAC 9 : iQIARMTEMAC
628229iQIAConfigCH : iQIAConfigCH
627701iQIAConfigCH : iQIAConfigCH

But I want my output to be:

628227 : teitARMTEteke : iQIARMTEMAC
628226 : iQIARMTEMAC 9 : iQIARMTEMAC
628229 : iQIAConfigCH : iQIAConfigCH
627701 : iQIAConfigCH : iQIAConfigCH

CODE

import csv
from nltk.metrics import distance


with open("all_correct_promo.csv","rb") as file1:
    reader1 = csv.reader(file1)
    correctPromoList = [''.join(i) for i in reader1]
   # print correctPromoList
with open("all_extracted_promo3.csv","rb") as file2:
    reader2 = csv.reader(file2)
    extractedPromoList = [''.join(i) for i in reader2]
    #print extractedPromoList

incorrectPromo = {}
count = 0
for extracted in extractedPromoList:
    #print 'Computing %dth promo code...' % count
    incorrectPromo[extracted] =  find_min_edit(extracted,correctPromoList) # get comma separated str of real promo codes nearest to extracted
    count+=1
#print incorrectPromo


for key, value in incorrectPromo.iteritems():
    print key ,':', value

Right now the unique numbers are read together with the strings, which affects how the strings get corrected. I want each number to be displayed alongside its string, but without affecting how that string is matched against the strings in list A.

Sample from all_extracted_promo3.csv:

628229  iQIABundUPGR
628229  iQIAPortUPGR
628229  iQIAConfigCH
628229  iQIARMTEMAC 9

Sample from all_correct_promo.csv:

iQ BundleUPGR
IQ MANAGED
IQ04 BRP
IQ1MOBILSUP
IQ2MOBILSUP
iQBundIeUPGR
iQBundle 1
iQBundle 2

Leaving aside the strange way you obtain the data, to say the least, I'll answer strictly about csv.reader.

For csv.reader to distinguish columns, you need to set up its dialect to match your .csv file. As its docs say, it accepts all individual dialect formatting parameters as keyword arguments. Here, you're probably interested in delimiter:

csv.reader(<file>, delimiter=<whatever>)
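As a minimal sketch of passing the delimiter as a keyword argument, assuming hypothetical tab-separated rows (the tab character and the sample_lines name are illustrative and not taken from the question's files):

```python
import csv

# Hypothetical tab-separated rows. csv.reader accepts any iterable
# that yields lines, so a plain list of strings is enough for a demo.
sample_lines = [
    "628229\tiQIABundUPGR",
    "628229\tiQIARMTEMAC 9",
]

# The delimiter keyword tells the reader where one column ends.
rows = [row for row in csv.reader(sample_lines, delimiter="\t")]

for row in rows:
    print(row)
```

Each row comes back as a list with the number and the string as separate elements, and a space inside the second column is left alone because it is not the delimiter.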

Judging by the excerpts, all_extracted_promo3.csv uses two spaces as the delimiter, and all_correct_promo.csv uses a single space. csv.reader only supports single-character delimiters, though:

>>> [i for i in csv.reader(open("all_extracted_promo3.csv","rb"),delimiter=' ')]
[['628229', '', 'iQIABundUPGR'],
 ['628229', '', 'iQIAPortUPGR'],
 ['628229', '', 'iQIAConfigCH'],
 ['628229', '', 'iQIARMTEMAC', '9']]

So you'll have to either work around that (by ignoring the 2nd element), change the software that produces the file (e.g. to use the standard comma as the delimiter), or use some other facility to parse the file.
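One way to take the last option, sketched under assumptions: split each line on its first run of whitespace with str.split, keep the number aside, and match only the string. The names split_id_and_promo, edit_distance and nearest are hypothetical. nearest merely stands in for the question's find_min_edit, whose definition is not shown, and a plain Levenshtein function is used so that the sketch runs without nltk.

```python
def split_id_and_promo(line):
    """Split 'ID  promo text' on the first run of whitespace only."""
    parts = line.strip().split(None, 1)
    return (parts[0], parts[1]) if len(parts) == 2 else None


def edit_distance(a, b):
    """Levenshtein distance computed row by row (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def nearest(promo, correct_promos):
    """Hypothetical stand-in for find_min_edit: closest correct promo."""
    return min(correct_promos, key=lambda c: edit_distance(promo, c))


# Sample lines copied from the question's excerpts.
extracted_lines = [
    "628229  iQIABundUPGR",
    "628229  iQIAPortUPGR",
    "628229  iQIAConfigCH",
    "628229  iQIARMTEMAC 9",
]
correct_promos = ["iQ BundleUPGR", "IQ MANAGED", "IQ04 BRP", "IQ1MOBILSUP",
                  "IQ2MOBILSUP", "iQBundIeUPGR", "iQBundle 1", "iQBundle 2"]

results = []
for line in extracted_lines:
    parsed = split_id_and_promo(line)
    if parsed is None:
        continue  # skip blank or malformed lines
    number, promo = parsed
    # Only the promo string takes part in matching; the number is kept for display.
    results.append((number, promo, nearest(promo, correct_promos)))

for number, promo, match in results:
    print("%s : %s : %s" % (number, promo, match))
```

Because the split happens at most once, a promo string that itself contains a space (such as "iQIARMTEMAC 9") stays in one piece, and the number never influences the distance calculation.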

