Compare two columns of a file using Python or Unix
I have data in a file such as a CSV, or a txt file with a specific separator. For example:
date|Symbol
2017-05-01|A
2017-05-01|B
2017-05-01|C
2017-05-01|A
2017-05-02|A
2017-05-02|B
2017-05-02|C
2017-05-03|A
2017-05-04|A
2017-05-04|B
2017-05-04|C
2017-05-05|A
2017-05-05|A
2017-05-05|B
2017-05-06|C
2017-05-06|A
2017-05-07|A
2017-05-05|B
2017-05-07|C
2017-05-08|A
Now I want to check whether any symbol is repeated on a particular day and, if so, report that symbol together with its date. For example, symbol A is repeated on 01-May and B on 05-May.
I am trying to do this in Python by putting all the symbols in a list and then checking against column one whether any date is repeated.
Are there any other solutions?
Read the file line by line, then split each line on the pipe character |:
ln.split("|")[1]
This gives the symbol on each line, such as A, B, ...
Compare these values with each other, for example with Python's difflib (https://pymotw.com/2/difflib/):
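import difflib
# difflib_data is the sample module from the PyMOTW article linked above;
# it defines text1_lines and text2_lines.
from difflib_data import *
d = difflib.Differ()
diff = d.compare(text1_lines, text2_lines)
print('\n'.join(diff))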
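The read-and-split step can also be used on its own to answer the question, without difflib. The following is only a minimal sketch, assuming every data line has at least two pipe-separated fields (date first, symbol second); the function name `find_repeats` and the shortened sample data are made up for illustration.

```python
def find_repeats(lines, sep="|"):
    """Return the (date, symbol) pairs that occur more than once."""
    seen = set()      # pairs met so far
    repeats = []      # pairs met at least twice, in order of first repeat
    for ln in lines:
        ln = ln.strip()
        if not ln:
            continue  # ignore blank lines
        date, symbol = ln.split(sep)[:2]
        key = (date, symbol)
        if key in seen:
            if key not in repeats:
                repeats.append(key)
        else:
            seen.add(key)
    return repeats


# Shortened, hypothetical sample in the same layout as the question's file.
sample = """date|Symbol
2017-05-01|A
2017-05-01|B
2017-05-01|A
2017-05-05|B
2017-05-05|B
"""
lines = sample.splitlines()
print(find_repeats(lines[1:]))  # lines[0] is the header
```

Keeping the pairs in a set makes each membership check cheap, so the file is only read once.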
I created a list of dictionaries, where each dictionary has a date as its key and the list of column-2 values for that date as its value. I then checked each dictionary for repeated entries.
If anyone has a better solution than this, it is most welcome.
Implementation code for the above:
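The date-to-symbols idea described here can be sketched with a single dictionary instead of a list of dictionaries. This is an illustration rather than the code actually used; `repeated_symbols_by_date` and the sample rows are hypothetical, and the input is assumed to be already split into (date, symbol) pairs.

```python
from collections import Counter


def repeated_symbols_by_date(rows):
    """Map each date to the symbols that appear more than once on that date."""
    symbols_by_date = {}
    for date, symbol in rows:
        # One list of symbols per date, created on first sight of the date.
        symbols_by_date.setdefault(date, []).append(symbol)

    result = {}
    for date, symbols in symbols_by_date.items():
        dupes = [s for s, n in Counter(symbols).items() if n > 1]
        if dupes:
            result[date] = dupes
    return result


# Hypothetical rows in the same shape as the question's data.
rows = [
    ("2017-05-01", "A"), ("2017-05-01", "B"), ("2017-05-01", "A"),
    ("2017-05-02", "A"), ("2017-05-05", "B"), ("2017-05-05", "B"),
]
print(repeated_symbols_by_date(rows))
```

Dates with no repeated symbol are left out of the result, so an empty dictionary means the file has no repeats.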
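import csv

# file_path, delmtr, date_col, inst_col, new_file_name, nw_fl, csvwriter,
# is_header, count, date_list and p_list are set up earlier (not shown).
# "rb" is the Python 2 way to open a csv file; on Python 3 use
# open(file_path, newline="") instead.
with open(file_path, "rb") as f:
    reader = csv.reader(f, delimiter=delmtr)
    for line in reader:
        # Skip the header row once.
        if is_header == 1:
            is_header = 0
            continue
        date_dict = {}
        # Column numbers are 1-based; convert them to list indexes.
        inst_fl_col = inst_col - 1
        date_fl_col = date_col - 1
        if line[date_fl_col] not in date_list:
            # First row for this date: start a new {date: [instruments]} dict.
            date_list.append(line[date_fl_col])
            instrument_list = []
            instrument_list.append(line[inst_fl_col])
            date_dict[line[date_fl_col]] = instrument_list
            p_list.append(date_dict)
            csvwriter.writerow(line)
            del date_dict, instrument_list
        else:
            # Date already seen: find its dict and check the instrument.
            for dicts in p_list:
                for k, v in dicts.items():
                    if k == line[date_fl_col]:
                        if line[inst_fl_col] not in v:
                            v.append(line[inst_fl_col])
                            csvwriter.writerow(line)
                        else:
                            # Repeated instrument on this date: drop the row.
                            count += 1
nw_fl.close()
print(str(count) + " rows ignored in newly created " + new_file_name + " file")
del date_list[:], is_header, csvwriter, count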
I did this using only basic Python; I am now improving it with the defaultdict class from the collections module. Please let me know if anyone needs the improved code.
Suggestions are most welcome.
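For reference, a defaultdict-based version could look roughly like the sketch below. It is not the improved code mentioned above, only one possible shape for it in Python 3; the function name `drop_repeats` and its parameters are assumptions, and unlike the original it copies the header row to the output.

```python
import csv
import io
from collections import defaultdict


def drop_repeats(src, dst, delimiter="|", date_col=1, inst_col=2, has_header=True):
    """Copy rows from src to dst, skipping repeated (date, instrument) rows.

    Column numbers are 1-based. Returns the number of rows skipped.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    seen = defaultdict(set)  # date -> instruments already written
    skipped = 0
    if has_header:
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
    for row in reader:
        date, inst = row[date_col - 1], row[inst_col - 1]
        if inst in seen[date]:
            skipped += 1
        else:
            seen[date].add(inst)
            writer.writerow(row)
    return skipped


# In-memory stand-ins for the input and output files (hypothetical data).
src = io.StringIO("date|Symbol\n2017-05-01|A\n2017-05-01|B\n2017-05-01|A\n2017-05-02|A\n")
dst = io.StringIO()
skipped = drop_repeats(src, dst)
print(str(skipped) + " rows ignored")
print(dst.getvalue())
```

With real files, `src` and `dst` would be opened with `open(path, newline="")` and `open(path, "w", newline="")`. The set per date replaces the nested loops over the list of dictionaries, so each row needs only one lookup.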