How to extract common lines from text files based on their associated values?
I have 3 text files:
List1.txt:
032_M5, 5
035_M9, 5
036_M4, 3
038_M2, 6
041_M1, 6
List2.txt:
032_M5, 6
035_M9, 6
036_M4, 5
038_M2, 5
041_M1, 6
List3.txt:
032_M5, 6
035_M9, 6
036_M4, 4
038_M2, 5
041_M1, 6
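The first part of each line (the string) is the same in all 3 text files, but the second part (the number) varies.
I want to produce three output files from them:
Output1.txt --> all lines where the numbers corresponding to a string are all different. Example: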
036_M4 3, 5, 4
Output2.txt --> all lines where the numbers corresponding to a string are all the same. Example:
041_M1, 6
Output3.txt --> all lines where at least two of the numbers corresponding to a string are the same (this also includes the results of Output2.txt). Example:
032_M5, 6
035_M9, 6
038_M2, 5
041_M1, 6
Then I need the number of lines in Output3.txt whose number is 1, 2, 3, 4, 5, and 6, respectively.
Here is what I tried. It gives me the wrong output.
from collections import defaultdict

data = defaultdict(list)
for fileName in ["List1.txt","List2.txt", "List3.txt"]:
    with open(fileName,'r') as file1:
        for line in file1:
            col1,value = line.split(",")
            data[col1].append(int(value))

with open("Output3.txt","w") as output:
    for (col1),values in data.items():
        if len(values) < 3: continue
        result = max(x for x in values)
        output.write(f"{col1}, {result}\n")
Here is an approach that does not import any Python modules and relies entirely on built-in Python functions:
with open("List1.txt", "r") as list1, open("List2.txt", "r") as list2, open("List3.txt", "r") as list3:
    # Forming association between keywords and numbers.
    # Assumes the three files list the same keys in the same order.
    data1 = list1.readlines()
    totalKeys = [elem.split(',')[0] for elem in data1]
    numbers1 = [elem.split(',')[1].strip() for elem in data1]
    numbers2 = [elem.split(',')[1].strip() for elem in list2.readlines()]
    numbers3 = [elem.split(',')[1].strip() for elem in list3.readlines()]
    totalValues = list(zip(numbers1,numbers2,numbers3))
    totalDict = dict(zip(totalKeys,totalValues))

# Outputs
output1 = []
output2 = []
output3 = []
for key in totalDict.keys():
    # Output1: all three numbers differ
    if len(set(totalDict[key])) == 3:
        output1.append([key, totalDict[key]])
    # Output2: all three numbers are equal
    if len(set(totalDict[key])) == 1:
        output2.append([key, totalDict[key][0]])
    # Output3: at least two numbers are equal; keep the most frequent one
    if len(set(totalDict[key])) <= 2:
        output3.append([key, max(totalDict[key], key=lambda elem: totalDict[key].count(elem))])

# Output1
print('Output1:')
for elem in output1:
    print(elem[0] + ' ' + ", ".join(elem[1]))
print()
# Output2 (elem[1] is a single string, so it is concatenated directly;
# joining it would split multi-digit numbers into separate characters)
print('Output2:')
for elem in output2:
    print(elem[0] + ' ' + elem[1])
print()
# Output3
print('Output3:')
for elem in output3:
    print(elem[0] + ' ' + elem[1])
The output of the above will be:
Output1:
036_M4 3, 5, 4
Output2:
041_M1 6
Output3:
032_M5 6
035_M9 6
038_M2 5
041_M1 6
max gives the largest number in the list, not the most frequently occurring one. For that, use statistics.mode
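To illustrate the difference, here is a minimal sketch using the three numbers listed for 038_M2 in the question (6, 5, 5). The variable names are illustrative only and are not part of the original code.

```python
from statistics import mode

# Numbers for 038_M2 taken from List1.txt, List2.txt and List3.txt in the question.
values = [6, 5, 5]

largest = max(values)        # largest value, regardless of how often it occurs
most_common = mode(values)   # value that occurs most often

print(largest, most_common)  # prints: 6 5
```

On Python 3.8 and later, mode returns the first mode encountered when several values tie; earlier versions raise StatisticsError in that case.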
from collections import defaultdict
from statistics import mode

data = defaultdict(list)
for fileName in ["List1.txt","List2.txt", "List3.txt"]:
    with open(fileName,'r') as file1:
        for line in file1:
            col1,value = line.split(",")
            data[col1].append(int(value))

# Output1: all three numbers differ
with open("Output1.txt","w") as output:
    for (col1),values in data.items():
        if len(values) < 3: continue
        if values[0] != values[1] != values[2] and values[0] != values[2]:
            output.write(f"{col1}, {values[0]}, {values[1]}, {values[2]}\n")

# Output2: all three numbers are equal
with open("Output2.txt","w") as output:
    for (col1),values in data.items():
        if len(values) < 3: continue
        if values[0] == values[1] == values[2]:
            output.write(f"{col1}, {values[0]}\n")

# Output3: at least two numbers are equal, i.e. at most two distinct values
with open("Output3.txt","w") as output:
    for (col1),values in data.items():
        if len(values) < 3: continue
        if len(set(values)) <= 2:
            output.write(f"{col1}, {mode(values)}\n")
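The question also asks how many lines in Output3.txt carry each number from 1 to 6, which the code above does not cover. A minimal sketch of one way to do that with collections.Counter follows. The helper name count_by_number is hypothetical, and the sample lines are the Output3.txt example from the question rather than data read from disk.

```python
from collections import Counter

def count_by_number(lines, numbers=range(1, 7)):
    """Count how many 'key, number' lines carry each number in `numbers`.

    Hypothetical helper; assumes each non-blank line looks like '032_M5, 6'.
    """
    counts = Counter(int(line.split(",")[1]) for line in lines if line.strip())
    # Report every requested number, including those that never occur.
    return {n: counts.get(n, 0) for n in numbers}

# Example Output3.txt content from the question.
output3_lines = ["032_M5, 6", "035_M9, 6", "038_M2, 5", "041_M1, 6"]

result = count_by_number(output3_lines)
print(result)  # prints: {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 3}
```

To run it on the real file, the list could be replaced with the lines read from Output3.txt, for example by passing the open file object to the helper.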