Python: Find strings from one file in another and print the whole line

Question

I need to find occurrences of the entries of one file in another file.

My files are like this:

FILE1: CLUSTER_NAME

FILE2: TIMESTAMP, CLUSTER_NAME, LOG

I want to check whether the clusters listed in the first file also appear in the second file and, if they do, print the whole line.

For example:

FILE1:

clusterA,
clusterB,
clusterC,

FILE2:

2019, clusterB, log
2020, clusterC, log
2017, clusterZ, log

The output should be like this:

Input:  clusterB, clusterZ
Output: 2017, clusterZ, log
        2019, clusterB, log

import pandas as pd

#ARRAY
my_value = []
cluster_value = []

#READ THE FILES
my_data_file = pd.read_csv('my_data.txt', sep=',')
log_file = pd.read_csv('log.txt', sep=',')

#TAKE THE COLUMN WITH THE CLUSTERS
for row in my_data_file[my_data_file.columns[1]]:
    my_value.append(row)

for row in log_file[log_file.columns[0]]:
    cluster_value.append(row)

#RESULT
print("_______________")
print(list(set(my_value) & set(cluster_value)))
print("_______________")

It works, but I need to print the whole log line. I don't know how to link the result of my operation back to the lines I need to print.
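The set intersection keeps only the cluster names, so the link to the log lines is lost at that step. One way to keep it is to skip the intersection: build a set of names from the first file, then walk the log file and keep every line whose CLUSTER_NAME field is in that set. A minimal sketch without Pandas, assuming the first file holds one name per line (trailing comma optional) and the log file has no header row, with CLUSTER_NAME as the second comma-separated field; the function name `matching_lines` is hypothetical.

```python
def matching_lines(key_lines, log_lines):
    """Return the log lines whose CLUSTER_NAME field is listed in key_lines.

    Assumes key lines look like 'clusterA,' (trailing comma optional) and
    log lines look like 'TIMESTAMP, CLUSTER_NAME, LOG'.
    """
    # Collect cluster names from the first file; trailing commas and blank lines are ignored
    keys = {line.split(',')[0].strip() for line in key_lines}
    keys.discard('')

    matches = []
    for line in log_lines:
        fields = [field.strip() for field in line.split(',')]
        # Compare the CLUSTER_NAME column exactly, then keep the whole original line
        if len(fields) >= 2 and fields[1] in keys:
            matches.append(line.rstrip('\n'))
    return matches


# In-memory stand-ins for the two files from the question
file1 = ['clusterB,\n', 'clusterZ\n']
file2 = ['2019, clusterB, log\n', '2020, clusterC, log\n', '2017, clusterZ, log\n']
for match in matching_lines(file1, file2):
    print(match)  # prints '2019, clusterB, log' then '2017, clusterZ, log'
```

Because the function accepts any iterable of lines, open file objects can be passed directly, e.g. inside `with open('my_data.txt') as kfile, open('log.txt') as lfile:`. Matches come out in log-file order; calling `sorted()` on the result gives the 2017-before-2019 order shown in the question. Comparing the field exactly (rather than searching the whole line) also means `clusterB` does not match a longer name such as `clusterBB`.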

Answer 1

Using regular expressions

You don't need Pandas for such a simple file read.

Code

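import re

def search(key_file, search_file):
    with open(key_file) as kfile:
        # cluster name is the first comma-separated field; blank lines are skipped
        # (an empty alternative would make the regex match every line)
        names = [line.split(',')[0].strip() for line in kfile]
        keys = '|'.join(re.escape(name) for name in names if name)
    # regex for cluster names
    regex = re.compile(keys)

    with open(search_file) as search_data:
        for line in search_data:
            if regex.search(line):
                print(line.rstrip())

search('mydata.txt', 'log.txt')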
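One caveat with a plain alternation such as `clusterB|clusterZ`: it matches the name anywhere in the line, so `clusterB` would also match a log line for a longer name like `clusterB2`. Wrapping the alternation in `\b` word boundaries is one way to require whole-name matches. A minimal sketch, assuming cluster names consist only of word characters (letters, digits, underscore); the helper name `build_pattern` is hypothetical.

```python
import re


def build_pattern(names):
    """Compile one regex that matches any of the given names as a whole word."""
    # Drop blank entries so an empty alternative cannot match every line
    cleaned = [name.strip() for name in names if name.strip()]
    if not cleaned:
        # (?!) is an always-failing lookahead, so nothing matches
        return re.compile(r'(?!)')
    # re.escape protects names containing regex metacharacters such as '.'
    alternation = '|'.join(re.escape(name) for name in cleaned)
    # \b on both sides rejects partial matches, e.g. clusterB inside clusterB2
    return re.compile(r'\b(?:' + alternation + r')\b')


pattern = build_pattern(['clusterB', 'clusterZ'])
for line in ['2019, clusterB, log', '2018, clusterB2, log', '2017, clusterZ, log']:
    if pattern.search(line):
        print(line)  # prints the clusterB and clusterZ lines, not clusterB2
```

Inside `search`, the compiled `regex` could be built this way from the key names instead of from the bare alternation. If names may contain characters such as `-`, `\b` no longer marks their edges reliably, and splitting out the CLUSTER_NAME field for an exact comparison is the safer choice.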

Input

'mydata.txt' (a trailing ',' doesn't matter; it is ignored)

clusterB,
clusterZ

'log.txt'

2019, clusterB, log
2020, clusterC, log
2017, clusterZ, log

Output

2019, clusterB, log
2017, clusterZ, log

1 answer

solution1
0 ACCEPTED 2020-05-23 12:14:36
