
Python: Find strings from one file in another and print the whole line

I need to find occurrences of the entries from one file in another file.

My files look like this:

FILE1: CLUSTER_NAME

FILE2: TIMESTAMP, CLUSTER_NAME, LOG

What I want is to check whether the clusters listed in the first file also appear in the second file and, if they do, print the whole line.

For example:

FILE1:

  • clusterA,
  • clusterB,
  • clusterC,

FILE2:

  • 2019, clusterB, log
  • 2020, clusterC, log
  • 2017, clusterZ, log

The output should look like this:

Input:  clusterB, clusterZ
Output: 2017, clusterZ, log
        2019, clusterB, log
My current attempt:

import pandas as pd

# Lists that will hold the cluster names read from each file
my_value = []
cluster_value = []

# Read both files
my_data_file = pd.read_csv('my_data.txt', sep=',')
log_file = pd.read_csv('log.txt', sep=',')

# Collect the cluster-name column of each file
for row in my_data_file[my_data_file.columns[1]]:
    my_value.append(row)

for row in log_file[log_file.columns[0]]:
    cluster_value.append(row)

# Result: cluster names present in both files
print("_______________")
print(list(set(my_value) & set(cluster_value)))
print("_______________")

It works, but I need to print the whole log line for each match. I don't know how to link the result of the intersection back to the lines I need to print.
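One way to link the intersection back to the lines is to remember, while reading the log, which full lines belong to each cluster name. Below is a minimal sketch in plain Python, assuming (as in the sample FILE2) that the cluster name is the second comma-separated field, and that FILE1 has one name per line with an optional trailing comma. The function names `index_by_cluster`, `read_cluster_names` and `matching_lines` are hypothetical.

```python
from collections import defaultdict


def index_by_cluster(log_lines):
    """Map each cluster name (assumed to be the 2nd comma-separated field) to its full log lines."""
    index = defaultdict(list)
    for raw in log_lines:
        line = raw.rstrip('\n')
        fields = [field.strip() for field in line.split(',')]
        if len(fields) >= 2:  # skip blank or malformed lines
            index[fields[1]].append(line)
    return index


def read_cluster_names(key_lines):
    """First field of each key-file line; trailing commas and blank lines are ignored."""
    names = (raw.split(',')[0].strip() for raw in key_lines)
    return {name for name in names if name}


def matching_lines(key_lines, log_lines):
    index = index_by_cluster(log_lines)
    # Same set intersection as in the question, but on cleaned-up names
    common = read_cluster_names(key_lines) & set(index)
    # Link each common cluster back to the full lines it came from
    return [line for name in sorted(common) for line in index[name]]


keys = ["clusterB,\n", "clusterZ\n"]
log = ["2019, clusterB, log\n", "2020, clusterC, log\n", "2017, clusterZ, log\n"]
for line in matching_lines(keys, log):
    print(line)
```

With real files you can pass the open file objects directly, e.g. `with open('my_data.txt') as k, open('log.txt') as g:` followed by `matching_lines(k, g)`. The result is grouped by cluster name (sorted), not kept in the original file order. If you prefer to stay with pandas, boolean indexing with `Series.isin` filters a DataFrame's rows by a collection of names in the same spirit; note that with `sep=','` the parsed values keep the space after each comma unless `skipinitialspace=True` is passed to `read_csv`, and `header=None` is needed when the files have no header row.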

Using regular expressions

  • You don't need pandas to read such simple files.
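As an illustration that the standard library is enough, here is a sketch using only the `csv` module. It streams the log and compares the second field exactly against the key set, rather than searching the whole line with a regex; `skipinitialspace=True` drops the space after each comma, so ` clusterB` is read as `clusterB`. The helper names `read_keys` and `filter_log` are hypothetical, and the field positions are assumed from the sample files.

```python
import csv


def read_keys(key_lines):
    """Set of cluster names: first field of each line, blank lines skipped."""
    keys = set()
    for row in csv.reader(key_lines, skipinitialspace=True):
        if row and row[0].strip():
            keys.add(row[0].strip())
    return keys


def filter_log(keys, log_lines):
    """Yield each log line whose second field (assumed cluster name) is in keys."""
    for raw in log_lines:
        # Parse one line at a time so the raw text can be printed unchanged
        fields = next(csv.reader([raw], skipinitialspace=True), [])
        if len(fields) >= 2 and fields[1].strip() in keys:
            yield raw.rstrip('\n')


keys = read_keys(["clusterB,\n", "clusterZ\n"])
log = ["2019, clusterB, log\n", "2020, clusterC, log\n", "2017, clusterZ, log\n"]
for line in filter_log(keys, log):
    print(line)
```

Because `filter_log` is a generator over any iterable of lines, an open file object works too, and matches come out in the log file's own order.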

Code

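import re

def search(key_file, search_file):
    # Collect the cluster names (first comma-separated field of each line),
    # skipping blank lines so an empty alternative can't match every line
    with open(key_file) as kfile:
        keys = [line.split(',')[0].strip() for line in kfile]
    keys = [k for k in keys if k]
    if not keys:
        return

    # One regex matching any cluster name as a whole word (names escaped)
    regex = re.compile(r'\b(?:' + '|'.join(map(re.escape, keys)) + r')\b')

    # Print every line of the search file that contains one of the names
    with open(search_file) as search_data:
        for line in search_data:
            if regex.search(line):
                print(line.rstrip())

search('mydata.txt', 'log.txt')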

Input

'mydata.txt' (a trailing ',' doesn't matter; it is ignored)

clusterB,
clusterZ

'log.txt'

2019, clusterB, log
2020, clusterC, log
2017, clusterZ, log

Output

2019, clusterB, log
2017, clusterZ, log
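A regex built as a plain alternation of the names (`clusterB|clusterZ`) has two pitfalls worth knowing: it also matches a name inside a longer one, and a blank line in the key file adds an empty alternative that matches every line. The short demo below illustrates both; the `clusterBB` line is a made-up example, and wrapping `re.escape`d names in `\b` word boundaries is one common remedy, assuming names consist of letters, digits and underscores.

```python
import re

keys = ["clusterB", "clusterZ"]
lines = ["2019, clusterB, log", "2021, clusterBB, log", "2017, clusterZ, log"]

# Plain alternation "clusterB|clusterZ": also matches inside "clusterBB"
loose = re.compile("|".join(keys))
# Escaped names wrapped in word boundaries: only whole names match
strict = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
# A blank key-file line adds an empty alternative: "clusterB|" matches anything
with_blank = re.compile("|".join(["clusterB", ""]))

loose_hits = [line for line in lines if loose.search(line)]
strict_hits = [line for line in lines if strict.search(line)]
blank_hits = [line for line in lines if with_blank.search(line)]

print(loose_hits)   # all three lines
print(strict_hits)  # only the clusterB and clusterZ lines
print(blank_hits)   # all three lines
```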
