I need to find occurrences of values from one file in another.
My files are like this:
FILE1: CLUSTER_NAME
FILE2: TIMESTAMP, CLUSTER_NAME, LOG
What I want is to check whether the clusters listed in the first file also appear in the second file, and to print the whole matching line.
For example:
FILE1:
FILE2:
Output should be like this
Input: clusterB, clusterZ
output: 2017, clusterZ, log
2019, clusterB, log
import pandas as pd
#ARRAY
my_value = []
cluster_value = []
#READ THE FILES
my_data_file = pd.read_csv('my_data.txt', sep=',')
log_file = pd.read_csv('log.txt', sep=',')
#TAKE THE COLUMN WITH THE CLUSTERS
for row in my_data_file[my_data_file.columns[1]]:
    my_value.append(row)
for row in log_file[log_file.columns[0]]:
    cluster_value.append(row)
#Result
print("_______________")
print(list(set(my_value) & set(cluster_value)))
print("_______________")
It works, but I need to print the whole log line. I don't know how to link the result of this operation back to the lines I need to print.
Using regular expressions
Code
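import re

def search(key_file, search_file):
    with open(key_file) as kfile:
        # The first comma-separated field of each line is a cluster name; skip blank entries
        names = (line.split(',')[0].strip() for line in kfile)
        # Escape each name so regex metacharacters in it are matched literally
        keys = '|'.join(re.escape(name) for name in names if name)
    # regex matching any of the cluster names
    regex = re.compile(keys)
    with open(search_file) as search_data:
        for line in search_data:
            if regex.search(line):
                print(line.rstrip())

search('mydata.txt', 'log.txt')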
Input
'mydata.txt' (note: a trailing ',' doesn't matter, it is ignored)
clusterB,
clusterZ
'log.txt'
2019, clusterB, log
2020, clusterC, log
2017, clusterZ, log
Output
2019, clusterB, log
2017, clusterZ, log
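One caveat with the regex search is that it matches a cluster name anywhere in the line, so a key such as clusterB would also match a line for clusterBB or a log message that happens to mention it. A minimal sketch of an alternative, assuming the cluster name is always the second comma-separated field of each log line, compares that field exactly against a set of names. The function names load_cluster_names and matching_log_lines are hypothetical, chosen only for this illustration.

```python
import io


def load_cluster_names(key_lines):
    """Collect cluster names from the first comma-separated field of each line."""
    names = set()
    for line in key_lines:
        name = line.split(",")[0].strip()
        if name:  # ignore blank lines and empty fields
            names.add(name)
    return names


def matching_log_lines(key_lines, log_lines):
    """Return log lines whose second field is exactly one of the cluster names."""
    names = load_cluster_names(key_lines)
    matches = []
    for line in log_lines:
        # Assumed layout: TIMESTAMP, CLUSTER_NAME, LOG (the log text may contain commas)
        fields = line.split(",", 2)
        if len(fields) >= 2 and fields[1].strip() in names:
            matches.append(line.rstrip("\n"))
    return matches


if __name__ == "__main__":
    # In-memory stand-ins for 'mydata.txt' and 'log.txt'
    keys = io.StringIO("clusterB,\nclusterZ\n")
    logs = io.StringIO(
        "2019, clusterB, log\n2020, clusterC, log\n2017, clusterZ, log\n"
    )
    for line in matching_log_lines(keys, logs):
        print(line)
```

With real files, the same functions can be called with open file objects, for example with open('mydata.txt') as k, open('log.txt') as l: matching_log_lines(k, l). The set lookup keeps the cost per log line roughly constant regardless of how many cluster names there are.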