Find line patterns from one file in another file and sum the corresponding values in a particular column using Python
I have two files. File 1 contains the lines shown below, and File 2 contains lines in the format shown below, with several million records. I want to search File 2 for each File 1 entry and write a report to a new file that lists, for each entry, the sum of the 2nd column next to the matching pattern.
File 1 entries:
/dataset1
/dataset2
File 2 entries:
12 5 /opt/dataset1
6 0 /opt/dataset2
5 8 /dataset1
I am looking for the sum of the 2nd-column values, with the pattern next to it:
13 /dataset1
0 /dataset2
Thank you, CS
I would first process File 1 and create a regex with the following format:
\d+\s+(\d+)\s+\S*(\/dataset1|\/dataset2)
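After creating the regex, just use re.findall to find all the relevant information, and sum the matches. Because the regex has two capture groups, re.findall returns one (value, pattern) tuple per match, so the values can be converted to integers and summed per pattern. It should be easy...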
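As a minimal sketch of that idea, the snippet below runs re.findall over the three sample File 2 lines from the question, using a hard-coded regex for the two File 1 entries. The variable names (file2_text, pattern, totals) are illustrative assumptions, not part of the original question.

```python
import re
from collections import defaultdict

# Sample File 2 content, copied from the question
file2_text = """12 5 /opt/dataset1
6 0 /opt/dataset2
5 8 /dataset1
"""

# Hard-coded regex for the two File 1 entries; \d+ allows multi-digit numbers
pattern = r"\d+\s+(\d+)\s+\S*(/dataset1|/dataset2)"

totals = defaultdict(int)
# With two capture groups, re.findall returns a list of (value, name) tuples
for value, name in re.findall(pattern, file2_text):
    totals[name] += int(value)

for name, total in totals.items():
    print(total, name)
```

For the sample data this prints 13 /dataset1 and 0 /dataset2, which matches the report asked for in the question.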
Of course, the regex doesn't have a fixed format; you need to generate it from the lines of the first file. Something like this:
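import re

def generate_regex(file1_lines):
    # Escape each File 1 entry so regex metacharacters are matched literally
    patterns = [re.escape(line.strip()) for line in file1_lines if line.strip()]
    # Skip the first number, capture the 2nd column, then capture the matched entry
    return r"\d+\s+(\d+)\s+\S*(" + "|".join(patterns) + ")"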
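Since File 2 has millions of records, it may be preferable to stream it line by line rather than load it all into memory for re.findall. The sketch below is one possible way to do that, not a definitive implementation. The function names build_pattern and write_report and the three path parameters are hypothetical. It also assumes File 1 is non-empty, that no File 1 entry is a suffix of another, and that a File 1 entry should only match at the end of a File 2 line (so /dataset1 does not match /dataset10).

```python
import re

def build_pattern(entries):
    # Escape each entry; anchor at end of line so /dataset1 does not match /dataset10
    names = "|".join(re.escape(e) for e in entries)
    return re.compile(r"^\s*\d+\s+(\d+)\s+\S*(" + names + r")\s*$")

def write_report(file1_path, file2_path, report_path):
    # Hypothetical helper: read the entries, sum column 2 per entry, write the report
    with open(file1_path) as f1:
        entries = [line.strip() for line in f1 if line.strip()]
    if not entries:
        raise ValueError("File 1 contains no entries")

    pattern = build_pattern(entries)
    totals = dict.fromkeys(entries, 0)  # entries with no match are reported as 0

    with open(file2_path) as f2:
        for line in f2:  # the file object yields one line at a time
            m = pattern.match(line)
            if m:
                totals[m.group(2)] += int(m.group(1))

    with open(report_path, "w") as out:
        for name, total in totals.items():
            out.write(f"{total} {name}\n")
    return totals
```

Calling write_report("file1.txt", "file2.txt", "report.txt") (placeholder file names) would write one "sum pattern" line per File 1 entry, in File 1 order.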