
Subtract matching rows in python

I have two files, each containing a "time" column and an "id" column, like this:

File 1:

time     id
11.24    1
11.26    2
11.27    3
11.29    5
11.30    6

File 2:

time     id
11.25    1
11.26    3
11.27    4
11.31    6
11.32    7
11.33    8

I'm trying to write a Python script that subtracts the times of rows with matching ids from each other. The files are of different lengths.

I tried using set(id's of file 1) & set(id's of file 2) to get the matching ids, but now I'm stuck. Any help will be much appreciated, thank you.

List comprehensions can do the trick very easily:

#read these from file if you want to, included in this form for brevity
F1 = {1: 11.24, 2: 11.26, 3:11.27, 5:11.29, 6:11.30}
F2 = {1:11.25, 3:11.26, 4:11.27, 6:11.31, 7:11.32, 8:11.33}

K1 = set(F1.keys())
K2 = set(F2.keys())

result = dict([ (k, F1[k] - F2[k]) for k in (K1 & K2)])
print(result)

This will output:

{1: -0.009999999999999787, 3: 0.009999999999999787, 6: -0.009999999999999787}

Edit: As mhawke points out, the last line could read:

result = {k: F1[k] - F2[k] for k in (K1 & K2)}

I had forgotten all about dict comprehensions.
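The dictionaries above are hard-coded for brevity. A minimal sketch of reading them from the files might look like the following, assuming each file is whitespace-separated with a single "time id" header row, as in the question. The helper name read_times is made up for illustration, and io.StringIO stands in for real open() calls so the example runs on its own.

```python
import io

def read_times(handle):
    """Return {id: time} from a whitespace-separated table with a header row."""
    rows = {}
    next(handle, None)  # skip the "time id" header line (assumed to be present)
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        time_value, row_id = parts
        rows[int(row_id)] = float(time_value)
    return rows

# io.StringIO replaces open("...") here; the contents mirror the question's files.
file1 = io.StringIO(
    "time     id\n11.24    1\n11.26    2\n11.27    3\n11.29    5\n11.30    6\n"
)
file2 = io.StringIO(
    "time     id\n11.25    1\n11.26    3\n11.27    4\n11.31    6\n11.32    7\n11.33    8\n"
)

F1 = read_times(file1)
F2 = read_times(file2)

# Dict key views support set intersection directly on Python 3.
result = {k: F1[k] - F2[k] for k in F1.keys() & F2.keys()}
print(sorted(result))  # [1, 3, 6]
```

With real files, passing open(path) to read_times inside a with block should behave the same way.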

Python sets do not support ordering of their elements. I would store the data as a dictionary:

file1 = {1:'11:24', 2:'11:26', ... etc}
file2 = {1:'11:25', 3:'11:26', ... etc}

Then loop over the intersection of the keys (or the union, depending on your needs) to do the subtraction (time based or math based).

This is a bit old school. Look at using a defaultdict from the collections module for a more elegant approach.
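As a rough sketch of the time-based variant of that loop, the example below assumes the strings are "HH:MM" clock times (the answer does not say which format is meant) and uses a few sample entries taken from the question's tables. The parse helper is a hypothetical name.

```python
from datetime import datetime

# Sample entries only; the "HH:MM" interpretation is an assumption.
file1 = {1: "11:24", 2: "11:26", 3: "11:27"}
file2 = {1: "11:25", 3: "11:26", 4: "11:27"}

def parse(value):
    """Parse an 'HH:MM' string into a datetime (the date part is a dummy)."""
    return datetime.strptime(value, "%H:%M")

diffs = {}
# Only ids present in both dictionaries are subtracted.
for key in sorted(file1.keys() & file2.keys()):
    delta = parse(file1[key]) - parse(file2[key])
    diffs[key] = delta.total_seconds()

print(diffs)  # {1: -60.0, 3: 60.0}
```

Subtracting two datetime objects yields a timedelta, which avoids the floating-point noise that appears when the times are treated as plain floats.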

This will work for any number of files; I've named mine f1, f2, etc. The general idea is to process each file and build up a list of time values for each id. After file processing, iterate over the dictionary, subtracting each value as you go (via reduce on the values list).

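from functools import reduce  # built in on Python 2, must be imported on Python 3
from operator import sub

d = {}
for fname in ('f1', 'f2'):
    with open(fname) as f:
        for line in f:  # assumes the files have no header row
            t, i = line.split()
            d[i] = d.get(i, []) + [float(t)]

results = {}
for k, v in d.items():
    results[k] = reduce(sub, v)

print(results)
# output (key order may vary):
# {'1': -0.009999999999999787, '3': 0.009999999999999787, '2': 11.26, '5': 11.29, '4': 11.27, '7': 11.32, '6': -0.009999999999999787, '8': 11.33}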

Update:

If you want to include only those ids with more than one value:

results = {}
for k,v in d.items():
    if len(v) > 1:
        results[k] = reduce(sub, v)

You can use this as a base (instead of treating '11.24' as a float, I guess you'll want to adapt it for hours/minutes or minutes/seconds). You can effectively union the keys and subtract the matching values using a defaultdict.

As long as you can get your data into a format like this:

f1 = [
    [11.24, 1],
    [11.26, 2],
    [11.27, 3],
    [11.29, 5],
    [11.30, 6]
]

f2 = [
    [11.25, 1],
    [11.26, 3],
    [11.27, 4],
    [11.31, 6],
    [11.32, 7],
    [11.33, 8]
]

Then:

from collections import defaultdict
from itertools import chain

dd = defaultdict(float)
for k, v in chain(
    ((b, a) for a, b in f1),
    ((b, -a) for a, b in f2)): # negate a

    dd[k] += v

Results in:

{1: -0.009999999999999787,
 2: 11.26,
 3: 0.009999999999999787,
 4: -11.27,
 5: 11.29,
 6: -0.009999999999999787,
 7: -11.32,
 8: -11.33}

For matches only:

matches = dict( (k, v) for v, k in f1 )
d2 = dict( (k, v) for v, k in f2 )

for k, v in list(matches.items()):  # iterate over a copy, since entries are deleted in the loop
    try:
        matches[k] = v - d2[k]
    except KeyError:
        del matches[k]

print(matches)
# {1: -0.009999999999999787, 3: 0.009999999999999787, 6: -0.009999999999999787}
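This answer notes that '11.24' may not really be a float. If the values are minutes and seconds, one possible adaptation is to keep them as strings and convert to whole seconds before subtracting. The sketch below assumes an "MM.SS" format; the to_seconds helper and the sample rows are made up for illustration and are not the question's data.

```python
def to_seconds(stamp):
    """Convert an 'MM.SS' string to whole seconds (the format is an assumption)."""
    minutes, seconds = stamp.split(".")
    return int(minutes) * 60 + int(seconds)

# Made-up sample rows as (time, id); times are kept as strings on purpose.
f1 = [("11.24", 1), ("11.59", 2), ("12.10", 5)]
f2 = [("11.25", 1), ("12.01", 2), ("12.30", 7)]

t1 = {row_id: to_seconds(stamp) for stamp, row_id in f1}
t2 = {row_id: to_seconds(stamp) for stamp, row_id in f2}

# Subtract only where the id appears in both inputs.
diffs = {k: t1[k] - t2[k] for k in sorted(t1.keys() & t2.keys())}
print(diffs)  # {1: -1, 2: -2}
```

For id 2, plain float subtraction would give roughly -0.42, whereas the real gap between 11 min 59 s and 12 min 01 s is 2 seconds, which is why the conversion matters under this interpretation.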
