The question is really broader than the title allows me to specify. I have a big file listing numbered packets in the order they were received, each with its corresponding timestamp, such as (arrows included for clarity; they are not actually in the file):
seq_1 ----> timestamp
seq_2 ----> timestamp
seq_3 ----> timestamp
seq_2 ----> timestamp
seq_5 ----> timestamp
seq_4 ----> timestamp
...
Timestamps always increase, but there may be duplicate packets, out-of-order packets, etc. I have parsed the file into a list of strings and must now decide on an appropriate data structure to store it, taking into account that I need to:
1. keep only the first entry seen for each sequence number (duplicates must not overwrite it);
2. access the entries adjacent to any given entry.
The idea is that I could plot (not really going to do it, though) a bar graph, with the x axis being the sequence numbers and the y axis being the timestamps. I need to manually find local maxima and minima, so I should be able to access the adjacent entries of any entry.
I have thought of parsing the list of lines into a dictionary of (sequence_number, timestamp), carefully not overwriting existing entries (condition 1), then turning it into a list of tuples and finally sorting the list by key. A list should allow me to access adjacent entries, thus fulfilling condition 2. The parsed file is quite big, so I was wondering whether there is a solution that would scale better (one not requiring conversion between two data structures followed by sorting).
The easiest bet is to just dump things into a dictionary and sort the keys at the end. The d.get call ensures that it keeps the first encountered value if one exists, or inserts a new value if it doesn't.
In [23]: s = """seq_1 ----> timestamp1
....: seq_2 ----> timestamp2
....: seq_3 ----> timestamp3
....: seq_2 ----> timestamp4
....: seq_5 ----> timestamp5
....: seq_4 ----> timestamp6
....: seq_9 ----> timestamp7
....: seq_10 ----> timestamp8
....: seq_6 ----> timestamp9
....: seq_7 ----> timestamp10
....: seq_2 ----> timestamp11
....: seq_4 ----> timestamp12"""
In [24]: d = {}
In [25]: for line in s.split("\n"):
....:     seq, ts = map(str.strip, line.split("---->"))
....:     d[seq] = d.get(seq, ts)
....:
In [26]: sorted(d.items(), key=lambda x: int(x[0][4:]))
Out[26]:
[('seq_1', 'timestamp1'),
('seq_2', 'timestamp2'),
('seq_3', 'timestamp3'),
('seq_4', 'timestamp6'),
('seq_5', 'timestamp5'),
('seq_6', 'timestamp9'),
('seq_7', 'timestamp10'),
('seq_9', 'timestamp7'),
('seq_10', 'timestamp8')]
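Once the list is sorted, the adjacent entries of any entry are simply its neighbouring indices, so local maxima and minima can be found in a single pass. Below is a minimal sketch, not a definitive implementation. It assumes the sequence numbers and timestamps have already been converted to numbers (the strings above are placeholders), and that an extremum means strictly greater or smaller than both neighbours. The name local_extrema and the sample values are hypothetical, chosen only to mirror the example above.

```python
def local_extrema(entries):
    """Return (maxima, minima) as lists of sequence numbers.

    `entries` is a list of (sequence_number, timestamp) pairs sorted by
    sequence number. An interior entry counts as a local maximum (minimum)
    when its timestamp is strictly greater (smaller) than both neighbours.
    The first and last entries are skipped because they have one neighbour.
    """
    maxima, minima = [], []
    for i in range(1, len(entries) - 1):
        prev_ts = entries[i - 1][1]
        seq, ts = entries[i]
        next_ts = entries[i + 1][1]
        if ts > prev_ts and ts > next_ts:
            maxima.append(seq)
        elif ts < prev_ts and ts < next_ts:
            minima.append(seq)
    return maxima, minima


# Hypothetical numeric data: sequence numbers as ints, "timestampN" as N.
entries = [(1, 1), (2, 2), (3, 3), (4, 6), (5, 5),
           (6, 9), (7, 10), (9, 7), (10, 8)]
maxima, minima = local_extrema(entries)
print(maxima)  # [4, 7]
print(minima)  # [5, 9]
```

Whether ties should count as extrema, and whether the two end entries should be considered, depends on what you want from the data; the strict comparison here is only one possible choice.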