Sort text file lines using Python by timestamp
I have a txt file where lines 1-5 are plain text and every line from line 6 onward starts with a timestamp, as shown:
This is a document1
This is a document2
This is a document3
This is a document4
This is a document5
2019-05-27 07:00:00, value1, value2, value3
2019-05-27 06:38:00, value1, value2, value3
2019-05-27 07:05:00, value1, value2, value3
How can I sort line 6 through the last line so that the earliest time is at the top and the latest time is at the bottom?
This is what I attempted, based on another Stack Overflow question, but it did not work:
lines = sorted(open(outputFile.txt).readlines(), key=lambda line: line[5:-1].split(",")[0])
outFile.close()
If you don't "need" a one-liner, you can do the following:
# Read all lines
with open("file.txt") as f:
    lines = f.readlines()
# Keep only the lines from the 6th onward
lines = lines[5:]
# Sort by the timestamp at the start of each line
lines.sort(key=lambda l: l.split(',')[0])
Untested, but it should work.
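As a quick check of why sorting on the raw text can work here: timestamps written as zero-padded YYYY-MM-DD HH:MM:SS compare the same way as strings and as dates. The sketch below uses the three sample rows from the question. The helper name `timestamp_text` is only illustrative, and the check assumes every row uses exactly that timestamp format.

```python
from datetime import datetime

# Sample rows in the same shape as the data in the question.
rows = [
    "2019-05-27 07:00:00, value1, value2, value3",
    "2019-05-27 06:38:00, value1, value2, value3",
    "2019-05-27 07:05:00, value1, value2, value3",
]


def timestamp_text(row):
    """Return the text before the first comma (the timestamp field)."""
    return row.split(",", 1)[0]


# Sort by comparing the timestamp as plain text.
by_text = sorted(rows, key=timestamp_text)

# Sort by parsing the timestamp into a datetime object first.
by_datetime = sorted(
    rows,
    key=lambda row: datetime.strptime(timestamp_text(row), "%Y-%m-%d %H:%M:%S"),
)

# Both orderings agree for zero-padded YYYY-MM-DD HH:MM:SS timestamps.
print(by_text == by_datetime)
```

If the file mixed formats (for example single-digit hours without padding), the text-based sort would no longer be safe and parsing with datetime would be the more robust choice.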
You can read the file as a pandas DataFrame and then use sort_values() on the relevant column.
I'd also recommend casting the columns to their proper types and converting the table into a tidy format; here the first column should contain only datetime values.
With this approach you basically need two lines (without the casting):
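import pandas as pd
# sep='\t' keeps each line as one string column, since the sample data contains no tabs
df = pd.read_csv('name_of_file.txt', sep='\t', skiprows=5, header=None, names=['first_col'])
df = df.sort_values('first_col', ascending=True)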
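For the casting suggested above, one possible sketch is to split the rows on commas and convert the first column with pd.to_datetime before sorting. The column names (`timestamp`, `a`, `b`, `c`) are made up for illustration, and an in-memory io.StringIO stands in for the real file so the example is self-contained; a file path could be passed to read_csv instead.

```python
import io

import pandas as pd

# Stand-in for the file from the question (5 text lines, then the data rows).
sample = io.StringIO(
    "This is a document1\n"
    "This is a document2\n"
    "This is a document3\n"
    "This is a document4\n"
    "This is a document5\n"
    "2019-05-27 07:00:00, value1, value2, value3\n"
    "2019-05-27 06:38:00, value1, value2, value3\n"
    "2019-05-27 07:05:00, value1, value2, value3\n"
)

df = pd.read_csv(
    sample,
    skiprows=5,                          # skip the five text lines
    header=None,                         # the data rows have no header line
    names=["timestamp", "a", "b", "c"],  # hypothetical column names
    skipinitialspace=True,               # drop the space after each comma
)

# Cast the first column to real datetimes, then sort on it.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")
df = df.sort_values("timestamp").reset_index(drop=True)

print(df)
```

With a datetime column in place, later steps such as filtering by time range or resampling work directly on the DataFrame.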
Here (in1.txt contains the data from the post):
from datetime import datetime

with open('in1.txt') as f:
    # Skip the first five lines, then sort the rest by the parsed timestamp
    sorted_lines = sorted([l.strip() for l in f.readlines()][5:],
                          key=lambda line: datetime.strptime(line.split(",")[0], "%Y-%m-%d %H:%M:%S"))
for line in sorted_lines:
    print(line)
Output:
2019-05-27 06:38:00, value1, value2, value3
2019-05-27 07:00:00, value1, value2, value3
2019-05-27 07:05:00, value1, value2, value3
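The answers above print or keep only the sorted rows. If the goal is a file that still has the five text lines on top, a sketch along these lines might do it. The function names `sort_after_header` and `sort_file`, the `header_count` parameter and the file paths are all hypothetical, and the code assumes every non-blank line after the header starts with a YYYY-MM-DD HH:MM:SS timestamp.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def sort_after_header(lines, header_count=5):
    """Return the header lines unchanged, followed by the remaining
    lines sorted by their leading timestamp (earliest first)."""
    header = lines[:header_count]
    # Ignore blank lines so a trailing empty line does not break parsing.
    body = [line for line in lines[header_count:] if line.strip()]
    body.sort(
        key=lambda line: datetime.strptime(
            line.split(",", 1)[0].strip(), TIMESTAMP_FORMAT
        )
    )
    return header + body


def sort_file(src_path, dst_path, header_count=5):
    """Read src_path, sort everything after the header, write to dst_path."""
    with open(src_path) as src:
        lines = src.read().splitlines()
    with open(dst_path, "w") as dst:
        dst.write("\n".join(sort_after_header(lines, header_count)) + "\n")


if __name__ == "__main__":
    demo = [
        "This is a document1",
        "This is a document2",
        "This is a document3",
        "This is a document4",
        "This is a document5",
        "2019-05-27 07:00:00, value1, value2, value3",
        "2019-05-27 06:38:00, value1, value2, value3",
        "2019-05-27 07:05:00, value1, value2, value3",
    ]
    for line in sort_after_header(demo):
        print(line)
```

Writing to a separate output path rather than overwriting the input keeps the original file intact if a line fails to parse.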