
Sort text file lines by timestamp using Python

I have a txt file where lines 1-5 are plain text and every line from line 6 onward begins with a timestamp, as shown:

This is a document1
This is a document2
This is a document3
This is a document4
This is a document5
2019-05-27 07:00:00, value1, value2, value3
2019-05-27 06:38:00, value1, value2, value3
2019-05-27 07:05:00, value1, value2, value3

How can I sort line 6 through the last line so that the earliest time is at the top and the latest time is at the bottom?

This is what I attempted, based on another Stack Overflow question, but it did not work.

  lines = sorted(open(outputFile.txt).readlines(), key=lambda line: line[5:-1].split(",")[0])
  outFile.close()

If you don't "need" a one-liner, you can do the following:

# Read all lines
with open("file.txt") as f:
    lines = f.readlines()

# Keep only from 6th line
lines = lines[5:]
# Sort based on the date of each line
lines.sort(key=lambda line: line.split(',')[0])

Untested, but should work.
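The snippet above discards the first five lines. Here is a minimal sketch that keeps them and sorts only the rest, assuming the timestamps are zero-padded as in the question. The function name `sort_after_header` and the `header_count` parameter are hypothetical, chosen for illustration.

```python
def sort_after_header(lines, header_count=5):
    """Return lines with everything after the header sorted by leading timestamp."""
    header = lines[:header_count]
    # Drop blank lines so they do not sort ahead of the timestamps.
    body = [line for line in lines[header_count:] if line.strip()]
    # Zero-padded "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    body.sort(key=lambda line: line.split(",")[0])
    return header + body


# Sample data copied from the question.
sample = [
    "This is a document1",
    "This is a document2",
    "This is a document3",
    "This is a document4",
    "This is a document5",
    "2019-05-27 07:00:00, value1, value2, value3",
    "2019-05-27 06:38:00, value1, value2, value3",
    "2019-05-27 07:05:00, value1, value2, value3",
]
result = sort_after_header(sample)
print("\n".join(result))
```

To apply this to a file, read it with `f.read().splitlines()`, pass the list in, and write `"\n".join(result)` back out.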

You can read the file as a pandas DataFrame and then use sort_values() on the timestamp column.

Also, I'd recommend casting the columns to their proper types and converting the table into a tidy format. Here, the first column should contain only datetime values.

With this approach you'd basically have two lines (without casting):

import pandas as pd

df = pd.read_csv('name_of_file.txt', sep='\t', skiprows=5, header=None, names=['first_col'])
df = df.sort_values('first_col', ascending=True)
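The casting step mentioned above could look roughly like the sketch below. It assumes the data is comma-separated as in the question. The column names `timestamp`, `a`, `b`, `c` are made up for illustration, and `io.StringIO` stands in for the real file so the example is self-contained.

```python
import io

import pandas as pd

# Sample data copied from the question; in practice pass the file path instead.
raw = """This is a document1
This is a document2
This is a document3
This is a document4
This is a document5
2019-05-27 07:00:00, value1, value2, value3
2019-05-27 06:38:00, value1, value2, value3
2019-05-27 07:05:00, value1, value2, value3
"""

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=5,                          # skip the five plain-text lines
    header=None,
    names=["timestamp", "a", "b", "c"],  # hypothetical column names
    skipinitialspace=True,               # drop the space after each comma
)
# Cast the first column to a real datetime type before sorting.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S")
df = df.sort_values("timestamp").reset_index(drop=True)
print(df)
```

Note that `sort_values()` returns a new DataFrame, so the result has to be assigned back (or `inplace=True` used) for the sort to take effect.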

Here is one way (in1.txt contains the data from the post):

from datetime import datetime

with open('in1.txt') as f:
    sorted_lines = sorted([l.strip() for l in f.readlines()][5:],
                          key=lambda line: datetime.strptime(line.split(",")[0], "%Y-%m-%d %H:%M:%S"))
    for line in sorted_lines:
        print(line)

Output:

2019-05-27 06:38:00, value1, value2, value3
2019-05-27 07:00:00, value1, value2, value3
2019-05-27 07:05:00, value1, value2, value3
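For the zero-padded timestamps in the question, sorting on the raw text and sorting on parsed datetime values give the same order. Parsing starts to matter when the timestamps are not zero-padded. The sketch below uses two made-up timestamps (not from the post) to show the difference.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
# Hypothetical timestamps without zero padding.
stamps = ["2019-5-27 10:00:00", "2019-5-27 9:00:00"]

by_text = sorted(stamps)  # compares character by character
by_time = sorted(stamps, key=lambda s: datetime.strptime(s, FMT))

print(by_text)  # "10:00:00" sorts first because "1" < "9"
print(by_time)  # "9:00:00" sorts first, which is chronologically correct
```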
