
Sort lines in a text file based on certain characters with Python

I need to sort the lines in a text file first by the W0** part of each line, then by the N** part (see the example of my text file below).

**What I have**
oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix
oDS_SPOT6_202207081449521_FR1_FR1_SV1_SV1_W062N46_03251_PS8.pix
oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix
oDS_SPOT6_202207081450305_FR1_FR1_SV1_SV1_W063N45_01871_PS8.pix
oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N44_04307_PS8.pix
oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix

**What I want** 
oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix
oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix
oDS_SPOT6_202207081449521_FR1_FR1_SV1_SV1_W062N46_03251_PS8.pix
oDS_SPOT6_202207081450305_FR1_FR1_SV1_SV1_W063N45_01871_PS8.pix
oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N44_04307_PS8.pix 
oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix

The code I have so far walks through a folder and writes the names of the files to a text file. I'm just not sure how to go about sorting the file names in the text file it creates.

import os

ifolder = raw_input('Path to the folder: ').strip('"')
otext = raw_input('Path to the folder for output text file: ')
file = open(os.path.join(otext, 'listdir.txt'), 'w')

for myfile in os.listdir(ifolder):
    print myfile
    file.write(myfile + '\n')
file.close()

You can use a regex to find the part of each name that you want to sort by. Also, as long as the numbers keep a fixed width (as they do in your sample), sorting on the whole section (W0*N*) as a string gives the same order as sorting first by W* and then by N*.

import re

def sort_(x):
  # Helper for sorted(): returns the part of the name used as the sort key,
  # e.g. "W065N46".
  # The pattern is a raw string so that \d is not read as a string escape.
  return re.search(r"W\d+N\d+", x).group()


if __name__ == "__main__":
  lst = """oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix
        oDS_SPOT6_202207081449521_FR1_FR1_SV1_SV1_W062N46_03251_PS8.pix
        oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix
        oDS_SPOT6_202207081450305_FR1_FR1_SV1_SV1_W063N45_01871_PS8.pix
        oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N44_04307_PS8.pix
        oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N43_04307_PS8.pix
        oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix""".split()
  
  # pass reference to function to sorting algorithm 
  print("\n".join(sorted(lst, key=sort_)), "\n\n")
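If the W and N numbers might not always be zero-padded to the same width, a string comparison would put W10 before W9. A minimal sketch of a numeric key, assuming the same `W<digits>N<digits>` pattern as above; `TILE_RE`, `tile_key` and the choice to push non-matching names to the end are hypothetical and not part of the original answer:

```python
import re

# Assumed pattern: "W" + digits followed by "N" + digits, as in the sample names.
TILE_RE = re.compile(r"W(\d+)N(\d+)")

def tile_key(name):
    """Return (W, N) as integers so the comparison is numeric."""
    match = TILE_RE.search(name)
    if match is None:
        # Assumption: names without a tile code sort after all others.
        return (float("inf"), float("inf"))
    return (int(match.group(1)), int(match.group(2)))

names = [
    "oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix",
    "oDS_SPOT6_202207081449521_FR1_FR1_SV1_SV1_W062N46_03251_PS8.pix",
    "oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix",
    "oDS_SPOT6_202207081450305_FR1_FR1_SV1_SV1_W063N45_01871_PS8.pix",
    "oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N44_04307_PS8.pix",
    "oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix",
]

# Tuples compare element by element: first W, then N.
sorted_names = sorted(names, key=tile_key)
print("\n".join(sorted_names))
```

For the six sample names this should give the same order as the string key; the two keys only differ once the digit counts vary.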

If you don't want to use regular expressions and the file names are guaranteed to follow the same format, you can use the following:

print("\n".join(sorted(lst, key=lambda x: x.split("_")[-3])), "\n\n")
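To apply this to the text file the question creates, the lines can be read back, sorted, and written out again. This is only a sketch in Python 3: `sort_listing` is a hypothetical helper, it assumes every non-empty line follows the sample naming format, and the demo uses a temporary file rather than the real `listdir.txt`.

```python
import os
import tempfile

def sort_listing(path):
    """Sort the lines of the file at `path` in place by the W***N** field."""
    with open(path) as f:
        # Drop surrounding whitespace and skip blank lines.
        names = [line.strip() for line in f if line.strip()]
    # Assumption: the tile code is always the third "_"-separated field from the end.
    names.sort(key=lambda name: name.split("_")[-3])
    with open(path, "w") as f:
        f.write("\n".join(names) + "\n")
    return names

# Demo on a throwaway file containing three of the sample names.
demo_path = os.path.join(tempfile.mkdtemp(), "listdir.txt")
with open(demo_path, "w") as f:
    f.write("oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix\n")
    f.write("oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix\n")
    f.write("oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix\n")

result = sort_listing(demo_path)
print("\n".join(result))
```

In the question's script, the call would be made on the path of `listdir.txt` after the file has been closed.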
