Sort lines in a text file based on certain characters with Python
I need to sort the lines in a text file first by the `W0**` part of each line, then by the `N**` part (see the example of my text file below).
**What I have**
    oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix
    oDS_SPOT6_202207081449521_FR1_FR1_SV1_SV1_W062N46_03251_PS8.pix
    oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix
    oDS_SPOT6_202207081450305_FR1_FR1_SV1_SV1_W063N45_01871_PS8.pix
    oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N44_04307_PS8.pix
    oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix
**What I want**
    oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix
    oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix
    oDS_SPOT6_202207081449521_FR1_FR1_SV1_SV1_W062N46_03251_PS8.pix
    oDS_SPOT6_202207081450305_FR1_FR1_SV1_SV1_W063N45_01871_PS8.pix
    oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N44_04307_PS8.pix
    oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix
The code I have so far goes through a folder and writes the file names to a text file. I'm just not sure how to sort the lines in the text file it creates.
    import os

    # Python 3 version: input() replaces raw_input(), print() is a function
    ifolder = input('Path to the folder: ').strip('"')
    otext = input('Path to the folder for output text file: ')
    # "with" closes the output file automatically
    with open(os.path.join(otext, 'listdir.txt'), 'w') as file:
        for myfile in os.listdir(ifolder):
            print(myfile)
            file.write(myfile + '\n')
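You can use a regex to find the section you want to sort on. Also, sorting on the whole section (`W0*N*`) should give the same result as sorting first on the `W*` part and then on the `N*` part: strings are compared character by character, and in these names both numbers have a fixed width (three digits after `W`, two after `N`), so the `W` digits always decide the order before the `N` digits are reached.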
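The whole-string comparison only matches a true two-level sort while the digit widths stay fixed. If that might not hold (say a `W9` next to a `W10`), a minimal sketch is to sort on the `W` number first and the `N` number second, both converted to integers. The helper name `wn_key` is hypothetical; the regex simply captures the two numbers separately:

```python
import re

# Matches the tile code, e.g. "W065N46", capturing the two numbers separately.
TILE_RE = re.compile(r"W(\d+)N(\d+)")

def wn_key(name):
    """Return (W number, N number) as integers, so W is compared first, then N."""
    match = TILE_RE.search(name)
    return int(match.group(1)), int(match.group(2))

names = [
    "oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix",
    "oDS_SPOT6_202207081449521_FR1_FR1_SV1_SV1_W062N46_03251_PS8.pix",
    "oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix",
    "oDS_SPOT6_202207081450305_FR1_FR1_SV1_SV1_W063N45_01871_PS8.pix",
    "oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N44_04307_PS8.pix",
    "oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix",
]

# Prints the names in the "What I want" order: W060N47, W062N45, W062N46, ...
for name in sorted(names, key=wn_key):
    print(name)
```

Because the key is a tuple, Python compares the `W` numbers first and only looks at the `N` numbers to break ties, which is exactly the "first by W, then by N" order.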
    import re

    def sort_(x):
        # helper function to sort.
        # returns the part to use for sorting, e.g. "W065N46".
        # the raw string r"..." avoids Python 3's invalid-escape warning for "\d".
        return re.search(r"W\d+N\d+", x).group()

    if __name__ == "__main__":
        lst = """oDS_SPOT6_202206261442155_FR1_FR1_SV1_SV1_W065N46_03820_PS8.pix
        oDS_SPOT6_202207081449521_FR1_FR1_SV1_SV1_W062N46_03251_PS8.pix
        oDS_SPOT6_202207081450141_FR1_FR1_SV1_SV1_W062N45_01790_PS8.pix
        oDS_SPOT6_202207081450305_FR1_FR1_SV1_SV1_W063N45_01871_PS8.pix
        oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N44_04307_PS8.pix
        oDS_SPOT6_202207201458204_FR1_FR1_SV1_SV1_W065N43_04307_PS8.pix
        oDS_SPOT6_202207241426291_LM1_LM1_FR1_FR1_W060N47_03170_PS8.pix""".split()
        # pass a reference to the function as the sort key
        print("\n".join(sorted(lst, key=sort_)), "\n\n")
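To sort the `listdir.txt` file the script already created, the same key can be applied to the file's lines and the result written back. A minimal sketch, assuming the file is small enough to read into memory; `tile_key` and `sort_listing_file` are hypothetical names, and sending lines without a tile code to the end (instead of raising `AttributeError` when `re.search` returns `None`) is my own choice:

```python
import re

# Tile code such as "W065N46", same pattern as in sort_() above.
TILE_RE = re.compile(r"W\d+N\d+")

def tile_key(line):
    """Lines with a tile code come first, ordered by that code;
    lines without one are kept and placed last."""
    match = TILE_RE.search(line)
    if match is None:
        return (1, line)
    return (0, match.group())

def sort_listing_file(path):
    """Sort the non-blank lines of an existing text file in place by tile code."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    lines.sort(key=tile_key)
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)
```

Usage would be something like `sort_listing_file(os.path.join(otext, 'listdir.txt'))` after the listing script has run.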
If you don't want to use regular expressions and the file names are guaranteed to have the same format, you can use the following instead. Splitting on `_` makes the tile code (`W065N46`) the third field from the end:

    print("\n".join(sorted(lst, key=lambda x: x.split("_")[-3])), "\n\n")
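Since the question's script builds the listing itself, the names can also be sorted before they are written, so the text file never needs re-sorting. A minimal Python 3 sketch using the `split("_")[-3]` key; the function name `write_sorted_listing` and the filter on `.pix` files are assumptions of mine (the filter stops names without a tile code, such as a `listdir.txt` left from an earlier run, from raising `IndexError`):

```python
import os

def write_sorted_listing(ifolder, otext, name="listdir.txt"):
    """Write the .pix file names in ifolder to otext/name, sorted by tile code."""
    # Assumption: only ".pix" files follow the ..._W065N46_03820_PS8.pix pattern.
    pix_files = [f for f in os.listdir(ifolder) if f.endswith(".pix")]
    # The third-from-last "_" field is the tile code, e.g. "W065N46".
    pix_files.sort(key=lambda f: f.split("_")[-3])
    out_path = os.path.join(otext, name)
    with open(out_path, "w") as out:
        for myfile in pix_files:
            print(myfile)
            out.write(myfile + "\n")
    return out_path
```

It can be called with the same prompts as the original script, e.g. `write_sorted_listing(input('Path to the folder: ').strip('"'), input('Path to the folder for output text file: '))`.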