
Rename Files Based on File Content

Using Python, I'm trying to rename a series of .txt files in a directory according to a specific phrase in each text file. More specifically, I have a few hundred text files with arbitrary names, but each file contains a unique phrase (something like No. 85-2156). I would like to replace the arbitrary file name with that phrase for every text file. The phrase is not always on the same line (though it doesn't deviate much), but it is always in the same format and always has the No. prefix.

I've looked at the os module and I understand how its functions could be useful, but I don't understand how to combine those functions with intratext manipulation functions like linecache or general line-reading functions.

I've thought through many ways of accomplishing this task, but it seems like the easiest and most efficient way would be to create a loop that finds the unique phrase in a file, assigns it to a variable, and uses that variable to rename the file before moving on to the next file.

This seems like it should be easy, so much so that I feel silly writing this question. I've spent the last few hours reading documentation and searching through StackOverflow, but it doesn't seem like anyone has had quite this issue before, or at least they haven't asked about it.

Can anyone point me in the right direction?

EDIT 1: When I create the regex pattern using this website, it creates bulky but seemingly workable code:

import re

txt='No. 09-1159'

re1='(No)'  # Word 1
re2='(\\.)' # Any Single Character 1
re3='( )'   # White Space 1
re4='(\\d)' # Any Single Digit 1
re5='(\\d)' # Any Single Digit 2
re6='(-)'   # Any Single Character 2
re7='(\\d)' # Any Single Digit 3
re8='(\\d)' # Any Single Digit 4
re9='(\\d)' # Any Single Digit 5
re10='(\\d)'    # Any Single Digit 6

rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,re.IGNORECASE|re.DOTALL)
m = rg.search(txt)
name = m.group(0)
print(name)

When I adapt that to fit the glob.glob structure, like this:

import glob
import os
import re

re1='(No)'  # Word 1
re2='(\\.)' # Any Single Character 1
re3='( )'   # White Space 1
re4='(\\d)' # Any Single Digit 1
re5='(\\d)' # Any Single Digit 2
re6='(-)'   # Any Single Character 2
re7='(\\d)' # Any Single Digit 3
re8='(\\d)' # Any Single Digit 4
re9='(\\d)' # Any Single Digit 5
re10='(\\d)'    # Any Single Digit 6

rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10,re.IGNORECASE|re.DOTALL)

# Raw string, so that "\f" is not read as a form-feed escape.
for fname in glob.glob(r"\file\structure\here\*.txt"):
    with open(fname) as f:
        contents = f.read()
    tname = rg.search(contents)
    print(tname)

Then this prints out the byte location of the pattern, signifying that the regex pattern is correct. However, when I add the line nname = tname.group(0) after the original tname = rg.search(contents) and change the print call to reflect the change, it gives me the following error: AttributeError: 'NoneType' object has no attribute 'group'. When I tried copying and pasting @joaquin's code line for line, it came up with the same error. I was going to post this as a comment on the @spatz answer, but I wanted to include so much code that this seemed a better way to express the 'new' problem. Thank you all for the help so far.

Edit 2: This is for the @joaquin answer below:

import glob
import os
import re

for fname in glob.glob("/directory/structure/here/*.txt"):
    with open(fname) as f:
        contents = f.read()
    tname = re.search(r'No\. (\d\d\-\d\d\d\d)', contents)
    nname = tname.group(1)
    print(nname)

Last Edit: I got it to work using mostly the code as written. What was happening is that some files didn't contain a match for the regex, and I had assumed Python would skip them. Silly me. So I spent three days learning to write two lines of code (I know the lesson is more than that). I also used the error catching method recommended here. I wish I could mark all of you as the answer, but I bothered @Joaquin the most so I gave it to him. This was a great learning experience. Thank you all for being so generous with your time. The final code is below.

import os
import re

pat3 = r"No\. (\d\d-\d\d)"
ext = '.txt'
mydir = '/directory/files/here'


for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat3, txt)
    if s is None:
        continue  # no matching phrase in this file; skip it
    name = s.group(1)
    # Include the extension so the existence check tests the real target path.
    newpath = os.path.join(mydir, name + ext)
    if not os.path.exists(newpath):
        os.rename(archpath, newpath)
    else:
        print('{} already exists, passing'.format(newpath))

Instead of providing you with some code which you will simply copy-paste without understanding, I'd like to walk you through the solution so that you will be able to write it yourself and, more importantly, gain enough knowledge to do it alone next time.

The code which does what you need is made up of three main parts:

  1. Get a list of all the filenames you need to iterate over
  2. For each file, extract the information you need to generate a new name for the file
  3. Rename the file from its old name to the new one you just generated

Getting a list of filenames

This is best achieved with the glob module. This module allows you to specify shell-like wildcards and it will expand them. This means that in order to get a list of the .txt files in a given directory, you will need to call the function glob.iglob("/path/to/directory/*.txt") and iterate over its result ( for filename in ...: ).
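As a minimal sketch of this first step, the helper below wraps glob.iglob in a generator. The name iter_txt_files is hypothetical and only for illustration; the directory is whatever folder holds your files.

```python
import glob
import os


def iter_txt_files(directory):
    """Yield the path of every .txt file directly inside `directory`.

    Hypothetical helper name, used here only for illustration.
    """
    # os.path.join builds the wildcard pattern portably; glob.iglob expands
    # it lazily, yielding one matching path at a time.
    pattern = os.path.join(directory, "*.txt")
    for filename in glob.iglob(pattern):
        yield filename
```

Each yielded value is a full path (directory included), so it can be passed straight to open() and os.rename().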

Generate new name

Once we have our filename, we need to open() it, read its contents using read() and store them in a variable where we can search for what we need. That would look something like this:

with open(filename) as f:
    contents = f.read()

Now that we have the contents, we need to look for the unique phrase. This can be done using regular expressions. Store the new filename you want in a variable, say newfilename.
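One possible way to do the search is sketched below. It assumes the phrase always looks like "No. 85-2156" (two digits, a dash, four digits) and that the new name should keep a .txt extension; the function name extract_newfilename is hypothetical.

```python
import re

# Assumed format: "No. " followed by two digits, a dash and four digits.
PHRASE = re.compile(r"No\. (\d\d-\d\d\d\d)")


def extract_newfilename(contents):
    """Return e.g. '85-2156.txt', or None when the phrase is absent."""
    match = PHRASE.search(contents)
    if match is None:
        # search() returns None when nothing matches, so guard before
        # calling .group() on the result.
        return None
    return match.group(1) + ".txt"
```

Returning None for files without the phrase lets the calling loop skip them instead of failing with AttributeError: 'NoneType' object has no attribute 'group'.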

Rename

Now that we have both the old and the new filenames, we simply need to rename the file, and that is done using os.rename(filename, newfilename).

If you want to move the files to a different directory, use os.rename(filename, os.path.join("/path/to/new/dir", newfilename)). Note that we need os.path.join here to construct the new path for the file from a directory path and newfilename.
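A rough sketch of the rename step, assuming Python 3 (for FileExistsError). The helper name move_with_new_name and the refusal to overwrite are additions for illustration, not part of the answer above.

```python
import os


def move_with_new_name(filename, target_dir, newfilename):
    """Rename `filename` to `newfilename` inside `target_dir`.

    Hypothetical helper; returns the destination path.
    """
    destination = os.path.join(target_dir, newfilename)
    # Refuse to overwrite: on POSIX systems os.rename silently replaces
    # an existing destination file.
    if os.path.exists(destination):
        raise FileExistsError(destination)
    os.rename(filename, destination)
    return destination
```

os.rename can fail when the source and destination are on different filesystems; shutil.move is the usual fallback in that case.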

There is no checking or protection against failures (whether archpath is a file, whether newpath already exists, whether the search is successful, etc.), but this should work:

import os
import re

pat = r"No\. (\d\d\-\d\d\d\d)"
mydir = 'mydir'
for arch in os.listdir(mydir):
    archpath = os.path.join(mydir, arch)
    with open(archpath) as f:
        txt = f.read()
    s = re.search(pat, txt)
    name = s.group(1)
    newpath = os.path.join(mydir, name)
    os.rename(archpath, newpath)
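For comparison, here is a sketch of the same loop with the missing checks filled in. The function name rename_by_content, the ext parameter and the returned list are assumptions made for illustration; the loop above renames without an extension.

```python
import os
import re

PAT = re.compile(r"No\. (\d\d-\d\d\d\d)")


def rename_by_content(mydir, ext=".txt"):
    """Rename files in `mydir` after the code found in their text.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    """
    renamed = []
    for arch in sorted(os.listdir(mydir)):
        archpath = os.path.join(mydir, arch)
        if not os.path.isfile(archpath):
            continue  # skip subdirectories
        with open(archpath) as f:
            txt = f.read()
        s = PAT.search(txt)
        if s is None:
            continue  # no code in this file; leave it alone
        newname = s.group(1) + ext
        newpath = os.path.join(mydir, newname)
        if os.path.exists(newpath):
            continue  # never overwrite an existing file
        os.rename(archpath, newpath)
        renamed.append((arch, newname))
    return renamed
```

Inspecting the returned pairs after a run on a backup copy of the directory is a cheap way to confirm the result before touching the real files.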

Edit: I tested the regex to show how it works:

>>> import re
>>> pat = r"No\. (\d\d\-\d\d\d\d)"
>>> txt='nothing here or whatever No. 09-1159 you want, does not matter'
>>> s = re.search(pat, txt)
>>> s.group(1)
'09-1159'
>>> 

The regex is very simple:

\. -> a dot
\d -> a decimal digit
\- -> a dash

So, it says: search for the string "No. " followed by 2+4 decimal digits separated by a dash. The parentheses create a group that I can recover with s.group(1) and that contains the code number.
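The same pattern can also be written with repetition counts instead of repeated \d tokens. The short sketch below checks that both spellings capture the same group; the variable names are illustrative only.

```python
import re

spelled_out = re.compile(r"No\. (\d\d\-\d\d\d\d)")  # pattern from this answer
compact = re.compile(r"No\. (\d{2}-\d{4})")         # same match, using counts

txt = "nothing here or whatever No. 09-1159 you want, does not matter"
code_a = spelled_out.search(txt).group(1)
code_b = compact.search(txt).group(1)
```

Outside a character class the dash has no special meaning, so escaping it as \- is optional.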

And that is what you get, before and after:

[image]

The text of the files one.txt, two.txt and three.txt is always the same; only the number changes:

this is the first
file with a number
nothing here or whatever No. 09-1159 you want, does not matter
the number is

Create a backup of your files, then try something like this:

import glob
import os
import re

def your_function_to_dig_out_filename(lines):
    # i'll let you attempt this yourself
    pass

for fn in glob.glob('/path/to/your/dir/*.txt'):
    with open(fn) as f:
        spam = f.readlines()
    new_fn = your_function_to_dig_out_filename(spam)
    if not os.path.exists(new_fn):
        os.rename(fn, new_fn)
    else:
        print('{} already exists, passing'.format(new_fn))
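One way the placeholder function might be filled in is sketched below. It assumes the "No. 85-2156" format from the question; the name dig_out_filename and the extra directory and ext parameters are hypothetical additions, so that the result is a full path suitable for os.rename.

```python
import os
import re

CODE = re.compile(r"No\. (\d{2}-\d{4})")


def dig_out_filename(lines, directory, ext=".txt"):
    """Return the new path built from the first code found in `lines`.

    Returns None when no line contains the phrase.
    """
    for line in lines:
        match = CODE.search(line)
        if match:
            # Stop at the first hit; each file holds one unique phrase.
            return os.path.join(directory, match.group(1) + ext)
    return None
```

The calling loop would need to skip files for which this returns None before calling os.path.exists or os.rename.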

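Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.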
