Python: how to split a file?

Question

I have a text file containing the output of ls -R on the etc directory of a Linux system. Example file:

etc:  
ArchiveSEL  
xinetd.d

etc/cmm:  
CMM_5085.bin  
cmm_sel  
storage.cfg  

etc/crontabs:  
root

etc/pam.d:  
ftp    
rsh  

etc/rc.d:  
eth.set.sh  
rc.sysinit  

etc/rc.d/init.d:  
cmm  
functions  
userScripts  

etc/security:  
access.conf  
console.apps  
time.conf

etc/security/console.apps:  
kbdrate

etc/ssh:  
ssh_host_dsa_key  
sshd_config  

etc/var:  
setUser  
snmpd.conf

etc/xinetd.d:  
irsh  
wu-ftpd

I would like to split it by subdirectory into several files, for example: etc.txt, etcCmm.txt, etcCrontabs.txt, etcPamd.txt, ...
Can someone give me Python code that does this? Note that the subdirectory lines end with ':', but I'm just not smart enough to write the code myself. Some examples would be appreciated. Thank you :)

Answer 1

Maybe something like this? re.M makes ^ and $ match at the start and end of every line instead of only at the ends of the whole string, so the pattern can pick out each directory block. The last part just iterates over the matches and creates the files...

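import re

with open('data.txt') as src:  # the saved ls -R listing
    data = src.read()

# A block is a "dirname:" header line followed by its non-blank entry lines.
# Trailing spaces after the colon are allowed, and the final block does not
# need a blank line after it.
blocks = re.findall(r'^([^\n]+):[ \t]*\n((?:[^\n]*\S[^\n]*\n?)*)', data, re.M)

for dirname, body in blocks:
    # 'etc/cmm' becomes 'etccmm.txt'
    with open(dirname.replace('/', '') + '.txt', 'w') as f:
        for line in body.splitlines():
            f.write(line.strip() + '\n')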
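
As a side note on the re.M flag mentioned above, here is a minimal sketch of what it changes. The sample string is made up for illustration and is much shorter than the real listing.

```python
import re

# Tiny made-up sample in the same shape as the ls -R listing.
text = "etc:\nroot\n\netc/cmm:\ncmm_sel\n"

# Without re.M, ^ and $ anchor only at the ends of the whole string,
# so no header line can match.
without_flag = re.findall(r'^([^\n]+):$', text)

# With re.M, ^ and $ also anchor at every line boundary.
with_flag = re.findall(r'^([^\n]+):$', text, re.M)

print(without_flag)
print(with_flag)
```

The first call finds nothing and the second returns both directory names, which is why the flag is needed to pick out header lines in the middle of the file.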

Answer 2

You will need to do it line by line. If line.endswith(":"), you are in a new subdirectory. From then on, each line is a new entry in that subdirectory, until another line ends with ':'.

From my understanding, you just want to split one text file into several, ambiguously named, text files.

So you check whether a line ends with ':'. If it does, you open a new text file, like etcCmm.txt, and every line you read from the source text from that point on gets written into etcCmm.txt. When you encounter another line that ends with ':', you close the previously opened file, create a new one, and continue.

I'm leaving a few things for you to do yourself, such as figuring out what to call the text file, reading a file line by line, etc.
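
One possible way to fill in those pieces, as a rough sketch rather than a definitive solution. The helper names output_name and split_listing are made up here, the camel-case naming scheme is guessed from the file names in the question, and the 'root' fallback name is an assumption for headers that contain no letters or digits.

```python
import os
import re


def output_name(dirname):
    """Map 'etc/pam.d' to 'etcPamd.txt' (scheme guessed from the question)."""
    # Drop punctuation from each path component, then camel-case the join.
    parts = [re.sub(r'\W', '', part) for part in dirname.split('/')]
    parts = [part for part in parts if part] or ['root']  # 'root' is a made-up fallback
    head, tail = parts[0], parts[1:]
    return head + ''.join(p[0].upper() + p[1:] for p in tail) + '.txt'


def split_listing(lines, outdir='.'):
    """Write each directory block of an ls -R listing to its own file."""
    out = None
    written = []
    try:
        for raw in lines:
            line = raw.strip()
            if line.endswith(':'):
                # Header line: close the previous file and start a new one.
                if out is not None:
                    out.close()
                path = os.path.join(outdir, output_name(line[:-1]))
                out = open(path, 'w')
                written.append(path)
            elif line and out is not None:
                # Entry line: belongs to the most recent header.
                out.write(line + '\n')
    finally:
        if out is not None:
            out.close()
    return written
```

A file object can be passed directly, since iterating over it yields one line at a time, for example `with open('listing.txt') as src: split_listing(src)`, where listing.txt is a placeholder for wherever the ls -R output was saved.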

Answer 3

Use a regexp like '.*:'.
Use file.readline().
Use loops.
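
Put together, those three hints might look something like the sketch below. The function name read_sections is made up, and the pattern is a slightly stricter variant of '.*:' that also tolerates trailing whitespace after the colon. It only collects the entries per directory in memory; writing them out to files is left as a separate step.

```python
import io
import re

# Header lines look like "etc/cmm:", possibly followed by spaces.
HEADER = re.compile(r'(.*):\s*$')


def read_sections(f):
    """Collect the entries under each 'dir:' header into a dict of lists."""
    sections = {}
    current = None
    while True:
        line = f.readline()
        if not line:  # readline() returns '' at end of file
            break
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif line.strip() and current is not None:
            sections[current].append(line.strip())
    return sections


# io.StringIO stands in for an open file in this example.
sample = io.StringIO("etc:  \nArchiveSEL  \nxinetd.d\n\netc/cmm:  \ncmm_sel  \n")
sections = read_sections(sample)
print(sections)
```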

Answer 4

If it doesn't have to be Python, you can use this one-liner:

awk '/:$/{gsub(/:|\//,"");fn=$0}{print $0 > (fn".txt")}' file

Answer 5

Here's what I would do:

Read the file into memory (myfile = open(filename).read() should do).

Then split the file along the delimiters:

import re
myregex = re.compile(r"^(.*):[ \t]*$", re.MULTILINE)
arr = myregex.split(myfile)[1:] # dropping everything before the first directory entry
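
To see why the result can be paired up by slicing, here is a small sketch of what split returns when the pattern contains a capturing group. The sample string is made up and much shorter than the real listing.

```python
import re

myregex = re.compile(r"^(.*):[ \t]*$", re.MULTILINE)

# Made-up two-directory sample.
sample = "etc:\nArchiveSEL\nxinetd.d\n\netc/cmm:\ncmm_sel\n"

# Because the pattern has a capturing group, split() keeps the captured
# directory names and interleaves them with the text between matches.
parts = myregex.split(sample)
print(parts)

# Dropping the leading piece leaves name, content, name, content, ...
arr = parts[1:]
print(arr[::2])   # directory names
print(arr[1::2])  # raw block contents
```

The first element is whatever precedes the first header (an empty string here), which is why the code above drops it with [1:].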

Then convert the array to a dict, removing unwanted characters along the way:

mydict = dict([(re.sub(r"\W+","",k), v.strip()) for (k,v) in zip(arr[::2], arr[1::2])])

Then write the files:

for name, content in mydict.items():  # iteritems() on Python 2
    with open(name + ".txt", "w") as output:
        output.write(content)

5 answers

Answer 1: score 2, accepted, 2010-07-19 09:54:38
Answer 2: score 1, 2010-07-19 09:45:12
Answer 3: score 0, 2010-07-19 09:42:42
Answer 4: score 0, 2010-07-19 09:44:46
Answer 5: score 0, 2010-07-19 09:56:35
