Python: How do I split the file?
I have this text file, which is the output of ls -R on the etc directory of a Linux system. Example file:
etc:
ArchiveSEL
xinetd.d

etc/cmm:
CMM_5085.bin
cmm_sel
storage.cfg

etc/crontabs:
root

etc/pam.d:
ftp
rsh

etc/rc.d:
eth.set.sh
rc.sysinit

etc/rc.d/init.d:
cmm
functions
userScripts

etc/security:
access.conf
console.apps
time.conf

etc/security/console.apps:
kbdrate

etc/ssh:
ssh_host_dsa_key
sshd_config

etc/var:
setUser
snmpd.conf

etc/xinetd.d:
irsh
wu-ftpd
I would like to split it by subdirectory into several files. The output files would be named like this: etc.txt, etcCmm.txt, etcCrontabs.txt, etcPamd.txt, ...
Can someone give me Python code that can do that? Notice that the subdirectory lines end with ':', but I'm just not smart enough to write the code. Some examples would be appreciated.
Thank you :)
Maybe something like this?
re.M makes ^ match at the start of every line rather than only at the start of the string, so the pattern finds each "dirname:" header together with the block of entry lines below it. The last part just iterates over the matches and creates the files.
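import re

# Read the ls -R output. The pattern below assumes directory blocks are
# separated by blank lines, as they are in real ls -R output.
data = open('data.txt').read()
# Make sure the last block is also followed by a blank line.
data = data.rstrip('\n') + '\n\n'

# Each match is a (dirname, entries) pair for one "dirname:" header.
results = [(m[0], m[1].strip().splitlines())
           for m in re.findall(r'^([^\n]+):\n((?:[^\n]+\n)*)\n', data, re.M)]

for dirname, files in results:
    # e.g. "etc/cmm" -> "etccmm.txt"
    with open(dirname.replace('/', '') + '.txt', 'w') as f:
        for line in files:
            f.write(line + '\n')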
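To see what the pattern captures, here is a small sketch that runs the same regular expression on an in-memory sample (a hypothetical excerpt of the listing, with the blank lines that real ls -R output puts between directories) and collects the pairs instead of writing files. The pattern needs a blank line after every block, including the last one, which is why the sample is padded.

```python
import re

# Hypothetical excerpt of ls -R output; real ls -R separates
# directory blocks with blank lines, which this pattern relies on.
sample = """etc:
ArchiveSEL
xinetd.d

etc/cmm:
CMM_5085.bin
cmm_sel
storage.cfg

etc/crontabs:
root
"""

# Ensure the final block is also terminated by a blank line.
data = sample.rstrip('\n') + '\n\n'

pairs = [(m[0], m[1].strip().splitlines())
         for m in re.findall(r'^([^\n]+):\n((?:[^\n]+\n)*)\n', data, re.M)]

for dirname, files in pairs:
    print(dirname, files)
```

Without the blank separator lines, (?:[^\n]+\n)* would swallow the following headers too, and everything would end up in a single match.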
You will need to do it line by line. If line.endswith(":") is true, then you are in a new subdirectory. From then on, each line is a new entry in that subdirectory, until another line ends with ":".
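A minimal sketch of that line-by-line approach, assuming the listing has already been read into a list of lines. The helper name group_by_directory is hypothetical; blank lines are skipped, and any lines before the first header are ignored.

```python
# group_by_directory is a hypothetical helper name.
def group_by_directory(lines):
    """Group ls -R output lines into {directory: [entries]}.

    A line ending with ':' starts a new directory; every following
    non-blank line is an entry of that directory until the next header.
    """
    groups = {}
    current = None  # no directory header seen yet
    for raw in lines:
        line = raw.rstrip()  # drop the newline and trailing whitespace
        if not line:
            continue  # skip blank separator lines
        if line.endswith(':'):
            current = line[:-1]  # header: drop the trailing ':'
            groups[current] = []
        elif current is not None:
            groups[current].append(line)  # entry of the current directory
    return groups


# Hypothetical excerpt of the listing from the question.
listing = """etc:
ArchiveSEL
xinetd.d
etc/cmm:
CMM_5085.bin
cmm_sel
storage.cfg
etc/pam.d:
ftp
rsh
""".splitlines()

for directory, entries in group_by_directory(listing).items():
    print(directory, entries)
```

Because this only looks at the trailing ':', it works whether or not the blocks are separated by blank lines.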
From my understanding, you just want to split one text file into several, ambiguously named, text files.
So you check whether a line ends with ":". If it does, you open a new text file, like etcCmm.txt, and every line that you read from the source text from that point on, you write into etcCmm.txt. When you encounter another line that ends in ":", you close the previously opened file, create a new one, and continue.
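Here is a minimal sketch of that open/close cycle. The function name split_listing and its out_dir parameter are hypothetical, and for simplicity it names each output file by dropping the slashes and the colon from the header (so etc/cmm: becomes etccmm.txt rather than etcCmm.txt).

```python
import os


# split_listing is a hypothetical helper name, not from the original answer.
def split_listing(src_path, out_dir='.'):
    """Write each directory block of an ls -R listing to its own file.

    Returns the output file paths in the order they were created.
    """
    written = []
    out = None  # the currently open output file, if any
    try:
        with open(src_path) as src:
            for raw in src:
                line = raw.rstrip()
                if not line:
                    continue  # skip blank separator lines
                if line.endswith(':'):
                    # New subdirectory: close the previous file, open a new one.
                    if out is not None:
                        out.close()
                    # Simple naming assumption: "etc/cmm:" -> "etccmm.txt"
                    name = line[:-1].replace('/', '') + '.txt'
                    path = os.path.join(out_dir, name)
                    out = open(path, 'w')
                    written.append(path)
                elif out is not None:
                    out.write(line + '\n')  # entry of the current subdirectory
    finally:
        if out is not None:
            out.close()  # close the last file (also on errors)
    return written
```

Calling split_listing('data.txt'), with data.txt standing in for wherever the listing is saved, would create etc.txt, etccmm.txt, etccrontabs.txt, and so on in the current directory. Only one output file is open at a time, so this also works for very large listings.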
I'm leaving a few things for you to do yourself, such as figuring out what to call the text file, reading a file line by line, etc.
Use a regexp like '.*:$'.
Use file.readline().
Use loops.
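For the naming part, one possible scheme that reproduces the names from the question (etc.txt, etcCmm.txt, etcCrontabs.txt, etcPamd.txt) is to split the header on '/', drop non-alphanumeric characters from each part, and upper-case the first letter of every part after the first. This is a guess at the intended convention, and output_name is a hypothetical helper.

```python
import re


# output_name is a hypothetical helper; the capitalization rule is a guess
# based on the example file names in the question.
def output_name(header):
    """Turn an ls -R header like 'etc/pam.d:' into a file name like 'etcPamd.txt'."""
    parts = header.rstrip(':').split('/')
    # Remove dots and other non-alphanumeric characters from each part.
    parts = [re.sub(r'[^0-9A-Za-z]', '', p) for p in parts]
    # Keep the first part as-is; upper-case the first letter of the rest.
    name = parts[0] + ''.join(p[:1].upper() + p[1:] for p in parts[1:])
    return name + '.txt'


for header in ['etc:', 'etc/cmm:', 'etc/crontabs:', 'etc/pam.d:', 'etc/rc.d/init.d:']:
    print(header, '->', output_name(header))
```

Slicing with p[:1] and p[1:] (instead of str.capitalize()) leaves the rest of each name's case untouched, so a directory like "rc.D" would not be lower-cased.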
If you don't have to use Python, you can use this awk one-liner:
awk '/:$/{gsub(/:|\//,"");fn=$0;next} NF{print > (fn ".txt")}' file
Here's what I would do:
Read the file into memory (myfile = open(filename).read() should do).
Then split the file along the delimiters:
import re
myregex = re.compile(r"^(.*):[ \t]*$", re.MULTILINE)
arr = myregex.split(myfile)[1:] # dropping everything before the first directory entry
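Because the pattern contains a capturing group, re.split keeps the directory names in its result, so the list alternates between a name and the text of that directory's block. A small demo, with myfile hard-coded to a hypothetical two-directory sample instead of being read from disk:

```python
import re

# Hypothetical sample standing in for the file contents.
myfile = "etc:\nArchiveSEL\nxinetd.d\n\netc/cmm:\nCMM_5085.bin\ncmm_sel\n"

myregex = re.compile(r"^(.*):[ \t]*$", re.MULTILINE)
arr = myregex.split(myfile)[1:]  # drop the (empty) text before the first header

# Even indexes hold directory names, odd indexes hold the entries below them.
print(arr)
```

Without the parentheses in the pattern, re.split would discard the headers and only the blocks would remain.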
Then convert the list to a dict, removing unwanted characters along the way:
mydict = dict([(re.sub(r"\W+","",k), v.strip()) for (k,v) in zip(arr[::2], arr[1::2])])
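To show what the cleaning does, here is the same step on a hard-coded list of the alternating shape that re.split produces (hypothetical sample values). re.sub(r"\W+", "", k) deletes the slashes and dots, so etc/pam.d becomes etcpamd, while v.strip() removes the surrounding newlines. A dict comprehension is used here; it builds the same dict as dict([...]).

```python
import re

# Hypothetical alternating [name, block, name, block, ...] list.
arr = ["etc", "\nArchiveSEL\nxinetd.d\n\n",
       "etc/pam.d", "\nftp\nrsh\n"]

# Pair each name with its block; strip non-word characters from the names.
mydict = {re.sub(r"\W+", "", k): v.strip()
          for k, v in zip(arr[::2], arr[1::2])}

print(mydict)
```

Note that this naming drops the dots entirely, so etc/pam.d and a hypothetical etc/pamd directory would collide on the same key.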
Then write the files:
for name, content in mydict.items():  # mydict.iteritems() in Python 2
    with open(name + ".txt", "w") as output:
        output.write(content)