

Dividing a text file into two files with non-overlapping entries

Let me explain the problem in detail.

I have two text files (pool files), CL-0.txt and CL-1.txt, and I have to divide each of them into two further parts: CL-0.txt into xx_0.txt and yy_0.txt, and CL-1.txt into xx_1.txt and yy_1.txt. The contents of the two files are in the following format:

CL-0:
(apple, orange)
(mango, banana)
(cake, tea)
(coffee, sugar)
(milk, honey)
(cake, biscuts)

CL-1:
(orange, mango)
(grapes, coffee)
(car, icecream)
(table, chair)
(window, milk)
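A minimal sketch of reading this format, assuming each pool file holds one entry per line written as (a, b), with optional spaces around the items; parse_entry and read_entries are hypothetical helper names, not something from the question:

```python
import re

# One entry per line: "(entity, entity)", spaces around the items allowed.
ENTRY_RE = re.compile(r'\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*$')


def parse_entry(line):
    """Return the entry on `line` as a tuple of two entities, or None."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None  # blank or malformed line
    return match.group(1), match.group(2)


def read_entries(lines):
    """Yield every well-formed entry from an iterable of lines."""
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            yield entry


if __name__ == '__main__':
    sample = ['(apple, orange)\n', '(mango, banana)\n', '\n']
    print(list(read_entries(sample)))  # [('apple', 'orange'), ('mango', 'banana')]
```

Returning tuples keeps each entry hashable, so entries and their entities can go straight into sets.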

To clarify what I am referring to as an entry and as an entity: an entry is (apple, orange), and an entity is apple. Each entry has two elements, with the comma as the separator. There should be no duplicate entries or entities:

If an entry or an element has appeared in xx_0.txt, it cannot appear in yy_0.txt or yy_1.txt.
If an entry or an element has appeared in yy_0.txt, it cannot appear in xx_0.txt or xx_1.txt.
If an entry or an element has appeared in xx_1.txt, it cannot appear in yy_0.txt or yy_1.txt.
If an entry or an element has appeared in yy_1.txt, it cannot appear in xx_0.txt or xx_1.txt.

In other words, no entity may appear in both an xx file and a yy file.
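The four cross-file rules reduce to a single check: an entry may go into an xx file only if none of its entities has already appeared in any yy file, and vice versa. A minimal sketch, assuming the entities written so far are kept in one set per side (the names can_place, seen, 'xx' and 'yy' are illustrative, not from the question):

```python
def can_place(entry, side, seen):
    """Return True if `entry` may be written to a file on `side`.

    `side` is 'xx' or 'yy'; `seen` maps each side to the set of entities
    already written to any file on that side (xx_0 and xx_1 share one
    set, yy_0 and yy_1 share the other).
    """
    other = 'yy' if side == 'xx' else 'xx'
    # Reject the entry if any of its entities is already on the other side.
    return seen[other].isdisjoint(entry)


seen = {'xx': {'apple', 'orange', 'cake', 'tea'}, 'yy': {'mango', 'banana'}}
print(can_place(('cake', 'biscuts'), 'yy', seen))  # False: cake is on the xx side
print(can_place(('cake', 'honey'), 'xx', seen))    # True: repeats within a side are fine
```

Keeping one set per side rather than one per file means each check is a single set operation, whatever the number of files.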

Each entry is taken one by one, and the entries are selected alternately for the two files until an entry is written into a file.
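One reading of this procedure, assumed here rather than stated in the question: entries are offered to the xx file and the yy file in strict alternation by position, and an entry that breaks the rules is dropped instead of being retried in the other file. That reading reproduces the expected CL-0 output. A sketch for a single pool, with the hypothetical function name split_pool:

```python
def split_pool(entries, seen):
    """Split one pool into (xx_entries, yy_entries) by alternation.

    Even positions (0, 2, 4, ...) are offered to the xx file and odd
    positions to the yy file. An entry whose entities already appear on
    the opposite side is skipped. `seen` maps 'xx'/'yy' to entity sets
    and is updated in place.
    """
    result = {'xx': [], 'yy': []}
    for position, entry in enumerate(entries):
        side = 'xx' if position % 2 == 0 else 'yy'
        other = 'yy' if side == 'xx' else 'xx'
        if seen[other].isdisjoint(entry):
            result[side].append(entry)
            seen[side].update(entry)
        # otherwise the entry is dropped and the turn still passes on
    return result['xx'], result['yy']


cl0 = [('apple', 'orange'), ('mango', 'banana'), ('cake', 'tea'),
       ('coffee', 'sugar'), ('milk', 'honey'), ('cake', 'biscuts')]
xx_0, yy_0 = split_pool(cl0, {'xx': set(), 'yy': set()})
print(xx_0)  # [('apple', 'orange'), ('cake', 'tea'), ('milk', 'honey')]
print(yy_0)  # [('mango', 'banana'), ('coffee', 'sugar')]
```

(cake, biscuts) is the sixth entry, so it is offered to yy_0 and rejected because cake is already on the xx side.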

The expected output is as follows.

The constituent files from CL-0:

The xx_0 file should have:
(apple, orange)
(cake, tea)
(milk, honey)

The yy_0 file should have:
(mango, banana)
(coffee, sugar)
(cake, biscuts) cannot be added, as cake has already appeared in xx_0.

The constituent files from CL-1:

The xx_1 file should have:
(orange, mango): the repeated entity orange is OK in this case
(car, icecream)

The yy_1 file would have:
(grapes, coffee): again, the repeated entity coffee is OK in this case
(table, chair)
(window, milk) cannot be added here, as it would contain the entity milk, which has already appeared in the xx_0 file.

I attempted half of the problem, thinking that if I could successfully divide the CL-0 file into two parts, the rest could be implemented easily with a bit of tweaking.
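The tweak for the second file is mostly about state: the entity sets have to survive from CL-0 to CL-1, because the rules also compare xx_1 and yy_1 against xx_0 and yy_0 (presumably what the xx1 and yy1 sets in the attempt were meant for). A sketch under stated assumptions: one (a, b) entry per line, strict alternation by position within each pool, and conflicting entries dropped; split_files, read_lines and SAMPLE are hypothetical names. Applied literally, the rules give a different CL-1 split from the one expected above: (orange, mango) is rejected because mango is already in yy_0, and (window, milk) lands in xx_1, where milk does not conflict.

```python
import os
import re
import tempfile

# One entry per line, written as "(entity, entity)".
ENTRY_RE = re.compile(r'\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*$')


def split_files(jobs):
    """Split each pool file into an xx file and a yy file.

    `jobs` is a list of (pool_path, xx_path, yy_path), processed in order.
    The two entity sets live outside the loop, so entries for xx_1/yy_1
    are also checked against everything written to xx_0/yy_0.
    """
    seen = {'xx': set(), 'yy': set()}
    for pool_path, xx_path, yy_path in jobs:
        with open(pool_path) as pool:
            entries = [m.groups() for m in map(ENTRY_RE.match, pool) if m]
        with open(xx_path, 'w') as xx_file, open(yy_path, 'w') as yy_file:
            outputs = {'xx': xx_file, 'yy': yy_file}
            for position, entry in enumerate(entries):
                # Alternate strictly by position; drop entries that conflict.
                side = 'xx' if position % 2 == 0 else 'yy'
                other = 'yy' if side == 'xx' else 'xx'
                if seen[other].isdisjoint(entry):
                    outputs[side].write('(%s, %s)\n' % entry)
                    seen[side].update(entry)
    return seen


def read_lines(path):
    """Return the lines of a text file without trailing newlines."""
    with open(path) as f:
        return f.read().splitlines()


SAMPLE = {
    'CL-0.txt': ['(apple, orange)', '(mango, banana)', '(cake, tea)',
                 '(coffee, sugar)', '(milk, honey)', '(cake, biscuts)'],
    'CL-1.txt': ['(orange, mango)', '(grapes, coffee)', '(car, icecream)',
                 '(table, chair)', '(window, milk)'],
}

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as workdir:
        for name, lines in SAMPLE.items():
            with open(os.path.join(workdir, name), 'w') as f:
                f.write('\n'.join(lines) + '\n')
        jobs = [[os.path.join(workdir, name) for name in names]
                for names in [('CL-0.txt', 'xx_0.txt', 'yy_0.txt'),
                              ('CL-1.txt', 'xx_1.txt', 'yy_1.txt')]]
        split_files(jobs)
        for name in ['xx_0.txt', 'yy_0.txt', 'xx_1.txt', 'yy_1.txt']:
            print(name, read_lines(os.path.join(workdir, name)))
```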

My effort is as follows:

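xx_0 = open('xx_0.txt', 'w')  # the file that I want to populate
yy_0 = open('yy_0.txt', 'w')  # the file that I want to populate
pool = open('CL-0.txt')  # the main file
xx0 = set()  # entities already written to xx_0
xx1 = set()  # a set against which the desired file has to be checked for matches
yy0 = set()  # entities already written to yy_0
yy1 = set()  # a set against which the desired file has to be checked for matches
L = 1  # 1: the next entry goes to xx_0, 2: the next entry goes to yy_0
for line in pool:
    # "(apple, orange)\n" -> "apple, orange"
    s = line.strip()
    s = s.replace('(', '')
    s = s.replace(')', '')
    s = s.replace("'", '')
    if not s:
        continue  # skip blank lines

    r = [n.strip() for n in s.split(',')]  # ['apple', 'orange']
    if L == 1:
        # write the entry only if none of its entities is already on the yy side
        if all(n not in yy0 and n not in yy1 for n in r):
            xx0.update(r)
            xx_0.write('(' + ', '.join(r) + ')\n')
        L = 2
    else:
        # write the entry only if none of its entities is already on the xx side
        if all(n not in xx0 and n not in xx1 for n in r):
            yy0.update(r)
            yy_0.write('(' + ', '.join(r) + ')\n')
        L = 1

pool.close()
xx_0.close()
yy_0.close()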

Assuming the lines are to be put alternately into two different files:

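import re

inputFile = open('CL-0.txt')
out = [ open(fileName, 'w') for fileName in [ 'xx_0.txt', 'yy_0.txt' ] ]
done = set()
for line in inputFile:
  match = re.match(r'\s*\(\s*([^,]*?)\s*,\s*([^)]*?)\s*\)\s*', line)
  if match is None:
    continue  # skip blank or malformed lines
  elements = match.groups()  # a hashable tuple, e.g. ('apple', 'orange')
  if elements in done:
    continue  # skip exact duplicate entries
  out[0].write(', '.join(elements) + '\n')
  done.add(elements)
  out = out[1:] + [ out[0] ]  # round robin
inputFile.close()
for f in out:
  f.close()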

But I did not understand what the purpose of these xx1 and yy1 sets was. Your code definitely doesn't explain it (it never writes to them at all), and your text wasn't helpful enough either. Maybe you want to elaborate on that?
