
Copy selected lines from one file to another

I am trying to write a Python program that searches a text file for user-specified words and copies the lines containing those words into another file.

The user should also have the option to exclude words.

(For example, if the user searches for the word "exception" and wants to exclude the word "abc", the code should copy only the lines that contain "exception" but not "abc".)

All of this will be done from the command prompt.

The input would be:

file.py test.txt(input file) test_mod.txt(output file) -e abc(exclude word denoted by -e) -s exception(search word denoted by -s)

The user should be able to enter multiple exclude words and multiple search words.

So far I have got it working with this input format:

file.py test.txt test_mod.txt abc exception

This excludes the word "abc" and searches for "exception".

But I don't know how to:

  1. Include multiple search words and exclude words.
  2. Denote them by -e and -s. I have seen the argparse and getopt tutorials, but there's no tutorial on this specific topic.

Could somebody please help me by modifying my code or writing a new one?

Here's my code as of now:

#/Python33

import sys
import os




def main(): #main method

 try:

  f1 = open(sys.argv[1], 'r')    #takes the first input file in command line
  found = False
  user_input1 = (sys.argv[3])    #takes the word which is to be excluded.
  user_input2 = (sys.argv[4])    #takes the word which is to be included.
  if sys.argv[1] == sys.argv[2]: 
       f1.close()
       sys.exit('\nERROR!!\nThe two file names cannot be the same.') 

  if sys.argv[3] != sys.argv[4]:  

    for line in f1:

        if user_input1 in line or user_input2 in line:

           f2 = open(sys.argv[2], 'a') 

           if user_input1 in line:
              if user_input2 in line:
                   pass

           elif user_input2 in line:
              f2.write(line)
              found = True
              f2.close()


    if not found:
        print("ERROR: The Word couldn't be found.")            



    f1.close()


  if sys.argv[3] == sys.argv[4]: 
         f1.close()
         sys.exit('\nERROR!!\nThe word to be excluded and the word to be included  cannot be the same.') 



 except IOError:
       print('\nIO error or wrong file name.') 
 except IndexError:
       print('\nYou must enter 5 parameters.') #prevents less than 5 inputs which is  mandatory
 except SystemExit as e:                       #Exception handles sys.exit()
       sys.exit(e)


if __name__ == '__main__':
  main()

Thanks man. That really helped me understand the logic. But I'm new to Python, so I'm still having some issues. Whenever I run it, it copies the lines with the words specified by -s, but it's not excluding the words specified by -e. What am I doing wrong? Here's my code now:

#/Python33

#takes a text file, finds a word and writes that line containing that word but not a 2nd word specified by the user. So if both of them are there, that line is not printed

import sys
import os
import argparse



def main(): #main method

 try:

  parser = argparse.ArgumentParser(description='Copies selected lines from files')
  parser.add_argument('input_file')
  parser.add_argument('output_file')
  parser.add_argument('-e',action="append")
  parser.add_argument('-s',action="append")
  args = parser.parse_args('test.txt, test_mod.txt, -e , -s exception'.split())


  user_input1 = (args.e)    #takes the word which is to be excluded.
  user_input2 = (args.s)    #takes the word which is to be included.

  def include_exclude(input_file, output_file, exclusion_list=[], inclusion_list=[]):


      with open(output_file, 'w') as fo:
        with open(input_file, 'r') as fi:
            for line in fi:
                inclusion_words_in_line = map(lambda x: x in line, inclusion_list)
                exclusion_words_in_line = map(lambda x: x in line, exclusion_list)
                if any(inclusion_words_in_line) and not any(exclusion_words_in_line):
                    fo.write(line)    
  if user_input1 != user_input2 : 
         include_exclude('test.txt', 'test_mod.txt', user_input1, user_input2);
         print("hello")

  if user_input1 == user_input2 : 


         sys.exit('\nERROR!!\nThe word to be excluded and the word to be included cannot be the same.') 



 except IOError:
       print('\nIO error or wrong file name.')  
 except IndexError:
       print('\nYou must enter 5 parameters.') 
 except SystemExit as e:                      
       sys.exit(e)


if __name__ == '__main__':
  main()
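A note on the snippet above: one likely cause of the symptom is the hard-coded string passed to parse_args. The sketch below only reproduces that call to show what argparse ends up with; it assumes nothing beyond the parser definition already shown. Splitting on whitespace leaves the commas attached to the file names, and the token that follows -e is a lone comma, so the "exclude word" becomes "," rather than a real word.

```python
import argparse

parser = argparse.ArgumentParser(description="Copies selected lines from files")
parser.add_argument("input_file")
parser.add_argument("output_file")
parser.add_argument("-e", action="append")
parser.add_argument("-s", action="append")

# Same hard-coded string as in the snippet above.
tokens = "test.txt, test_mod.txt, -e , -s exception".split()
print(tokens)  # ['test.txt,', 'test_mod.txt,', '-e', ',', '-s', 'exception']

args = parser.parse_args(tokens)
print(args.e)  # [','] : the exclude list holds a comma, not a word
print(args.s)  # ['exception']
```

Calling parser.parse_args() with no arguments reads the real command line instead, and passing args.input_file and args.output_file on to include_exclude avoids the hard-coded file names.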

I think this does what you want:

>>> import argparse

>>> parser = argparse.ArgumentParser(description='foo baaar')

>>> parser.add_argument('input_file')
Out[3]: _StoreAction(option_strings=[], dest='input_file', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)

>>> parser.add_argument('output_file')
Out[4]: _StoreAction(option_strings=[], dest='output_file', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)

>>> parser.add_argument('-e', action="append")
Out[5]: _AppendAction(option_strings=['-e'], dest='e', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)

>>> parser.add_argument('-s', action="append")
Out[6]: _AppendAction(option_strings=['-s'], dest='s', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)

>>> parser.parse_args('foo1.txt foo2.txt -e abc -e def -s xyz -s pqr'.split())
Out[7]: Namespace(e=['abc', 'def'], input_file='foo1.txt', output_file='foo2.txt', s=['xyz', 'pqr'])

If you just call parser.parse_args(), it will parse the arguments passed to your script, but the above is handy for testing. Note how multiple search and exclude words are specified using multiple -s and -e flags. By passing action="append" to the add_argument method, the arguments after -s and -e are added to lists in the namespace returned by parser.parse_args. This should address your questions 1 and 2.
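As a variation on the parser above, here is a minimal sketch that also gives each option a long name and an empty-list default. The names build_parser, --search and --exclude are assumptions made for illustration, not part of the original answer. With action="append" and no default, an option that is never given is left as None, which a later loop over the words would trip on; default=[] avoids that.

```python
import argparse


def build_parser():
    """Build a parser with repeatable -s (search) and -e (exclude) options."""
    parser = argparse.ArgumentParser(description="Copy selected lines between files")
    parser.add_argument("input_file")
    parser.add_argument("output_file")
    # default=[] makes a missing flag yield an empty list instead of None.
    parser.add_argument("-s", "--search", action="append", default=[],
                        help="word to search for (repeatable)")
    parser.add_argument("-e", "--exclude", action="append", default=[],
                        help="word to exclude (repeatable)")
    return parser


# An explicit list is handy for testing; in the real script, call
# parse_args() with no arguments so the actual command line is used.
args = build_parser().parse_args(["in.txt", "out.txt", "-s", "exception", "-e", "abc"])
print(args.search)   # ['exception']
print(args.exclude)  # ['abc']
```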

Here's an example of how you can access the values in a nice way:

>>> args = parser.parse_args('foo1.txt foo2.txt -e abc -e def -s xyz -s pqr'.split())

>>> args.e
Out[12]: ['abc', 'def']

I used the argparse docs; the add_argument method documentation is especially useful.

EDIT: here's one function that does the inclusion/exclusion logic:

def include_exclude(input_file, output_file, inclusion_list, exclusion_list=()):
    # Copy lines that contain any inclusion word and no exclusion word.
    with open(output_file, 'w') as fo:
        with open(input_file, 'r') as fi:
            for line in fi:
                # One True/False per word: does the word occur in this line?
                inclusion_words_in_line = map(lambda x: x in line, inclusion_list)
                exclusion_words_in_line = map(lambda x: x in line, exclusion_list)
                if any(inclusion_words_in_line) and not any(exclusion_words_in_line):
                    fo.write(line)

The with statement ensures that the file is properly closed if anything goes wrong (see the docs). You could of course use the same open/close code you already have instead. My code doesn't include any error handling; I'll leave that as an exercise for the reader.

In the main for loop, I loop over all the lines in the input file. For each line, I look at all the words in inclusion_list and check whether they occur in the line. The map function is IMHO an elegant way of doing this: it takes (for example) the words in inclusion_list and applies the function lambda x: x in line to each of them. That function returns True if its input (a word from inclusion_list) appears in the line, so you end up with a sequence of True/False items (a list in Python 2, a lazy iterator in Python 3). Brief example:

>>> line = "foo bar"

>>> words = ['foo', 'barz']

>>> list(map(lambda x: x in line, words))
Out[24]: [True, False]

Now I apply the any function to check if, well, any of the items in inclusion_words_in_line are True, and to check that none (not any) of the items in exclusion_words_in_line are True. If that's the case, the line is written to the output file. If you wanted to ensure that all of the words in inclusion_list appear on the line, rather than any (this wasn't clear to me from your problem description), you can use the all function instead.
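To make the any/all choice concrete, here is a small sketch of the per-line test on its own. The helper line_matches and its require_all switch are hypothetical names introduced for illustration; generator expressions stand in for map and lambda, but the logic is meant to be the same. Note that word in line is a substring test, so "abc" also matches "abcdef".

```python
def line_matches(line, inclusion_list, exclusion_list=(), require_all=False):
    """Return True if the line should be copied to the output file."""
    # require_all is a hypothetical switch: all() demands every inclusion
    # word, any() is satisfied by at least one of them.
    check = all if require_all else any
    included = check(word in line for word in inclusion_list)
    excluded = any(word in line for word in exclusion_list)
    return included and not excluded


print(line_matches("exception raised", ["exception", "error"]))                    # True
print(line_matches("exception raised", ["exception", "error"], require_all=True))  # False
print(line_matches("exception in abc", ["exception"], ["abc"]))                    # False
```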

Note that you can quite easily solve the above with for loops that loop over inclusion_list and exclusion_list, checking whether the items are there; there's no need to use map and any.
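The loop-based alternative could look roughly like this. It is only a sketch: the name include_exclude_loops is hypothetical, and the behaviour is intended to mirror include_exclude (copy a line if it contains any inclusion word and no exclusion word), with no error handling added.

```python
def include_exclude_loops(input_file, output_file, inclusion_list, exclusion_list=()):
    """Copy lines that contain any inclusion word and no exclusion word."""
    with open(input_file, "r") as fi, open(output_file, "w") as fo:
        for line in fi:
            # Does at least one inclusion word occur in this line?
            found = False
            for word in inclusion_list:
                if word in line:
                    found = True
                    break
            if not found:
                continue

            # Does any exclusion word occur in this line?
            excluded = False
            for word in exclusion_list:
                if word in line:
                    excluded = True
                    break

            if not excluded:
                fo.write(line)
```

It would be called the same way as the map-based version, for example include_exclude_loops('test.txt', 'test_mod.txt', ['exception'], ['abc']).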
