I am trying to write a program in Python that searches a text file for user-specified words and copies the lines containing those words into another file.
The user will also have the option to exclude any word.
(E.g. suppose the user searches for the word "exception" and wants to exclude the word "abc"; the code will then copy only the lines that contain "exception" but not "abc".)
All of the work will be done from the command prompt.
The input would be:
file.py test.txt (input file) test_mod.txt (output file) -e abc (exclude word, denoted by -e) -s exception (search word, denoted by -s)
The user should be able to enter multiple exclude words and multiple search words.
So far I have got it working with this input format:
file.py test.txt test_mod.txt abc exception
This excludes the word "abc" and searches for "exception".
But I don't know how to:
Could somebody please help me by modifying my code or writing a new one?
Here's my code as of now:
#/Python33
import sys
import os
def main(): #main method
    try:
        f1 = open(sys.argv[1], 'r') #takes the first input file in command line
        found = False
        user_input1 = (sys.argv[3]) #takes the word which is to be excluded.
        user_input2 = (sys.argv[4]) #takes the word which is to be included.
        if sys.argv[1] == sys.argv[2]:
            f1.close()
            sys.exit('\nERROR!!\nThe two file names cannot be the same.')
        if sys.argv[3] != sys.argv[4]:
            for line in f1:
                if user_input1 in line or user_input2 in line:
                    f2 = open(sys.argv[2], 'a')
                    if user_input1 in line:
                        if user_input2 in line:
                            pass
                    elif user_input2 in line:
                        f2.write(line)
                        found = True
                    f2.close()
            if not found:
                print("ERROR: The Word couldn't be found.")
            f1.close()
        if sys.argv[3] == sys.argv[4]:
            f1.close()
            sys.exit('\nERROR!!\nThe word to be excluded and the word to be included cannot be the same.')
    except IOError:
        print('\nIO error or wrong file name.')
    except IndexError:
        print('\nYou must enter 5 parameters.') #prevents less than 5 inputs which is mandatory
    except SystemExit as e: #Exception handles sys.exit()
        sys.exit(e)
if __name__ == '__main__':
    main()
Thanks man. That really helped me understand the logic. But I'm new to Python, so I'm still having some issues. Whenever I run it, it copies the lines containing the words specified by -s, but it's not excluding the words specified by -e. What am I doing wrong? Here's my code now:
#/Python33
#takes a text file, finds a word and writes that line containing that word but not a 2nd word specified by the user. So if both of them are there, that line is not printed
import sys
import os
import argparse
def main(): #main method
    try:
        parser = argparse.ArgumentParser(description='Copies selected lines from files')
        parser.add_argument('input_file')
        parser.add_argument('output_file')
        parser.add_argument('-e',action="append")
        parser.add_argument('-s',action="append")
        args = parser.parse_args('test.txt, test_mod.txt, -e , -s exception'.split())
        user_input1 = (args.e) #takes the word which is to be excluded.
        user_input2 = (args.s) #takes the word which is to be included.
        def include_exclude(input_file, output_file, exclusion_list=[], inclusion_list=[]):
            with open(output_file, 'w') as fo:
                with open(input_file, 'r') as fi:
                    for line in fi:
                        inclusion_words_in_line = map(lambda x: x in line, inclusion_list)
                        exclusion_words_in_line = map(lambda x: x in line, exclusion_list)
                        if any(inclusion_words_in_line) and not any(exclusion_words_in_line):
                            fo.write(line)
        if user_input1 != user_input2 :
            include_exclude('test.txt', 'test_mod.txt', user_input1, user_input2);
            print("hello")
        if user_input1 == user_input2 :
            sys.exit('\nERROR!!\nThe word to be excluded and the word to be included cannot be the same.')
    except IOError:
        print('\nIO error or wrong file name.')
    except IndexError:
        print('\nYou must enter 5 parameters.')
    except SystemExit as e:
        sys.exit(e)
if __name__ == '__main__':
    main()
I think this does what you want:
>>> import argparse
>>> parser = argparse.ArgumentParser(description='foo baaar')
>>> parser.add_argument('input_file')
Out[3]: _StoreAction(option_strings=[], dest='input_file', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument('output_file')
Out[4]: _StoreAction(option_strings=[], dest='output_file', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument('-e', action="append")
Out[5]: _AppendAction(option_strings=['-e'], dest='e', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument('-s', action="append")
Out[6]: _AppendAction(option_strings=['-s'], dest='s', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.parse_args('foo1.txt foo2.txt -e abc -e def -s xyz -s pqr'.split())
Out[7]: Namespace(e=['abc', 'def'], input_file='foo1.txt', output_file='foo2.txt', s=['xyz', 'pqr'])
If you just call parser.parse_args(), it will parse the arguments passed to your script, but the above is handy for testing. Note how multiple search and exclude words are specified using multiple -s and -e flags. By passing action="append" to the add_argument method, arguments after -s and -e are added to a list in the namespace returned by parser.parse_args. This should address your questions 1. and 2.
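One detail worth knowing when wiring this into a script: with action="append", an option that never appears on the command line is left as None rather than as an empty list. Here is a minimal sketch, assuming you want -e to be optional and -s to be mandatory. The build_parser name and the choice of default=[] and required=True are my own assumptions, not something taken from the code above:

```python
import argparse

def build_parser():
    # Hypothetical helper: same arguments as in the session above.
    parser = argparse.ArgumentParser(description='Copies selected lines from files')
    parser.add_argument('input_file')
    parser.add_argument('output_file')
    # default=[] so that args.e is an empty list, not None, when no -e is given
    parser.add_argument('-e', action='append', default=[])
    # required=True so that argparse itself complains if no search word is given
    parser.add_argument('-s', action='append', required=True)
    return parser

# Passing an explicit list is handy for testing; call parse_args() with no
# arguments in the real script to read sys.argv.
args = build_parser().parse_args('foo1.txt foo2.txt -s xyz -s pqr'.split())
print(args.e, args.s)
```

With this, args.e can be handed straight to code that iterates over the exclude words, even when the user gave none.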
Here's an example of how you can access the values in a nice way:
>>> args = parser.parse_args('foo1.txt foo2.txt -e abc -e def -s xyz -s pqr'.split())
>>> args.e
Out[12]: ['abc', 'def']
I used the argparse docs; the documentation for the add_argument method is especially useful.
EDIT: here's one function that does the inclusion/exclusion logic:
def include_exclude(input_file, output_file, inclusion_list, exclusion_list=[]):
    with open(output_file, 'w') as fo:
        with open(input_file, 'r') as fi:
            for line in fi:
                inclusion_words_in_line = map(lambda x: x in line, inclusion_list)
                exclusion_words_in_line = map(lambda x: x in line, exclusion_list)
                if any(inclusion_words_in_line) and not any(exclusion_words_in_line):
                    fo.write(line)
The with statement ensures that the file is properly closed if anything goes wrong (see the doc). You could of course use the same open/close code you already have instead. Note that my code doesn't include any error handling; I'll leave that as an exercise for the reader. In the main for loop, I loop over all the lines in the input file. Then I look at all the words in inclusion_list and check whether they occur in the line. The map function is IMHO an elegant way of doing this: it takes (for example) the words in inclusion_list and generates another sequence by applying the function lambda x: x in line to each item of inclusion_list (the result is a list in Python 2 and a lazy iterator in Python 3). The function just returns True if its input (a word from inclusion_list) appears in the line, so you end up with a sequence of True/False items. Brief example:
>>> line = "foo bar"
>>> words = ['foo', 'barz']
>>> list(map(lambda x: x in line, words))
Out[24]: [True, False]
Now I apply the any function to check whether, well, any of the items in inclusion_words_in_line are True, and to check that none (not any) of the items in exclusion_words_in_line are True. If that's the case, the line is written to the output file. If you wanted to ensure that all of the words in inclusion_list appear on the line, rather than any (this wasn't clear to me from your problem description), you can use the all function instead.
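To make the any/all choice concrete, here is a small sketch of the per-line test on its own. The line_matches name and the require_all flag are hypothetical, introduced only for illustration:

```python
def line_matches(line, inclusion_list, exclusion_list=(), require_all=False):
    # Pick all() when every search word must be present, any() otherwise.
    combine = all if require_all else any
    included = combine(word in line for word in inclusion_list)
    # Exclusion always uses any(): a single excluded word rejects the line.
    excluded = any(word in line for word in exclusion_list)
    return included and not excluded

print(line_matches("an exception in xyz", ["exception", "xyz"], require_all=True))
print(line_matches("an exception", ["exception", "xyz"], require_all=True))
```

One thing to keep in mind: all() of an empty sequence is True, so with require_all=True and an empty inclusion_list every line that contains no excluded word would match.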
Note that you can quite easily solve the above with for loops that loop over inclusion_list and exclusion_list, checking whether the items are there; there's no need to use map and any.
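For completeness, a sketch of what that plain-loop version might look like. The include_exclude_loops name is hypothetical, and like the function above it leaves out error handling:

```python
def include_exclude_loops(input_file, output_file, inclusion_list, exclusion_list=()):
    with open(input_file, 'r') as fi, open(output_file, 'w') as fo:
        for line in fi:
            # Does the line contain at least one search word?
            has_included = False
            for word in inclusion_list:
                if word in line:
                    has_included = True
                    break
            # Does the line contain at least one excluded word?
            has_excluded = False
            for word in exclusion_list:
                if word in line:
                    has_excluded = True
                    break
            if has_included and not has_excluded:
                fo.write(line)
```

It would be called in the same way as the map/any version, for example include_exclude_loops('test.txt', 'test_mod.txt', ['exception'], ['abc']).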