Copy selected lines from one file to another
I am trying to write a program in Python which searches for user-specified words in a txt file and copies the lines containing those words into another file.
Also, the user will have an option to exclude any word.
(E.g., suppose the user searches for the word "exception" and wants to exclude the word "abc"; then the code will only copy the lines which have "exception" in them but not "abc".)
All the work will be done from the command prompt.
The input would be:
file.py test.txt(input file) test_mod.txt(output file) -e abc(exclude word denoted by -e) -s exception(search word denoted by -s)
The user will also have an option to enter multiple exclude words and multiple search words.
So far, I have got it working with this input format:
file.py test.txt test_mod.txt abc exception
This excludes the word "abc" and searches for "exception".
But I don't know how to:
Please can somebody help me by modifying my code or writing a new one?
Here's my code as of now:
#/Python33
import sys
import os

def main(): #main method
    try:
        f1 = open(sys.argv[1], 'r') #takes the first input file in command line
        found = False
        user_input1 = (sys.argv[3]) #takes the word which is to be excluded.
        user_input2 = (sys.argv[4]) #takes the word which is to be included.
        if sys.argv[1] == sys.argv[2]:
            f1.close()
            sys.exit('\nERROR!!\nThe two file names cannot be the same.')
        if sys.argv[3] != sys.argv[4]:
            for line in f1:
                if user_input1 in line or user_input2 in line:
                    f2 = open(sys.argv[2], 'a')
                    if user_input1 in line:
                        if user_input2 in line:
                            pass
                    elif user_input2 in line:
                        f2.write(line)
                        found = True
                    f2.close()
            if not found:
                print("ERROR: The Word couldn't be found.")
            f1.close()
        if sys.argv[3] == sys.argv[4]:
            f1.close()
            sys.exit('\nERROR!!\nThe word to be excluded and the word to be included cannot be the same.')
    except IOError:
        print('\nIO error or wrong file name.')
    except IndexError:
        print('\nYou must enter 5 parameters.') #prevents less than 5 inputs which is mandatory
    except SystemExit as e: #Exception handles sys.exit()
        sys.exit(e)

if __name__ == '__main__':
    main()
Thanks man. That really helped me understand the logic. But I'm new to Python, so I'm still having some issues. Whenever I run it, it copies the lines containing the words specified by -s, but it's not excluding the words specified by -e. What am I doing wrong? So here's my code now:
#/Python33
#takes a text file, finds a word and writes that line containing that word but not a 2nd word specified by the user. So if both of them are there, that line is not printed
import sys
import os
import argparse

def main(): #main method
    try:
        parser = argparse.ArgumentParser(description='Copies selected lines from files')
        parser.add_argument('input_file')
        parser.add_argument('output_file')
        parser.add_argument('-e',action="append")
        parser.add_argument('-s',action="append")
        args = parser.parse_args('test.txt, test_mod.txt, -e , -s exception'.split())
        user_input1 = (args.e) #takes the word which is to be excluded.
        user_input2 = (args.s) #takes the word which is to be included.
        def include_exclude(input_file, output_file, exclusion_list=[], inclusion_list=[]):
            with open(output_file, 'w') as fo:
                with open(input_file, 'r') as fi:
                    for line in fi:
                        inclusion_words_in_line = map(lambda x: x in line, inclusion_list)
                        exclusion_words_in_line = map(lambda x: x in line, exclusion_list)
                        if any(inclusion_words_in_line) and not any(exclusion_words_in_line):
                            fo.write(line)
        if user_input1 != user_input2 :
            include_exclude('test.txt', 'test_mod.txt', user_input1, user_input2);
            print("hello")
        if user_input1 == user_input2 :
            sys.exit('\nERROR!!\nThe word to be excluded and the word to be included cannot be the same.')
    except IOError:
        print('\nIO error or wrong file name.')
    except IndexError:
        print('\nYou must enter 5 parameters.')
    except SystemExit as e:
        sys.exit(e)

if __name__ == '__main__':
    main()
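Running the parsing step on its own suggests why -e appears to be ignored: parse_args is handed a hard-coded string rather than the real command line, and the commas in that string become part of the tokens. A minimal sketch that replays only the parser lines from the code above (nothing here goes beyond what that code already calls):

```python
import argparse

parser = argparse.ArgumentParser(description='Copies selected lines from files')
parser.add_argument('input_file')
parser.add_argument('output_file')
parser.add_argument('-e', action='append')
parser.add_argument('-s', action='append')

# The hard-coded test string from the code above, split on whitespace.
tokens = 'test.txt, test_mod.txt, -e , -s exception'.split()
print(tokens)   # ['test.txt,', 'test_mod.txt,', '-e', ',', '-s', 'exception']

args = parser.parse_args(tokens)
print(args.e, args.s)    # [','] ['exception']
print(args.input_file)   # test.txt,
```

So the exclusion list is [','] rather than the word typed after -e, and whatever is typed on the command line is never read. Calling parser.parse_args() with no arguments, and passing args.input_file and args.output_file to include_exclude instead of the hard-coded names, would presumably resolve it.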
I think this does what you want:
>>> import argparse
>>> parser = argparse.ArgumentParser(description='foo baaar')
>>> parser.add_argument('input_file')
Out[3]: _StoreAction(option_strings=[], dest='input_file', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument('output_file')
Out[4]: _StoreAction(option_strings=[], dest='output_file', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument('-e', action="append")
Out[5]: _AppendAction(option_strings=['-e'], dest='e', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.add_argument('-s', action="append")
Out[6]: _AppendAction(option_strings=['-s'], dest='s', nargs=None, const=None, default=None, type=None, choices=None, help=None, metavar=None)
>>> parser.parse_args('foo1.txt foo2.txt -e abc -e def -s xyz -s pqr'.split())
Out[7]: Namespace(e=['abc', 'def'], input_file='foo1.txt', output_file='foo2.txt', s=['xyz', 'pqr'])
If you just call parser.parse_args(), it will parse the arguments passed to your script, but the above is handy for testing. Note how multiple search and exclude words are specified using multiple -s and -e flags. By passing action="append" to the add_argument method, the arguments after -s and -e are added to lists in the namespace returned by parser.parse_args. This should address your questions 1. and 2.
Here's an example of how you can access the values in a nice way:
>>> args = parser.parse_args('foo1.txt foo2.txt -e abc -e def -s xyz -s pqr'.split())
>>> args.e
Out[12]: ['abc', 'def']
I used the argparse docs; the add_argument method documentation is especially useful.
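One detail worth knowing about action="append": if a flag is never given, its attribute in the namespace is None rather than an empty list, and iterating over None fails. A minimal sketch of one way to handle that, assuming you want -s to be mandatory and -e optional (the build_parser helper name and the help strings are my own, not from the question; default= and required= are standard add_argument parameters):

```python
import argparse

def build_parser():
    """Build a parser for: input_file output_file [-e WORD]... -s WORD..."""
    parser = argparse.ArgumentParser(description='Copies selected lines from files')
    parser.add_argument('input_file')
    parser.add_argument('output_file')
    # default=[] gives an empty list (instead of None) when -e is never used.
    parser.add_argument('-e', action='append', default=[],
                        help='word to exclude (may be repeated)')
    # required=True makes argparse exit with a usage message if -s is missing.
    parser.add_argument('-s', action='append', required=True,
                        help='word to search for (may be repeated)')
    return parser

# Passing an explicit list is only for demonstration; in a script,
# call build_parser().parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(['test.txt', 'test_mod.txt', '-s', 'exception'])
print(args.e, args.s)   # [] ['exception']
```

With this setup, code that loops over args.e works unchanged whether or not the user supplied any exclude words.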
EDIT: here's one function that does the inclusion/exclusion logic:
def include_exclude(input_file, output_file, inclusion_list, exclusion_list=[]):
    with open(output_file, 'w') as fo:
        with open(input_file, 'r') as fi:
            for line in fi:
                inclusion_words_in_line = map(lambda x: x in line, inclusion_list)
                exclusion_words_in_line = map(lambda x: x in line, exclusion_list)
                if any(inclusion_words_in_line) and not any(exclusion_words_in_line):
                    fo.write(line)
The with statement ensures that the file is properly closed if anything goes wrong (see the doc). Instead, you could of course use the same open/close code you already have. Indeed, my code doesn't include any error handling; I'll leave that as an exercise for the reader.
In the main for loop, I loop over all the lines in the input file. Then I look at all the words in inclusion_list and check whether they occur in the line. The map function is IMHO an elegant way of doing this: it takes (for example) the words in inclusion_list and generates another sequence by applying the function lambda x: x in line to each item of inclusion_list. The function just returns True if its input (a word from inclusion_list) appears in the line, so you end up with a sequence of True/False items (a list in Python 2; in Python 3, map returns a lazy iterator, which any accepts just as well). Brief example:
>>> line = "foo bar"
>>> words = ['foo', 'barz']
>>> list(map(lambda x: x in line, words))  # list() is needed in Python 3, where map is lazy
Out[24]: [True, False]
Now I apply the any function to check if, well, any of the items in inclusion_words_in_line are True, and to check that none (not any) of the items in exclusion_words_in_line are True. If that's the case, the line is written to the output file. If you wanted to ensure that all of the words in inclusion_list appear on the line, rather than any (this wasn't clear to me from your problem description), you can use the all function instead.
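To make the any/all choice concrete, here is a small sketch of the per-line test on its own. The helper name line_matches and the require_all switch are hypothetical, introduced only for illustration; generator expressions stand in for map, which behaves the same way here.

```python
def line_matches(line, inclusion_list, exclusion_list=(), require_all=False):
    """Return True if the line should be copied to the output file."""
    # Choose between "at least one search word" and "every search word".
    combine = all if require_all else any
    included = combine(word in line for word in inclusion_list)
    excluded = any(word in line for word in exclusion_list)
    return included and not excluded

line = "an exception occurred in module xyz"
print(line_matches(line, ['exception', 'pqr']))                    # True  (any)
print(line_matches(line, ['exception', 'pqr'], require_all=True))  # False (all)
print(line_matches(line, ['exception'], ['xyz']))                  # False (excluded)
```

Keep in mind that word in line is a substring test, so "abc" also matches a line containing "abcdef"; matching whole words only would need extra work such as splitting the line first.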
Note that you can quite easily solve the above with for loops that iterate over inclusion_list and exclusion_list and check whether the items are there; there's no need to use map and any.
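For comparison, a plain-loop version might look like the sketch below. The names wanted and copy_lines are my own, and the returned line count is an addition so the caller can report when nothing matched, as the original script did with its found flag. The demonstration writes to a temporary directory, so it does not touch any real files.

```python
import os
import tempfile

def wanted(line, search_words, exclude_words):
    """Plain-loop version of the include/exclude test (substring match)."""
    for word in exclude_words:
        if word in line:
            return False          # an excluded word is present: skip the line
    for word in search_words:
        if word in line:
            return True           # at least one search word is present
    return False

def copy_lines(input_file, output_file, search_words, exclude_words):
    """Copy matching lines; return how many lines were written."""
    count = 0
    with open(input_file, 'r') as fi, open(output_file, 'w') as fo:
        for line in fi:
            if wanted(line, search_words, exclude_words):
                fo.write(line)
                count += 1
    return count

# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'test.txt')
    dst = os.path.join(tmp, 'test_mod.txt')
    with open(src, 'w') as f:
        f.write('exception in abc\nexception in def\nnothing here\n')
    written = copy_lines(src, dst, ['exception'], ['abc'])
    with open(dst) as f:
        result = f.read()

print(written, repr(result))   # 1 'exception in def\n'
```

In a real script you would call copy_lines(args.input_file, args.output_file, args.s, args.e) and wrap it in try/except IOError. Checking first that the two file names differ still matters, because opening the output with 'w' truncates it immediately.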