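Remove line from file if containing word from another .txt file in python/bash
I'm learning Python and ran into the following difficulty. The file I want to clean up is a .csv file. The file containing the words that must be removed from the .csv file is a .txt file. The .txt file is a list of domain names: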
domain.com
domain2.com
domain3.com
The .csv file is a configuration file that looks like this:
domain.com;8;Started;C:\inetpub\wwwroot\d\domain.com;"http *:80:www.domain.com"
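If the .txt file contains "domain.com", I want the complete line above to be removed. I would be very grateful if some Python ninja could solve this (or maybe with bash?).
Would this be sufficient?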
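import sys


def main():
    # sys.argv[1]: the semicolon-separated config file
    # sys.argv[2]: the list of domains to drop, one per line
    with open(sys.argv[2]) as fh:
        excludes = {line.strip() for line in fh if line.strip()}
    with open(sys.argv[1]) as fh:
        lines = fh.readlines()
    # Keep a line only if its first field is not an excluded domain.
    kept = [line for line in lines if line.split(";")[0].strip() not in excludes]
    # Overwrite the config file in place with the remaining lines.
    with open(sys.argv[1], "w") as fh:
        fh.writelines(kept)


if __name__ == "__main__":
    main()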
Run it with:
script.py Domains.txt excludes.txt
Try:
grep -vf <(sed 's/.*/^&;/' domains.txt) file.csv
@glenn jackman's suggestion is a bit shorter:
grep -wFvf domains.txt file.csv
However, with foo.com in domains.txt it would match both of the following lines (one of them unwanted), e.g.:
foo.com;.....
other.foo.com;.....
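To make the difference concrete, here is a small Python sketch of the two matching strategies. It is only an approximation of what grep does: the lookaround pattern stands in for `-wF`, and the anchored pattern stands in for the sed-generated `^domain;` expression (with the dots escaped, which the sed version does not do). The variable names are made up for the illustration.

```python
import re

lines = ["foo.com;.....", "other.foo.com;....."]
domain = "foo.com"

# Rough analogue of `grep -wF`: the literal domain, not preceded or
# followed by a word character (letter, digit or underscore).
word_match = re.compile(r"(?<!\w)" + re.escape(domain) + r"(?!\w)")

# Rough analogue of the sed-generated pattern: the domain anchored at the
# start of the line and immediately followed by the field separator.
anchored = re.compile("^" + re.escape(domain) + ";")

word_hits = [line for line in lines if word_match.search(line)]
anchored_hits = [line for line in lines if anchored.search(line)]

print(word_hits)      # both lines match, because "." is not a word character
print(anchored_hits)  # only the line whose first field is exactly foo.com
```

Since the lines that match are the ones that get deleted, the word-based variant would also remove other.foo.com.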
So...
My domains.txt:
dom1.com
dom3.com
My file.csv (only the first column is needed):
dom1.com;wedwedwe
dom2.com;wedwedwe 2222
dom3.com;wedwedwe 333
dom4.com;wedwedwe 444444
Result:
dom2.com;wedwedwe 2222
dom4.com;wedwedwe 444444
If you have Windows files, where lines end with \r\n rather than just \n, use:
grep -vf <(<domains.txt tr -d '\r' |sed -e 's/.*/^&;/') file.csv
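For anyone doing this in Python instead, the same line-ending concern can be handled when loading the domain list. This is a minimal sketch, not a drop-in replacement for the command above; `load_domains` and `filter_csv` are hypothetical helper names, and it assumes the domain is always the first `;`-separated field.

```python
def load_domains(path):
    """Return the set of domains in `path`, tolerating LF and CRLF endings."""
    # newline="" turns off newline translation, so any carriage return is
    # left in the line; strip() then removes it with the other whitespace.
    with open(path, newline="") as fh:
        return {line.strip() for line in fh if line.strip()}


def filter_csv(csv_lines, domains):
    """Keep the lines whose first ';'-separated field is not in `domains`."""
    return [
        line for line in csv_lines
        if line.split(";", 1)[0].strip() not in domains
    ]
```

Calling `filter_csv(open("file.csv"), load_domains("domains.txt"))` would then give the lines to keep, whichever line ending domains.txt uses.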
Well, since the OP is learning Python ...
$ python SCRIPT.py
TXT_file = 'TXT.txt'
CSV_file = 'CSV.csv'
OUT_file = 'OUTPUT.csv'

## From the TXT, create a list of domains you do not want to include in output
with open(TXT_file, 'r') as txt:
    domain_to_be_removed_list = []
    ## for each domain in the TXT
    ## remove the return character at the end of line
    ## and add the domain to list domains-to-be-removed list
    for domain in txt:
        domain = domain.rstrip()
        domain_to_be_removed_list.append(domain)

with open(OUT_file, 'w') as outfile:
    with open(CSV_file, 'r') as csv:
        ## for each line in csv
        ## extract the csv domain
        for line in csv:
            csv_domain = line.split(';')[0]
            ## if csv domain is not in domains-to-be-removed list,
            ## then write that to outfile
            if csv_domain not in domain_to_be_removed_list:
                outfile.write(line)
This awk one-liner should do the trick:
awk -F';' 'NR == FNR {a[$1]++; next} !($1 in a)' txtfile csvfile