Python: replace ; with , if line starts with keyword
I have a text file containing thousands of entries like:
@INBOOK{Abu-Lughod1991,
chapter = {Writing against culture},
pages = {137-162},
title = {Recapturing anthropology},
publisher = {School of American Research Press},
year = {1991},
editor = {Richard Fox},
author = {Abu-Lughod, Lila},
address = {Santa Fe /NM},
abstract = {Im Zusammenhang mit der Debatte um die writing culture fomuliert AL
eine feministische Kritik und zeigt, wie von dort doch Anregungen
für die Reflektion der Schreibweise und Repräsentation gekommen sind.*},
crossref = {Rabinow1986},
keywords = {Frauen; Feminismus; Erzählung als EG; Repräsentation; Roman; Schreibtechnik;
James Clifford; writing culture; Dialog;},
owner = {xko},
systematik1 = {Anth\theor\Ethnographie},
systematik2 = {Anth\theor\Text & Ges},
timestamp = {1995-12-02}
}
I want to replace all semicolons in the keywords field with commas, but only in the keywords field; other fields should be left untouched:
keywords = {Frauen, Feminismus, Erzählung als EG, Repräsentation, Roman, Schreibtechnik, James Clifford, writing culture, Dialog,},
I'm not a programmer. Maybe the following code snippet is a good starting point, and I would really appreciate it if someone could complete it.
outfile = open("literatur_comma.txt", "w")
for line in open("literatur_semicolon.txt", "r"):
    if line # starts with "keywords" replace all semicolon with comma
    outfile.write(line) # write in new file
outfile.close()
Thanks a lot!
EDIT: Thanks for all your answers and code, that's great! I made a mistake in my thinking: if I use my code wrapper (with outfile), it creates a new file with the keywords in it. How can I use the same file and replace the semicolons with commas only in the keywords line?
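Something like this works when the keywords field is on a single line:
with open("literatur_semicolon.txt") as infile, open("literatur_comma.txt", "w") as outfile:
    for line in infile:
        if line.strip().startswith('keywords'):
            line = line.replace(';', ',')  # only the keywords line is changed
        outfile.write(line)  # every line is copied to the new file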
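If the keywords field spans multiple lines in your actual text file, though (as in the example above, where "James Clifford; writing culture; Dialog;}," sits on its own line), this won't do the job: the continuation line does not start with keywords and is left unchanged.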
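To cover a keywords field that continues onto following lines, one option is to remember that you are inside the field until the line containing the closing brace has been processed. This is a minimal sketch, assuming the field starts on a line beginning with keywords and ends at the first line that contains a "}" (nested braces inside the field are not handled); the function name convert_keywords is hypothetical:

```python
def convert_keywords(lines):
    """Yield lines unchanged, except that ';' becomes ',' inside the keywords field.

    Assumption: the field starts on a line beginning with 'keywords' and ends on
    the first line containing '}' (no nested braces inside the field).
    """
    in_keywords = False
    for line in lines:
        if line.lstrip().startswith('keywords'):
            in_keywords = True  # entering the keywords field
        if in_keywords:
            line = line.replace(';', ',')
            if '}' in line:
                in_keywords = False  # the closing brace ends the field
        yield line


# Small in-memory demo; a semicolon outside the keywords field must survive.
sample = [
    "abstract = {Text; with semicolon},\n",
    "keywords = {Frauen; Feminismus; Roman;\n",
    "James Clifford; Dialog;},\n",
    "owner = {xko; other},\n",
]
converted = list(convert_keywords(sample))
print("".join(converted))
```

With real files, pass the open input file instead of the list, e.g. outfile.writelines(convert_keywords(infile)), since a file object also yields its lines one by one.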
with open("literatur_semicolon.txt", "r") as infile, open("literatur_comma.txt", "w") as outfile:
    for line in infile:
        if line.startswith('keywords'):  # starts with "keywords": replace all semicolons with commas
            line = line.replace(';', ',')
        outfile.write(line)  # write every line, changed or not, to the new file
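The EDIT above asks how to change the same file instead of writing a new one. A minimal standard-library sketch: read the whole file into memory first, change only the lines that start with keywords, then write the result back to the same path. The function name fix_keywords_in_place is hypothetical, and the UTF-8 encoding is an assumption (the entries contain umlauts, so pass the file's real encoding if it differs). Like the loop above, it only changes lines that themselves start with keywords.

```python
import tempfile
from pathlib import Path


def fix_keywords_in_place(path, encoding="utf-8"):
    """Replace ';' with ',' on lines starting with 'keywords', overwriting the file.

    The whole file is read into memory before anything is written, so writing
    the result back to the same path is safe.
    """
    p = Path(path)
    lines = p.read_text(encoding=encoding).splitlines(keepends=True)
    fixed = [
        line.replace(";", ",") if line.lstrip().startswith("keywords") else line
        for line in lines
    ]
    p.write_text("".join(fixed), encoding=encoding)


# Demo on a throwaway file; for the real data you would call
# fix_keywords_in_place("literatur_semicolon.txt")
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "literatur.txt"
    demo.write_text("keywords = {Frauen; Roman;},\nabstract = {a; b},\n", encoding="utf-8")
    fix_keywords_in_place(demo)
    result = demo.read_text(encoding="utf-8")

print(result)
```

Because the original file is overwritten, it is worth keeping a backup copy until you have checked the result.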
Using pyparsing
Note: this is one way to do it, but my brain isn't in parsing mode, so this is an idea rather than a proper answer... It certainly needs some work, but it might well be the right direction...
A somewhat messy example of using pyparsing (it could be much nicer, with some @INBOOK and wotsit checking and parsing, but anyway...):
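from pyparsing import *
# Defined but not used yet; a fuller version would anchor the match to this field.
keywords = originalTextFor(Keyword('keywords') + '=')
# Pieces of text (without ';' or '}') separated by ';'. As written, this matches in any field.
values = delimitedList(Regex('[^;}]+'), ';')
# delimitedList drops the ';' delimiters; re-join the pieces with ', '.
values.setParseAction(lambda L: ', '.join(L))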
Where the string text holds your example:
>>> print(values.transformString(text))
@INBOOK{Abu-Lughod1991,
chapter = {Writing against culture},
pages = {137-162},
title = {Recapturing anthropology},
publisher = {School of American Research Press},
year = {1991},
editor = {Richard Fox},
author = {Abu-Lughod, Lila},
address = {Santa Fe /NM},
abstract = {Im Zusammenhang mit der Debatte um die writing culture fomuliert AL
eine feministische Kritik und zeigt, wie von dort doch Anregungen
für die Reflektion der Schreibweise und Repräsentation gekommen sind.*},
crossref = {Rabinow1986},
keywords = {Frauen, Feminismus, Erzählung als EG, Repräsentation, Roman, Schreibtechnik, James Clifford, writing culture, Dialog;},
owner = {xko},
systematik1 = {Anth\theor\Ethnographie},
systematik2 = {Anth\theor\Text & Ges},
timestamp = {1995-12-02}
}
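As the output shows, the pyparsing sketch leaves the final ";" before "}" in place, and as written it could also rewrite semicolons in other fields. As a swapped-in alternative that uses the standard-library re module instead of pyparsing, a regular expression can limit the replacement to the body of the keywords field, including a body that spans several lines. This sketch assumes the value is wrapped in braces (not quotes) and contains no nested braces; commas_in_keywords and KEYWORDS_FIELD are hypothetical names:

```python
import re

# "keywords = {" at the start of a line, then everything up to the first "}".
# [^}]* also matches newlines, so a field spanning several lines is covered.
KEYWORDS_FIELD = re.compile(r"^([ \t]*keywords[ \t]*=[ \t]*\{)([^}]*)(\})", re.MULTILINE)


def commas_in_keywords(text):
    """Replace ';' with ',' inside the keywords field only; other fields are untouched."""
    return KEYWORDS_FIELD.sub(
        lambda m: m.group(1) + m.group(2).replace(";", ",") + m.group(3),
        text,
    )


entry = (
    "abstract = {Text; mit Semikolon},\n"
    "keywords = {Frauen; Feminismus; Schreibtechnik;\n"
    "James Clifford; writing culture; Dialog;},\n"
    "owner = {xko},\n"
)
result = commas_in_keywords(entry)
print(result)
```

Unlike the desired output in the question, this keeps the original line break inside the field; it only swaps the separators. To apply it to the whole file, read the file into one string, call commas_in_keywords on it, and write the result back.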