I have entries in a .txt file with the following structure:
Some sentence.
Some other "other" sentence.
Some other smth "other" sentence.
In the original file:
Камиш-Бурунський залізорудний комбінат
Відкрите акціонерне товариство "Кар'єр мармуровий"
Закрите акціонерне товариство "Кар'єр мармуровий"
I want to extract everything before the first double quote (") on each line and write it to another file. The result should be:
Some other
Some other smth
Відкрите акціонерне товариство
Закрите акціонерне товариство
This is what I have tried:
import codecs

f = codecs.open('organization.txt', 'r+', 'utf-8')
text = f.read()
words_sp = text.split()
for line in text:
    before_keyword, after_keyword = line.split(u'"', 1)
    before_word = before_keyword.split()[0]
    encoded = before_word.encode('cp1251')
    print encoded
But it doesn't work, because some lines in the file don't contain a double quote ("). How can I fix my code so that it works?
There are two problems. First, you must use the splitlines() method to break the string into lines; iterating over the string directly, as your code does, yields one character at a time. Second, the following line raises a ValueError whenever split returns a single item, i.e. when the line contains no quote:
before_keyword, after_keyword = line.split(u'"',1)
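To see both problems concretely, here is a small sketch (Python 3, where str is already Unicode; the sample string is made up from the question's examples). It also shows str.partition, a swapped-in alternative to the split-and-unpack line rather than what the answer uses: partition always returns three parts, and the middle part is an empty string when the line has no quote, so the unpacking can never fail.

```python
# Sample text built from the question's examples (illustrative data).
text = u'Some sentence.\nSome other "other" sentence.'

# Problem 1: iterating over a string yields characters, not lines.
chars = [item for item in text][:4]

# splitlines() breaks the string into whole lines instead.
lines = text.splitlines()

# Problem 2: a line without '"' makes split() return one item,
# so unpacking it into two names raises ValueError.
raised = False
try:
    before_keyword, after_keyword = lines[0].split(u'"', 1)
except ValueError:
    raised = True

# Alternative: str.partition always returns a 3-tuple;
# sep is '' when the line contains no '"'.
extracted = []
for line in lines:
    before, sep, after = line.partition(u'"')
    if sep:  # keep only lines that actually contain a quote
        extracted.append(before.strip())

print(chars)      # ['S', 'o', 'm', 'e']
print(raised)     # True
print(extracted)  # ['Some other']
```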
The following works for me:
for line in text.splitlines():
    if u'"' in line:
        before_keyword, after_keyword = line.split(u'"', 1)
        ... etc. ...
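Putting it together, a minimal sketch of the whole task (read organization.txt, keep the text before the first quote, write it to another file) might look like this. It assumes UTF-8 input as in the question; the helper names, the out_encoding parameter and the output name result.txt are hypothetical. It uses io.open so decoding and encoding happen at the file boundary instead of calling .encode('cp1251') by hand. Note also that before_keyword.split()[0] in the question keeps only the first word ("Some" rather than "Some other"), so the sketch uses strip() instead.

```python
import io


def extract_prefix(line):
    """Return the text before the first '"', stripped, or None if there is none."""
    # Same guard as the answer: only split lines that contain a quote.
    if u'"' not in line:
        return None
    before_keyword, _after_keyword = line.split(u'"', 1)
    # strip() keeps every word before the quote; split()[0] would keep only one.
    return before_keyword.strip()


def copy_prefixes(src, dst):
    """Read lines from file-like src and write one prefix per quoted line to dst."""
    for line in src:
        prefix = extract_prefix(line)
        if prefix is not None:
            dst.write(prefix + u'\n')


def extract_file(in_path, out_path, out_encoding='utf-8'):
    """Hypothetical wrapper: UTF-8 input file in, out_encoding output file out."""
    # io.open decodes on read and encodes on write (works on Python 2 and 3).
    with io.open(in_path, encoding='utf-8') as src, \
            io.open(out_path, 'w', encoding=out_encoding) as dst:
        copy_prefixes(src, dst)
```

Calling extract_file('organization.txt', 'result.txt') would write one line per quoted entry; pass out_encoding='cp1251' if whatever reads the result expects that code page. Keeping copy_prefixes separate from the file handling means it can be checked with io.StringIO without touching the disk.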