
Get the text before " in Python when not every line contains it

I have entries in a txt file with the following structure:

Some sentence.
Some other "other" sentence.
Some other smth "other" sentence.

In the original:

Камиш-Бурунський залізорудний комбінат
Відкрите акціонерне товариство "Кар'єр мармуровий"
Закрите акціонерне товариство "Кар'єр мармуровий"

I want to extract everything before the first " on each line and write it to another file. The result should be:

Some other
Some other smth
Відкрите акціонерне товариство
Закрите акціонерне товариство

I have done this:

import codecs

f=codecs.open('organization.txt','r+','utf-8')
text=f.read()
words_sp=text.split()
for line in text:
    before_keyword, after_keyword = line.split(u'"',1)
    before_word=before_keyword.split()[0]
    encoded=before_word.encode('cp1251')
    print encoded

But it doesn't work because some lines in the file don't contain ". How can I improve my code to make it work?

There are two problems. First, iterating over text directly yields one character at a time; use the splitlines() method to break the string into lines. Second, the following line raises a ValueError whenever split returns a single item, which happens for every line that has no ":

before_keyword, after_keyword = line.split(u'"',1)
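Both problems can be reproduced on a short string. This is only a sketch: the sample text is made up for illustration, and it is written for Python 3, whereas the question uses Python 2.

```python
text = u'Some sentence.\nSome other "other" sentence.'

# Iterating over a string yields single characters, not lines.
first_items = list(text)[:4]

# splitlines() yields whole lines instead.
lines = text.splitlines()

# The first line has no quote, so split() returns a one-item list
# and unpacking it into two names raises ValueError.
try:
    before_keyword, after_keyword = lines[0].split(u'"', 1)
    error = None
except ValueError as exc:
    error = exc

print(first_items, lines, error)
```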

The following works for me:

for line in text.splitlines():
    if u'"' in line:
        before_keyword, after_keyword = line.split(u'"',1)
        ... etc. ...
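For reference, here is one way the whole task might look end to end. It is a sketch, not the only approach: the names extract_prefixes and write_prefixes are hypothetical, it assumes Python 3 (the question uses Python 2), and it uses str.partition instead of the split-plus-check shown above. It also keeps the whole text before the quote, as in the desired output, rather than only the first word as before_keyword.split()[0] would.

```python
import io


def extract_prefixes(text):
    """Return the text before the first double quote on each line that has one."""
    prefixes = []
    for line in text.splitlines():
        # partition() never raises; sep is empty when the line has no quote.
        before, sep, _after = line.partition(u'"')
        if sep:
            prefixes.append(before.strip())
    return prefixes


def write_prefixes(src_path, dst_path):
    """Read src_path as UTF-8 and write one extracted prefix per line to dst_path."""
    with io.open(src_path, 'r', encoding='utf-8') as src:
        prefixes = extract_prefixes(src.read())
    with io.open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(u'\n'.join(prefixes) + u'\n')


# Sample input taken from the question; lines without a quote are skipped.
sample = (u'Some sentence.\n'
          u'Some other "other" sentence.\n'
          u'Some other smth "other" sentence.')
print(extract_prefixes(sample))
```

To process the file from the question, something like write_prefixes('organization.txt', 'prefixes.txt') should work, where the output filename is an assumption.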
