I've got two lists: one of languages and one of strings. For each string, I'd like to check whether any language appears in it and append the found language to a new list; otherwise, append "english" to that new list instead.
langs = ['afrikaans', 'russian', 'amharic', 'japanese', 'armenian', 'polish', ...]
texts = ['apple', 'orange in polish', 'grape in russian']
The desired output:
['english', 'polish', 'russian']
I first tried these lines, but they return ['polish', 'russian'] instead:
list_of_valid_langs = []
for lang in langs:
    for text in texts:
        if lang in text:
            list_of_valid_langs.append(lang)
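For my second attempt, I added a second condition, but this is not what I need either, because it appends 'english' once for every language/text pair that does not match rather than once per text: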
list_of_valid_langs = []
for lang in langs:
    for text in texts:
        if lang in text:
            list_of_valid_langs.append(lang)
        elif lang not in text:
            list_of_valid_langs.append('english')
I think your mistake is iterating over the languages first and then over the texts. Let's try flipping it around:
list_of_valid_langs = []
for text in texts:
    for lang in langs:
        if lang in text:
            list_of_valid_langs.append(lang)
            break  # lang is found, no need to keep searching
    else:  # if no lang was found, append 'english'
        list_of_valid_langs.append('english')
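The same flipped loop can also be written more compactly with `next()` and a default value. This is only a sketch of an equivalent formulation, not a replacement for the loop above; the function name `detect_languages` and its `default` parameter are made up for illustration, and the sample data is taken from the question (without the elided entries).

```python
def detect_languages(texts, langs, default='english'):
    """Return, for each text, the first language in `langs` found as a substring."""
    return [
        # next() gives the first matching language, or `default`
        # when the generator produces nothing
        next((lang for lang in langs if lang in text), default)
        for text in texts
    ]


langs = ['afrikaans', 'russian', 'amharic', 'japanese', 'armenian', 'polish']
texts = ['apple', 'orange in polish', 'grape in russian']
print(detect_languages(texts, langs))  # ['english', 'polish', 'russian']
```

As with the explicit loop, a text that mentions several languages gets the one that comes first in `langs`.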
After seeing @fsimonjetz's answer, I found a better solution using sets:
# first of all, turn langs into a set
langs = set(langs)
list_of_valid_langs = []
# iterate over the texts
for text in texts:
    # check if one of the words in the text is a language
    for word in text.split():
        if word in langs:
            # if a language is found, append it and break
            list_of_valid_langs.append(word)
            break
    else:
        # if no language is found, append 'english'
        list_of_valid_langs.append('english')
A note about for/else: The code in the for loop runs as usual, but the code in the else block only runs if the for loop exited normally. Another way to look at this is that the else block only runs if the break statement wasn't reached.

If you want, you can replace the for/else with a normal for loop followed by an if block by using a bool variable.
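To make the for/else behaviour concrete, here is a small sketch that wraps the inner loop in a function and calls it in three situations. The helper name `first_language` is hypothetical and only used for this demonstration.

```python
def first_language(text, langs):
    """Return the first language found in `text`, or 'english' if none is."""
    for lang in langs:
        if lang in text:
            result = lang
            break  # leaving through break skips the else block
    else:
        # reached only when the loop ran to the end without a break,
        # which includes the case where `langs` is empty
        result = 'english'
    return result


langs = ['russian', 'polish']
print(first_language('grape in russian', langs))  # russian
print(first_language('apple', langs))             # english
print(first_language('grape in russian', []))     # english
```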
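I think Roy Cohen's answer is the perfect solution to your problem, but I'd like to suggest an alternative using set intersection. It should be more efficient, since set lookups avoid scanning the whole language list for every text. Note that it matches whole words rather than substrings: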
languages = {'afrikaans', 'russian', 'amharic', 'japanese', 'armenian', 'polish'}
texts = ['apple', 'orange in polish', 'grape in russian']
list_of_valid_langs = []
for t in texts:
    # this will return the set of the language(s) occurring in the
    # string if there are any, otherwise it returns {'english'}
    lang = set(t.split()).intersection(languages) or {'english'}
    # pop the element from the set and append to the list
    list_of_valid_langs.append(lang.pop())
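One caveat: if a text mentions more than one language, `set.pop()` returns an arbitrary one of them. If the choice needs to be predictable, a variant that scans the words in reading order while still using set lookups might look like the sketch below. The helper name `first_language_word` and the third sample text are assumptions added for illustration.

```python
languages = {'afrikaans', 'russian', 'amharic', 'japanese', 'armenian', 'polish'}


def first_language_word(text, languages, default='english'):
    """Return the first whitespace-separated word of `text` that is a language."""
    # membership tests against a set stay cheap, and walking the words
    # in order makes the result deterministic when several languages occur
    return next((word for word in text.split() if word in languages), default)


texts = ['apple', 'orange in polish', 'from russian to polish']
print([first_language_word(t, languages) for t in texts])
# ['english', 'polish', 'russian']
```

Like the intersection approach, this compares whole words, so a text such as 'polished apple' falls back to 'english'.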
This should work:
list_of_valid_langs = []
for text in texts:
    lang_found = False
    for lang in langs:
        if lang in text:
            list_of_valid_langs.append(lang)
            lang_found = True
            break  # stop at the first match so each text adds one entry
    if not lang_found:
        list_of_valid_langs.append('english')