I have the following problem I am trying to solve. I have a list of ISO 639 language codes that I retrieved using langdetect with the following code:
from langdetect import detect

def try_detect(cell):
    try:
        detected_lang = detect(cell)
    except Exception:
        # detection failed (e.g. empty or non-text input)
        detected_lang = None
    return detected_lang
Spotify['language'] = Spotify['artists'].apply(try_detect)
Spotify['language'] = Spotify['language'].str.upper()
Spotify['language'].unique()
which returned
array(['DE', 'PL', 'ES', 'EN', 'NL', 'TR', 'FR', 'IT', 'SK', 'RO', 'SW',
'FI', 'AF', 'EL', 'ID', 'LT', 'CA', 'TL', 'PT', 'HR', 'RU', 'NO',
'DA', 'SL', 'CY', 'SQ', 'KO', 'SO', 'CS', 'ET', 'ZH-CN', 'SV',
'HU', 'LV', 'VI', 'JA', None, 'AR', 'TH', 'BG'], dtype=object)
Although that would be sufficient, I'd love to have the full language name in another column, but I can't seem to get this right. I know that
pycountry.languages.get(alpha_2='FR').name
returns French. I then tried:
Languages = Spotify['language'].unique()
LANG = []
for lang in Languages:
    Lang = pycountry.languages.get(alpha_2=lang).name
    LANG.append(Lang)
but I keep getting the error:
AttributeError: 'NoneType' object has no attribute 'name'
I'm at a loss here. Any help to put me on the right track would be greatly appreciated.
I have managed to answer this question by noticing that not all unique values of Spotify['language'].unique() were actually ISO 639 language codes. I thus replaced
Languages = Spotify['language'].unique()
LANG = []
for lang in Languages:
    Lang = pycountry.languages.get(alpha_2=lang).name
    LANG.append(Lang)
by
LANG = []
for lang in Languages:
    try:
        Lang = pycountry.languages.get(alpha_2=lang).name
    except (AttributeError, LookupError):
        # no match for this value (e.g. None or 'ZH-CN')
        Lang = None
    LANG.append(Lang)
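A slightly stricter variant of the loop above, as a minimal sketch: it checks that a value is a string before looking it up, so missing detections never reach the lookup at all. The helper name `safe_language_name` and its `lookup` parameter are made up for illustration, and the small `SAMPLE` dict only stands in for pycountry so that the snippet runs on its own.

```python
from typing import Callable, Optional

# Stand-in for pycountry, so the sketch runs without third-party packages.
SAMPLE = {"DE": "German", "FR": "French", "PL": "Polish"}


def safe_language_name(code, lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the language name for `code`, or None if it cannot be resolved."""
    if not isinstance(code, str):  # covers None and NaN from failed detections
        return None
    return lookup(code)


codes = ["DE", "FR", None, "ZH-CN"]
names = [safe_language_name(c, SAMPLE.get) for c in codes]
print(names)  # ['German', 'French', None, None]
```

With pycountry, the `lookup` argument could be something like `lambda c: getattr(pycountry.languages.get(alpha_2=c), 'name', None)`, assuming `get` returns `None` for unknown codes, as the error message in the question suggests.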
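An alternative was offered by @cs95 (thanks a lot) in the comment above: drop the missing values before looping, with
Languages = Spotify['language'].dropna().unique()
This removes the None entry, but a code that pycountry does not recognize, such as ZH-CN, still needs the try/except. The try/except loop returns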
['German',
'Polish',
'Spanish',
'English',
'Dutch',
'Turkish',
'French',
'Italian',
'Slovak',
'Romanian',
'Swahili (macrolanguage)',
'Finnish',
'Afrikaans',
'Modern Greek (1453-)',
'Indonesian',
'Lithuanian',
'Catalan',
'Tagalog',
'Portuguese',
'Croatian',
'Russian',
'Norwegian',
'Danish',
'Slovenian',
'Welsh',
'Albanian',
'Korean',
'Somali',
'Czech',
'Estonian',
None,
'Swedish',
'Hungarian',
'Latvian',
'Vietnamese',
'Japanese',
None,
'Arabic',
'Thai',
'Bulgarian']
Note that ZH-CN is not found. This has to be done manually:
import numpy as np
import pandas as pd

d = {'Language': Languages, 'Language_name': LANG}
LANGUAGE_NAMES = pd.DataFrame(d)
LANGUAGE_NAMES['Language_name'] = np.where(LANGUAGE_NAMES['Language'] == 'ZH-CN', 'Chinese', LANGUAGE_NAMES['Language_name'])
The table below shows the frame before the replacement; afterwards, row 30 (ZH-CN) reads Chinese:
    Language  Language_name
0   DE        German
1   PL        Polish
2   ES        Spanish
3   EN        English
4   NL        Dutch
5   TR        Turkish
6   FR        French
7   IT        Italian
8   SK        Slovak
9   RO        Romanian
10  SW        Swahili (macrolanguage)
11  FI        Finnish
12  AF        Afrikaans
13  EL        Modern Greek (1453-)
14  ID        Indonesian
15  LT        Lithuanian
16  CA        Catalan
17  TL        Tagalog
18  PT        Portuguese
19  HR        Croatian
20  RU        Russian
21  NO        Norwegian
22  DA        Danish
23  SL        Slovenian
24  CY        Welsh
25  SQ        Albanian
26  KO        Korean
27  SO        Somali
28  CS        Czech
29  ET        Estonian
30  ZH-CN     None
31  SV        Swedish
32  HU        Hungarian
33  LV        Latvian
34  VI        Vietnamese
35  JA        Japanese
36  None      None
37  AR        Arabic
38  TH        Thai
39  BG        Bulgarian
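To get from this lookup table to a name for every row of the original data, one option is to turn it into a plain dict with the manual fixes merged in. This is only a sketch: `LOOKED_UP`, `OVERRIDES`, and `build_name_map` are hypothetical names, and the few sample entries stand in for the full results listed above.

```python
# Hypothetical stand-in for the code -> name pairs obtained above.
LOOKED_UP = {"DE": "German", "PL": "Polish", "ZH-CN": None, None: None}

# Manual overrides for values that are not plain ISO 639-1 codes.
OVERRIDES = {"ZH-CN": "Chinese"}


def build_name_map(looked_up, overrides):
    """Merge looked-up names with manual overrides; overrides win."""
    # Drop the missing-code entry, it has no name to map to.
    merged = {code: name for code, name in looked_up.items() if code is not None}
    merged.update(overrides)
    return merged


code_to_name = build_name_map(LOOKED_UP, OVERRIDES)
print(code_to_name)  # {'DE': 'German', 'PL': 'Polish', 'ZH-CN': 'Chinese'}
```

Since pandas `Series.map` accepts a dict, something like `Spotify['language_name'] = Spotify['language'].map(code_to_name)` should then fill the new column, with codes missing from the dict ending up as NaN. The column name `language_name` is just an example.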