
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) error occurs sometimes using googletrans and Pandas in Python

I have the following code that sometimes works and sometimes doesn't. I haven't found any pattern in when it fails, but I have confirmed the following:

  • The background data has ~1000 rows
  • All values in df['Comments'] are NOT null

I believe it has something to do with the calls to the Google API, but I'm not sure. Does anyone know what the issue is here? If not, are there alternatives for language translation in Python?

Error:

    raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Code:

from googletrans import Translator
import pandas as pd
import xlsxwriter
import xlrd
import copy

##################TRANSLATION

translator = Translator()
file = r"xxxx"
#dt2 = translator.detect(text2)

df = pd.read_excel(file, sheet_name = 'Sheet1', converters={'Comments':str}).fillna(0)

df = df[df['Comments'] != 0]


translatedList = []
for index, row in df.iterrows():
    # REINITIALIZE THE API
    translator = Translator()
    try:
        # translate the 'Comments' column
        translated = translator.translate(row['Comments'], dest='en')
        translatedList.append(translated.text)
    except Exception as e:
        # keep a placeholder so the list stays the same length as df;
        # skipping the row would make df.assign() fail with a length mismatch
        print(str(e))
        translatedList.append(None)
df = df.assign(translatedv4 = translatedList)

#translator = Translator()

#df['English'] = df['Comments'].apply(translator.translate,dest='en').apply(getattr, args=('text',)) #MAY RUN INTO LIMITS


#df['Translated_Python'] = df['Comments'].map(lambda x: translator.translate(x, src="de", dest="en").text)
#print(df['Translated_Python'])

#s = fuzz.ratio("Wow year what a ","This is a test")
#print(s)

end = r"xxxx"

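writer = pd.ExcelWriter(end, engine='xlsxwriter')
df.to_excel(writer, sheet_name='Data')
# a plain list has no to_excel(); wrap it in a DataFrame first
pd.DataFrame({'translated': translatedList}).to_excel(writer, sheet_name='List')
writer.close()  # close() writes the file; save() was removed in pandas 2.0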


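The issue here is that this library sends a request to the Google Translate service for every translate call. Whether you wrap the translate call in an apply or call it inside a loop over the rows, you're slamming the Translate API with one request per row, about 1000 in quick succession here. The library expects to get back JSON, but when it gets back an error from Google, like 429 Too Many Requests, the response body is not JSON and can't be parsed, which raises the JSONDecodeError. That would also explain why the failure only shows up some of the time.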
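The error message itself can be reproduced without googletrans. Below is a minimal sketch, assuming the service answered with an empty body or an HTML error page. The question does not show the actual response body, so the strings here are made up for illustration, and `try_parse` is a hypothetical helper:

```python
import json


def try_parse(body):
    """Return the parsed JSON, or the decoder's error message if parsing fails."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)


# Hypothetical response bodies: stand-ins for what a rate-limited
# request might return instead of the expected JSON payload.
empty_body = ""
html_error_page = "<html><body>Too Many Requests</body></html>"
valid_json = '[["Hallo", "Hello"]]'

print(try_parse(empty_body))       # decoder error message
print(try_parse(html_error_page))  # decoder error message
print(try_parse(valid_json))       # parsed list
```

Both non-JSON bodies fail at the very first character, which is why the message points at line 1, column 1.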

If you want to do bulk translation, you need to use the library's recommended approach:

https://github.com/ssut/py-googletrans#advanced-usage-bulk
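
One way to combine the bulk call with some protection against rate limiting is to translate in batches and retry a failed batch after a pause. This is only a sketch, not the library's own recipe: `chunked`, `translate_in_batches` and `googletrans_batch` are hypothetical helper names, the batch size and delays are arbitrary, and `googletrans_batch` assumes the synchronous googletrans 3.x API, where `translate()` accepts a list of strings as described in the linked README section.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_batches(texts, translate_batch, batch_size=50,
                         max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Translate `texts` batch by batch, retrying a failed batch with backoff.

    `translate_batch` is any callable that maps a list of strings to a list
    of translated strings. A batch that still fails after `max_retries`
    attempts yields None placeholders, so the result stays aligned with
    the input.
    """
    results = []
    for batch in chunked(list(texts), batch_size):
        for attempt in range(max_retries):
            try:
                results.extend(translate_batch(batch))
                break
            except Exception as exc:
                print(f"batch failed (attempt {attempt + 1}): {exc}")
                if attempt < max_retries - 1:
                    # wait longer after each failure: 1s, 2s, 4s, ...
                    sleep(base_delay * 2 ** attempt)
        else:
            # every attempt failed: keep the row positions with placeholders
            results.extend([None] * len(batch))
    return results


def googletrans_batch(texts):
    """Assumed googletrans 3.x usage: translate() accepts a list of strings."""
    from googletrans import Translator  # third-party; imported only when called
    translator = Translator()
    return [t.text for t in translator.translate(texts, dest='en')]


# Intended use with the DataFrame from the question (not run here):
# df['translated'] = translate_in_batches(df['Comments'].tolist(), googletrans_batch)
```

Passing the translation function in as a parameter keeps the batching and retry logic independent of googletrans, so it can be tried out with a stub before sending real requests.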
