Pandas: New column value includes entire row

I have the following code that goes row by row through my dataframe and translates a specific column into English, but when I run it, each value in the resulting new column 'translatedv4' contains the entire row rather than just the translation. I am new to looping through entire dataframes rather than lists, so that may be the issue.

Example of a single value (I just want the column to show "I'm thinking this...")

Comments            Ich glaube das...
Translations                                                       DE  
Race / Ethnicity                                                White
Count2                                                             91
translated          I'm thinking this because I'm nearing retireme...

Current code:

from googletrans import Translator
import pandas as pd
import xlsxwriter
import xlrd
import copy

##################TRANSLATION

translator = Translator()
file = r"xxxx"
#dt2 = translator.detect(text2)

df = pd.read_excel(file, sheet_name = 'Sheet1', converters={'Comments':str}).fillna(0)

df = df[df['Comments'] != 0]


translatedList = []
for index, row in df.iterrows():
    # REINITIALIZE THE API
    translator = Translator()
    newrow = copy.deepcopy(row)
    try:
        # translate the 'text' column
        translated = translator.translate(row['Comments'], dest='en')
        newrow['translated'] = translated.text
    except Exception as e:
        print(str(e))
        continue
    translatedList.append(newrow)
df = df.assign(translatedv4 = translatedList) 

I'm not quite sure about your problem, so I hope this is what you're looking for. I do think you're not approaching it in the best way, however. Generally with pandas, you'll want to either vectorize your solution or write a function that you pass to df.apply. Here are three solutions of increasing complexity. The first uses a lambda function, which works but doesn't handle exceptions. The second defines a normal function, which makes exception handling easy. The last adds ratelimit and tqdm, which are nice when working with APIs and dataframes.

Solution 1, without exception handler

from googletrans import Translator
import pandas as pd

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee']
})

translator = Translator()

df['English'] = df['German'].apply(
    lambda sent: translator.translate(sent, dest='en', src='de').text
)

print(df)

           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea

Solution 2, with exception handler

from googletrans import Translator
import numpy as np  # needed for np.nan below
import pandas as pd

def get_trans(sent):
    try:
        return translator.translate(sent, dest='en', src='de').text
    except Exception as e:
        print(e)
        return np.nan

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee', np.nan]
})

translator = Translator()

df['English'] = df['German'].apply(get_trans)

print(df)

'float' object is not iterable
           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea
3             NaN             NaN
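The "'float' object is not iterable" line in the output above is the exception printed when the NaN row is passed to the translator. As a possible alternative, and only a sketch, missing values can be skipped before the function is ever called by using Series.map with na_action='ignore'. The fake_translate function and the calls list below are hypothetical stand-ins for translator.translate(...).text so that the example runs offline.

```python
import numpy as np
import pandas as pd

calls = []  # records every input that actually reaches the translator stand-in


def fake_translate(sent):
    # Hypothetical stand-in for translator.translate(sent, dest='en').text
    calls.append(sent)
    return sent.upper()


df = pd.DataFrame({"German": ["ich glaube das", np.nan]})

# na_action="ignore" leaves missing values as NaN without calling the function
df["English"] = df["German"].map(fake_translate, na_action="ignore")

print(df)
print(calls)
```

With a real API this saves one request per missing row; the try/except in get_trans is still worth keeping for network errors.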

Solution 3, with ratelimit and tqdm

When working with APIs, I can really recommend the fantastic ratelimit library. It helps you avoid sending too many requests: once the limit is reached, it waits and retries instead of failing. I also added tqdm for a progress bar, which is nice if you have a lot of data.

from googletrans import Translator
import numpy as np  # needed for np.nan below
import pandas as pd
from ratelimit import limits, sleep_and_retry
from tqdm.autonotebook import tqdm
# from tqdm import tqdm  <- use this instead if you're not using jupyter

FIFTEEN_MINUTES = 900

tqdm.pandas()

@sleep_and_retry
@limits(calls=15, period=FIFTEEN_MINUTES)
def get_trans(sent):
    try:
        return translator.translate(sent, dest='en', src='de').text
    except Exception as e:
        print(e)
        return np.nan

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee', np.nan]
})

translator = Translator()

df['English'] = df['German'].progress_apply(get_trans)

print(df)

           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea
3             NaN             NaN
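To make the calls=15, period=FIFTEEN_MINUTES arguments in Solution 3 more concrete, here is a rough standard-library sketch of the idea behind combining a call limit with sleep-and-retry. It is not the ratelimit library's implementation; rate_limited, its clock and sleep parameters, and the upper-casing stand-in for the translator are all assumptions made for illustration.

```python
import time
from functools import wraps


def rate_limited(calls, period, clock=time.monotonic, sleep=time.sleep):
    """Allow at most `calls` invocations per `period` seconds, sleeping when the budget is spent."""
    def decorator(func):
        window_start = clock()
        used = 0

        @wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal window_start, used
            now = clock()
            if now - window_start >= period:
                # The window has elapsed, so start a fresh one
                window_start, used = now, 0
            if used >= calls:
                # Budget spent: wait out the rest of the window, then reset
                sleep(period - (now - window_start))
                window_start, used = clock(), 0
            used += 1
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rate_limited(calls=15, period=900)
def get_trans(sent):
    # Hypothetical stand-in for the real translator call
    return sent.upper()


print([get_trans(s) for s in ["ich glaube das", "schadenfreude"]])
```

Note what these numbers imply: with 15 calls per 15 minutes, a dataframe with a few hundred rows takes hours, so the limit should be tuned to what the API actually tolerates.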

I think you have a small mistake in your code, here:

translatedList.append(newrow)

Here you append the full row to your list, when you actually want to append only the new value, i.e.

translatedList.append(translated.text)

But be careful: if any exception occurs, translatedList will be shorter than your DataFrame's index, and the assignment will fail or misalign. You should probably append a placeholder in the except branch, something like this:

try:
    # translate the 'text' column
    translated = translator.translate(row['Comments'], dest='en')
    translatedList.append(translated.text)
except Exception as e:
    print(str(e))
    translatedList.append('ERROR')  # placeholder keeps the list aligned with the rows
    continue
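Put together, the corrected loop could look roughly like the sketch below. So that it runs without network access, FAKE_DICTIONARY and fake_translate are hypothetical stand-ins for translator.translate(...).text; an unknown input raises an exception to mimic a failed API call.

```python
import pandas as pd

# Hypothetical offline stand-in for the translation API
FAKE_DICTIONARY = {
    "ich glaube das": "I believe that",
    "guten morgen": "good morning",
}


def fake_translate(text):
    # Raises KeyError for unknown input, mimicking a failed request
    return FAKE_DICTIONARY[text]


df = pd.DataFrame({"Comments": ["ich glaube das", "???", "guten morgen"]})

translated_list = []
for index, row in df.iterrows():
    try:
        # Append only the translated text, not the whole row
        translated_list.append(fake_translate(row["Comments"]))
    except Exception as e:
        print(f"row {index}: {e!r}")
        # Placeholder keeps one entry per row
        translated_list.append("ERROR")

# The list has exactly len(df) entries, so it lines up with the rows
df = df.assign(translatedv4=translated_list)
print(df)
```

Because every iteration appends exactly one value, the list length always matches the number of rows, which is what df.assign needs.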
