Passing a function over a list of columns with numpy.vectorize or DataFrame.apply?

Question

I've got the following data frame:

df = pd.DataFrame(data= {'Product_JP': ['ﾄﾏﾄｺ- ｻﾙｻ C225G','ﾏﾄｹﾁﾔﾂﾌﾟ','ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-','ｹﾁﾔﾂﾌﾟﾊ-ﾌ','ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ'],
                  'Value1': [1,12313,1.123,0.112,0],
                  'Metric1_JP': ['ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))','加重販売率(販売金額)','ｱｲﾃﾑ販売店当り(販売個数)','加重販売率(販売金額)','加重販売率(販売金額)'],
                  'Type_JP': ['サルサソ−ス','ケチャップ','ケチャップ','ケチャップ','ケチャップ'],
                  'SKU': [4582152498325,4582112498325,4500152498325,4582112398325,4582152483125]},
                 )


        Product_JP     Value1              Metric1_JP Type_JP            SKU
0  ﾄﾏﾄｺ- ｻﾙｻ C225G      1.000  ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))  サルサソ−ス  4582152498325
1         ﾏﾄｹﾁﾔﾂﾌﾟ  12313.000             加重販売率(販売金額)   ケチャップ  4582112498325
2   ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-      1.123         ｱｲﾃﾑ販売店当り(販売個数)   ケチャップ  4500152498325
3        ｹﾁﾔﾂﾌﾟﾊ-ﾌ      0.112             加重販売率(販売金額)   ケチャップ  4582112398325
4  ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ      0.000             加重販売率(販売金額)   ケチャップ  4582152483125

And I can apply the following function to a single column using .apply():

from deep_translator import (GoogleTranslator)
df['Product_EN'] = df['Product_JP'].apply(lambda row:GoogleTranslator(source='ja', target='en').translate(row))

        Product_JP     Value1              Metric1_JP Type_JP            SKU  \
0  ﾄﾏﾄｺ- ｻﾙｻ C225G      1.000  ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))  サルサソ−ス  4582152498325   
1         ﾏﾄｹﾁﾔﾂﾌﾟ  12313.000             加重販売率(販売金額)   ケチャップ  4582112498325   
2   ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-      1.123         ｱｲﾃﾑ販売店当り(販売個数)   ケチャップ  4500152498325   
3        ｹﾁﾔﾂﾌﾟﾊ-ﾌ      0.112             加重販売率(販売金額)   ケチャップ  4582112398325   
4  ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ      0.000             加重販売率(販売金額)   ケチャップ  4582152483125   

             Product_EN  
0  Tomatoco-Salsa C225G  
1               Matthew  
2          Tomato miser  
3                 Catch  
4  Tomato miser premium

But what I want to do is pass a list of columns and apply the function to all of them in one go, like so:

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

df[EN_columns] = df[JP_columns].apply(lambda row:GoogleTranslator(source='ja', target='en').translate(row))

This raises a ValueError: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

What am I doing wrong with df.apply()?
Would this be better done using np.vectorize?

For example, this also raises a ValueError ("The truth value of a DataFrame is ambiguous"):

df[EN_columns] = np.vectorize(GoogleTranslator(source='ja', target='en').translate(df[JP_columns]))

Thanks

Answer 1

Series.apply applies the function to each cell (row) in the Series, since a Series has only a single dimension. DataFrame.apply, however, passes each entire column (as a Series) to the function by default. translate expects a single piece of text, not a collection, so it fails when it receives a whole column.
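To see the difference without any network access, the sketch below records what DataFrame.apply actually passes to the function and shows that evaluating a Series in a boolean context raises the same "truth value ... is ambiguous" ValueError. It assumes (not verified here against the deep_translator source) that translate does a truthiness check such as `if not text` on its input, which is harmless for a str but fails for a Series.

```python
import pandas as pd

df = pd.DataFrame({'Product_JP': ['ﾏﾄｹﾁﾔﾂﾌﾟ', 'ｹﾁﾔﾂﾌﾟﾊ-ﾌ'],
                   'Type_JP': ['ケチャップ', 'ケチャップ']})

# Record the type of whatever DataFrame.apply hands to the function
seen_types = []
df.apply(lambda column: seen_types.append(type(column)))
print(seen_types)  # each entry is pandas.Series: a whole column, not a cell

# A check such as `if not text:` works on a str but not on a Series,
# because pandas refuses to guess the truth value of many elements.
error_message = None
try:
    bool(df['Product_JP'])
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```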

The method that applies a function to each cell of a DataFrame is applymap (in pandas 2.1 and later it is deprecated in favour of the equivalent DataFrame.map), and it can be used like this:

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# apply to all cells in the DataFrame
df[EN_columns] = df[JP_columns].applymap(
    GoogleTranslator(source='ja', target='en').translate
)
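If you would rather keep DataFrame.apply, remember that each column arrives as a Series, so mapping the scalar function over that Series inside the lambda also works and behaves the same on old and new pandas versions. In this sketch `fake_translate` is a hypothetical offline stand-in for `GoogleTranslator(source='ja', target='en').translate`; it just maps one string to one string.

```python
import pandas as pd


def fake_translate(text):
    # Hypothetical stand-in for the real translator: one str in, one str out
    return text.upper()


df = pd.DataFrame({'Product_JP': ['abc', 'de'], 'Type_JP': ['x', 'y']})
JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# DataFrame.apply passes each column as a Series;
# Series.map then calls fake_translate once per cell.
df[EN_columns] = df[JP_columns].apply(lambda col: col.map(fake_translate))
print(df)
```

The new columns are assigned by position, so EN_columns must be in the same order as JP_columns, which the list comprehensions above guarantee.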

np.vectorize also works. Note that it takes a pyfunc as input (in this case translate) and returns a callable, which is then called on the DataFrame:

JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# vectorize function then call function on DataFrame
df[EN_columns] = np.vectorize(
    GoogleTranslator(source='ja', target='en').translate
)(df[JP_columns])
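On whether np.vectorize is "better": the NumPy documentation describes it as a convenience wrapper whose implementation is essentially a for loop, so it should not be expected to be faster than applymap here; the network calls dominate either way. The sketch below runs the same pattern offline with a hypothetical `fake_translate` stand-in. Passing `otypes=[object]` is an optional choice that keeps the results as Python strings instead of letting NumPy infer the output type from the first call.

```python
import numpy as np
import pandas as pd


def fake_translate(text):
    # Hypothetical offline stand-in for the translator's translate method
    return text[::-1]


df = pd.DataFrame({'Product_JP': ['abc', 'de'], 'Type_JP': ['xy', 'z']})
JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# np.vectorize wraps the scalar function; calling the wrapper converts
# df[JP_columns] to a 2-D array and applies fake_translate to every element.
vectorized = np.vectorize(fake_translate, otypes=[object])
result = vectorized(df[JP_columns])

# result is a 2-D ndarray with one column per JP column, in the same order
df[EN_columns] = result
print(df)
```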

Either approach results in the following df:

Product_JP	Value1	Metric1_JP	Type_JP	SKU	Product_EN	Metric1_EN	Type_EN
ﾄﾏﾄｺ- ｻﾙｻ C225G	1	ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))	サルサソ−ス	4582152498325	Tomatoco-Salsa C225G	Market size (sales amount (x1000))	Salsa source
ﾏﾄｹﾁﾔﾂﾌﾟ	12313	加重販売率(販売金額)	ケチャップ	4582112498325	Matthew	Weighted sales rate (sales amount)	ketchup
ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-	1.123	ｱｲﾃﾑ販売店当り(販売個数)	ケチャップ	4500152498325	Tomato miser	Per item store (number of units sold)	ketchup
ｹﾁﾔﾂﾌﾟﾊ-ﾌ	0.112	加重販売率(販売金額)	ケチャップ	4582112398325	Catch	Weighted sales rate (sales amount)	ketchup
ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ	0	加重販売率(販売金額)	ケチャップ	4582152483125	Tomato miser premium	Weighted sales rate (sales amount)	ketchup

Setup and imports:

import numpy as np  # only for np.vectorize
import pandas as pd
from deep_translator import GoogleTranslator

df = pd.DataFrame({
    'Product_JP': ['ﾄﾏﾄｺ- ｻﾙｻ C225G', 'ﾏﾄｹﾁﾔﾂﾌﾟ', 'ﾄﾏﾄｹﾁﾔﾂﾌﾟﾊﾞﾘﾕ-', 'ｹﾁﾔﾂﾌﾟﾊ-ﾌ',
                   'ﾄﾏﾄｹﾁﾔﾂﾌﾟﾌﾟﾚﾐｱﾑ'],
    'Value1': [1, 12313, 1.123, 0.112, 0],
    'Metric1_JP': ['ﾏ-ｹｯﾄｻｲｽﾞ(販売金額(x1000))', '加重販売率(販売金額)',
                   'ｱｲﾃﾑ販売店当り(販売個数)', '加重販売率(販売金額)',
                   '加重販売率(販売金額)'],
    'Type_JP': ['サルサソ−ス', 'ケチャップ', 'ケチャップ', 'ケチャップ', 'ケチャップ'],
    'SKU': [4582152498325, 4582112498325, 4500152498325, 4582112398325,
            4582152483125]
})
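Several cells repeat (for example 加重販売率(販売金額) and ケチャップ), and each translate call is presumably a separate request to the online service (an assumption about GoogleTranslator, not something shown above). A minimal sketch of translating each distinct string only once and reusing the result, again with a hypothetical `fake_translate` stand-in that counts its calls:

```python
import pandas as pd

calls = []


def fake_translate(text):
    # Hypothetical stand-in; records every call so the saving is visible
    calls.append(text)
    return f'EN({text})'


df = pd.DataFrame({
    'Metric1_JP': ['加重販売率(販売金額)', '加重販売率(販売金額)', '加重販売率(販売金額)'],
    'Type_JP': ['サルサソ−ス', 'ケチャップ', 'ケチャップ'],
})
JP_columns = [column for column in df.columns if '_JP' in column]
EN_columns = [column.replace('_JP', '_EN') for column in JP_columns]

# Translate every distinct string across the selected columns exactly once
unique_texts = pd.unique(df[JP_columns].to_numpy().ravel())
lookup = {text: fake_translate(text) for text in unique_texts}

# Replace each cell with its cached translation, column by column
df[EN_columns] = df[JP_columns].apply(lambda col: col.map(lookup))
print(len(calls), 'translate calls for', df[JP_columns].size, 'cells')
```

With the real translator you would build `lookup` with `GoogleTranslator(source='ja', target='en').translate` in place of `fake_translate`; the rest stays the same.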

Answer 1: accepted, 1 vote, 2021-10-13 05:46:53

使用 numpy.vectorize 或 DataFrame.apply 通过列列表传递函数？

问题描述

1 个解决方案

解决方案1 1 已采纳 2021-10-13 05:46:53

解决方案1
1 已采纳 2021-10-13 05:46:53