How do you iterate a function over uneven columns in Python?
I know this question may not make much sense, but hopefully the following example will clarify it. I need to reference one string in column sentA and then compare it to all strings in sentB. The following example shows the dataframe I defined as questions.
sentA  sentB
str1   str1
str2   str2
NaN    str3
The code I'm currently using only works when both columns are the same length, and looks like this:
import pandas as pd

def compare(row):
    # simalarity_funct is my own similarity function (defined elsewhere).
    sentA = row.iloc[0]
    return pd.Series([simalarity_funct(sentA, sentB) for sentB in questions['sentB']])

results = questions.apply(compare, axis=1).T
That code gives me 3 outputs for str1A (similarity to str1B, str2B, and str3B) and puts them in a column.
Here is another example with simplified code, based on the input dataframe numbers:
num1  num2
3     5
4     6
NaN   7
def multiply(num1, num2):
    return num1 * num2

def compare(row):
    num1 = row.iloc[0]
    # I would like to prevent this next statement from passing an "NaN" to the
    # multiply function. The empty cells will always be at the end of the column.
    return pd.Series([multiply(num1, num2) for num2 in numbers['num2']])

results = numbers.apply(compare, axis=1).T
print(results)
15 20 NaN
18 24 NaN
21 28 NaN
The underlying problem is that my similarity function will throw an error if it is fed bad data. The easiest fix I can think of is to not feed it bad data. Is there a way I can modify the last step to prevent it from passing "NaN" to the similarity function?
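For anyone who wants to reproduce the failure, here is a minimal sketch. The `similarity` function is a hypothetical stand-in built on `difflib` from the standard library, not my actual function; it is only assumed to reject non-string input the way a strict similarity function might. The point is that the padded cell in sentA reaches the function as a float NaN.

```python
import difflib

import numpy as np
import pandas as pd


def similarity(a, b):
    """Hypothetical stand-in for the real similarity function."""
    # Assumption: the real function fails on anything that is not a string.
    if not isinstance(a, str) or not isinstance(b, str):
        raise TypeError(f"expected two strings, got {a!r} and {b!r}")
    return difflib.SequenceMatcher(None, a, b).ratio()


# sentA is one entry shorter than sentB, so its last cell is NaN.
questions = pd.DataFrame({
    "sentA": ["str1", "str2", np.nan],
    "sentB": ["str1", "str2", "str3"],
})


def compare(row):
    # Compare this row's sentA against every string in sentB.
    return pd.Series([similarity(row["sentA"], b) for b in questions["sentB"]])


raised = False
try:
    questions.apply(compare, axis=1)
except TypeError as exc:
    # The third row carries NaN in sentA, which the stand-in rejects.
    raised = True
    print("similarity failed:", exc)
```

The first two rows go through fine; the error only appears once `apply` reaches the row whose sentA cell is empty.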
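def compare(row):
    num1 = row.iloc[0]
    # Iterate only over the non-null num2 values so multiply never receives NaN.
    return pd.Series([multiply(num1, num2) for num2 in numbers[numbers.num2.notnull()].num2])

# Skip rows where num1 is missing before applying.
results = numbers[numbers.num1.notnull()].apply(compare, axis=1).T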
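A self-contained sketch of the same filtering idea, assuming the numbers frame from the question. It swaps the `notnull()` masks for `Series.dropna()` and builds the result with a nested comprehension instead of `apply`; the names `left` and `right` are illustrative only.

```python
import numpy as np
import pandas as pd


def multiply(num1, num2):
    return num1 * num2


# num1 is one entry shorter than num2, so its last cell is NaN.
numbers = pd.DataFrame({"num1": [3, 4, np.nan], "num2": [5, 6, 7]})

# Drop the padding once, up front, so multiply never sees NaN.
left = numbers["num1"].dropna()
right = numbers["num2"].dropna()

# One row per num2 value and one column per num1 value, which matches the
# transposed layout in the question without the all-NaN column.
results = pd.DataFrame(
    [[multiply(a, b) for a in left] for b in right],
    index=right.index,
    columns=left.index,
)
print(results)
```

Filtering both columns before the loop keeps the called function free of any NaN handling, which is useful when it is a similarity function that cannot be changed.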