How to return a value based on column value and Timestamp using a user-defined function in pandas
I have two DataFrames, which I have joined. On the joined DataFrame, I'm writing a user-defined function that, based on the Timestamp and the value counts of the Sentiment column, returns a value according to the conditions shown below and stores it in a new column called "Day_Sentiment". However, I'm getting the error below. Please let me know how to go about it.
Input:
Date Content Cleaned-content Sentiment
11/12/2020 abb bbc abb Bad
12/10/2020 xyz xxy Good
11/24/2020 tyu yuu Neutral
12/16/2020 iop yui Bad
Output:
Date Content Cleaned-content Sentiment Day_Sentiment
11/12/2020 abb bbc abb Bad Bad
12/10/2020 xyz xxy Good Bad
11/24/2020 tyu yuu Neutral Bad
12/16/2020 iop yui Bad Bad
So far I have tried the following:
df = input_data.join(results)
def compare_def(df):
    no.bad_senti = df.loc[df['Sentiment'] == 'Bad']
    no.neut_senti = df.loc[df['Sentiment'] == 'Neutral']
    no.good_senti = df.loc[df['Sentiment'] == 'Good']
    if ((no.bad_senti > no.good_senti) & (no.bad_senti > no.neut_senti)):
        output = 'Bad'
    elif ((no.good_senti > no.bad_senti) & (no.good_senti > no.neut_senti)):
        output = 'Good'
    elif ((no.neut_senti > no.bad_senti) & (no.neut_senti > no.good_senti)):
        output = 'Neutral'
    elif no.good_senti == no.bad_senti:
        output = 'Neutral'
    elif no.bad_senti == no.neut_senti:
        output = 'bad'
    elif no.good_senti == no.neut_senti:
        output = 'good'
    else:
        output = 'Neutral'
    return output
df['Day_Sentiment'] = output
Alternate:
output = compare_def(df)
df['Day_Sentiment'] = output
Error:
ValueError: Can only compare identically-labeled DataFrame objects
Example 1: the predicted sentiment counts are 2 Bad, 1 Good, 1 Neutral.
Then in the function, 2 > 1 and 2 > 1 is true, so it returns Bad.
Example 2: the sentiment counts are 2 Bad, 5 Good, 5 Neutral.
Function:
2 > 5 false; 5 > 2 and 5 > 5 false; 5 > 2 and 5 > 5 false; 5 == 2 false; 2 == 5 false; 5 == 5 true, so return good.
There are several issues with your code. To begin, the variables bad, good, and neut are pandas DataFrames of different lengths (filtered subsets of df that hold strings), not counts. You then attempt to perform several conditional tests on them, for example
if ((bad > good) & (bad > neut)):
which generates your ValueError. I am not quite sure what logic you are attempting to implement, but the following template may help:
def compare_data(row):
    value = 'Good'
    # The logic here escapes me
    # Evaluate the contents of row['Sentiment'] and modify value
    return value

df["Day Sentiment"] = df.apply(compare_data, axis=1)
Yields:
Date Content Cleaned-content Sentiment Day Sentiment
0 11/12/2020 abb bbc abb Bad Good
1 12/10/2020 xyz xxy Good Good
2 11/24/2020 tyu yuu Neutral Good
3 12/16/2020 iop yui Bad Good
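If the goal is the one described in the question's two examples (compare how many Bad, Good, and Neutral rows there are and pick a label), one possible way to fill in the template is to count the labels first and compare the integer counts. The sketch below is only an illustration under some assumptions: the function name day_sentiment is made up here, the tie rules follow the order of the question's elif chain, the lowercase 'bad' and 'good' in the question are assumed to be typos for 'Bad' and 'Good', and the sample frame keeps only the Date and Sentiment columns from the input.

```python
import pandas as pd


def day_sentiment(sentiments: pd.Series) -> str:
    """Return a single label for a group of sentiment labels.

    Hypothetical helper: it mirrors the if/elif chain from the question,
    but compares integer counts instead of filtered DataFrames.
    """
    counts = sentiments.value_counts()
    bad = int(counts.get("Bad", 0))
    good = int(counts.get("Good", 0))
    neut = int(counts.get("Neutral", 0))

    # Strict majority cases
    if bad > good and bad > neut:
        return "Bad"
    if good > bad and good > neut:
        return "Good"
    if neut > bad and neut > good:
        return "Neutral"
    # Ties, checked in the same order as in the question
    if good == bad:
        return "Neutral"
    if bad == neut:
        return "Bad"
    if good == neut:
        return "Good"
    return "Neutral"


# Sample data taken from the question's input (Date and Sentiment only)
df = pd.DataFrame({
    "Date": ["11/12/2020", "12/10/2020", "11/24/2020", "12/16/2020"],
    "Sentiment": ["Bad", "Good", "Neutral", "Bad"],
})

# One label computed over the whole column, broadcast to every row
df["Day_Sentiment"] = day_sentiment(df["Sentiment"])
print(df)
```

With this sample there are 2 Bad, 1 Good, and 1 Neutral rows, so every row gets Bad, which matches the expected output in the question. If the label should instead be computed separately for each date, something like df.groupby("Date")["Sentiment"].transform(day_sentiment) may be closer to what is wanted, though the expected output shown in the question suggests a single label for the whole frame.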