Applying a string function to a pandas DataFrame column
This seems somewhat basic, but after going through Stack Overflow I couldn't piece the existing answers together to solve my problem. I'm working on my text processing skills, and I put car reviews in a pandas DataFrame that looks like this:
Review
0 :P I like you, Merc. You make me laugh! If Mat...
1 I am surprised that I did not find any discuss...
3 . . .let me see if I am following along correc...
4 . . .now hold on a minute. A "current" A6 4.2 ...
5 but has anyone noticed the front oh the new ac...
I wrote a function that takes a string as input and returns a value (in my case, a sentiment score). Within my function, this value is put in a newly created column. The problem I keep running into is with the input: I get an "expected string" error, because with a DataFrame the column holds objects, not strings.
The function is very long and works when a string is passed in. Here's a snippet of it (note that the DataFrame is named edmunds):
    def checker(b):
        word = 'ls'
        if stry.find(word) == -1:
            edmunds['ls'] = 0.0
        ...
        edmunds['ls'] = sum(o_list)
Any help would be greatly appreciated. I'm trying to wrap my head around whether I should go from a DataFrame to a list or whether I can still work within pandas.
The output would ideally look like this:
Review ls
0 :P I like you, Merc. You make me laugh! If Mat... 0.4
1 I am surprised that I did not find any discuss... 0.5
3 . . .let me see if I am following along correc... 0.0
4 . . .now hold on a minute. A "current" A6 4.2 ... 1.0
5 but has anyone noticed the front oh the new ac... -0.6
To create a column ls from the column Review:
You need a function that takes one string and returns one number. Currently, the checker function doesn't do that, since it doesn't return anything (and it also sets values inside edmunds, which is not necessary). Given how the function looks, you will probably want to end it with return sum(o_list). And, to be clear, the input b must be a single string. One way to test this: you should be able to write checker("hello") and get a number back.
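A minimal sketch of that contract follows. The real body of checker is elided in the question, so the scoring logic here is a hypothetical placeholder, and it assumes the text being searched is the argument b itself. The only point is the shape: one string in, one number out, and no writes to edmunds.

```python
def checker(b):
    """Take ONE string and return ONE number (placeholder scoring)."""
    word = 'ls'
    # Nothing to score if the word never appears in this review.
    if b.find(word) == -1:
        return 0.0
    # Hypothetical stand-in for the elided logic that builds o_list:
    # one point per whitespace-separated token containing the word.
    o_list = [1.0 for token in b.split() if word in token]
    # Return the value instead of assigning it to edmunds['ls'].
    return float(sum(o_list))


# Quick check of the contract: a plain string goes in, a number comes out.
score = checker("hello")
print(type(score).__name__, score)
```

If this check raises an error or returns None, the function is not yet ready to be used on a column.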
If you've written checker that way, then you can create ls easily:
    edmunds['ls'] = edmunds.Review.apply(checker)
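As an end-to-end illustration, here is a small sketch with made-up review text and a hypothetical stand-in for checker that simply counts occurrences of 'ls'. Neither the sample rows nor the scoring come from the question; only the apply line does.

```python
import pandas as pd


def checker(b):
    # Hypothetical placeholder scorer: number of times 'ls' occurs in one string.
    return float(b.count('ls'))


# Made-up sample data standing in for the real car reviews.
edmunds = pd.DataFrame(
    {"Review": ["I like the ls trim", "no mention here", "ls vs ls"]}
)

# apply calls checker once per cell and collects the results in a new column.
edmunds['ls'] = edmunds['Review'].apply(checker)
print(edmunds)
```

Here edmunds['Review'] is the same column as edmunds.Review; the bracket form also works when a column name is not a valid Python identifier.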
In case you want more background on why pandas works this way: the apply function is an example of mapping (applying a function to each element of a list).
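To make the mapping idea concrete, the sketch below compares Series.apply with an explicit list comprehension over the same values. The word_count function and the sample strings are hypothetical, chosen only so the two results are easy to compare.

```python
import pandas as pd


def word_count(text):
    # Hypothetical per-string function: number of whitespace-separated words.
    return len(text.split())


reviews = pd.Series(["good car", "bad brakes today", "ok"])

# Mapping with pandas: the function is called once per element.
via_apply = reviews.apply(word_count)

# The same mapping written by hand, keeping the original index.
via_loop = pd.Series([word_count(text) for text in reviews], index=reviews.index)

print(via_apply.tolist() == via_loop.tolist())
```

This is also why the function must accept a single string: apply hands it one element at a time, never the whole column.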