
Applying a string function to a pandas dataframe column

This seems somewhat basic, but after going through Stack Overflow I couldn't combine the existing answers into a solution for my problem. I'm working on my text processing skills. I put car reviews in a pandas dataframe that looks like this:

    Review
0   :P I like you, Merc. You make me laugh! If Mat...
1   I am surprised that I did not find any discuss...
3   . . .let me see if I am following along correc...
4   . . .now hold on a minute. A "current" A6 4.2 ...
5   but has anyone noticed the front oh the new ac...

I wrote a function that takes a string as input and returns a value (in my case a sentiment score). Within my function, this value is put in a newly created column. The problem I keep running into is with the input: I get an "expected string" error. With a dataframe, the values are objects, not strings.

The function is very long and works when a string is passed in. Here's a snippet of the function (note that the dataframe is named edmunds):

def checker(b):
    word = 'ls'
    if stry.find(word) == -1:
        edmunds['ls'] = 0.0
    ...
    edmunds['ls'] = sum(o_list)

Any help would be greatly appreciated. I'm trying to wrap my head around whether I should go from dataframe to list or whether I can still work within pandas.

Output would ideally look like:

    Review                                             ls
0   :P I like you, Merc. You make me laugh! If Mat...  0.4
1   I am surprised that I did not find any discuss...  0.5
3   . . .let me see if I am following along correc...  0.0
4   . . .now hold on a minute. A "current" A6 4.2 ...  1.0
5   but has anyone noticed the front oh the new ac...  -0.6

To create a column ls from the column Review:

  1. You need a function that takes one string and returns one number. Currently, the checker function doesn't do that, since it doesn't return anything (it also sets values inside edmunds, which is not necessary). Given how the function looks, you will probably want to end it with return sum(o_list). And, to be clear, the input b must be a single string. One way to test this: you should be able to write checker("hello") and get a number back.

  2. If you've written checker that way, then you can create ls easily:

     edmunds['ls'] = edmunds.Review.apply(checker)
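The two steps above can be sketched roughly as follows. The word sets and the scoring rule are hypothetical stand-ins for the real (much longer) sentiment logic, and the two reviews are shortened versions of the rows in the question. Only the shape matters here: checker takes one string, returns one number, and never touches edmunds.

```python
import pandas as pd

# Hypothetical word lists, for illustration only; the real scoring
# logic from the question is not shown in the snippet.
POSITIVE = {"like", "laugh"}
NEGATIVE = {"surprised"}


def checker(b):
    """Take one string, return one number. No dataframe access inside."""
    o_list = []
    for raw in b.lower().split():
        word = raw.strip(".,!?")
        if word in POSITIVE:
            o_list.append(1.0)
        elif word in NEGATIVE:
            o_list.append(-1.0)
        else:
            o_list.append(0.0)
    return sum(o_list)


edmunds = pd.DataFrame(
    {
        "Review": [
            "I like you, Merc. You make me laugh!",
            "I am surprised that I did not find any discussion.",
        ]
    }
)

# Step 1 sanity check: a plain string goes in, a number comes out.
print(checker("hello"))

# Step 2: apply calls checker once per review and builds the new column.
edmunds["ls"] = edmunds["Review"].apply(checker)
print(edmunds)
```

Because checker no longer writes to edmunds itself, it can be tested on plain strings before it is ever applied to the column.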

In case you want more background on why pandas works this way: the apply function is an example of mapping a function over a list. It calls the function once for each element of the column and collects the results into a new column with the same index.
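To make the mapping analogy concrete, here is a small sketch that compares Python's built-in map over a plain list with apply on a pandas Series. The sample strings are made up, and len stands in for any function that takes one string and returns one number.

```python
import pandas as pd

reviews = pd.Series(
    ["great car", "bad brakes", "ok"],
    index=[0, 1, 3],  # gaps in the index, as in the question's dataframe
)

# Built-in map over a plain list: len is called once per element.
as_list = list(map(len, reviews.tolist()))

# Series.apply makes the same per-element calls, but keeps the index.
as_series = reviews.apply(len)

print(as_list)
print(as_series)
```

Both produce the same values. The difference is that apply returns a Series aligned with the original index, so assigning the result back as a new column works without converting to a list and back.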

