
Pandas str.contains regex

I am trying to learn Pandas and Matplotlib. As a challenge, I decided it would be fun to graph profession types based on the comments. My plan is to get the comments, find a small dataset of professions, and check each comment against that dataset. I'm sure there is a better way; I'm still learning.

Does Pandas regex matching differ from regular regex matching? Row 0 should be True, should it not?

#! /usr/bin/python
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import
import pandas as pd
import matplotlib.pyplot as plt
import praw

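# Fetch the submission and put its comments into a Series.
r = praw.Reddit(user_agent='my_cool_application')
submissions = r.get_submission(submission_id='2owaba')
s = pd.Series(submissions.comments)

# Test each element against the pattern, then show the Series itself.
pattern = r'Programmer'
print(s.str.contains(pattern))
print(s)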

The output is not what I expected:

$ python reddit.py 
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9    NaN
10   NaN
11   NaN
12   NaN
13   NaN
14   NaN
...
57   NaN
58   NaN
59   NaN
60   NaN
61   NaN
62   NaN
63   NaN
64   NaN
65   NaN
66   NaN
67   NaN
68   NaN
69   NaN
70   NaN
71   NaN
Length: 72, dtype: float64
0     Programmer/Project Lead for a railroad company...
1     I deliver pizza part time while I go to colleg...
2      Graduate student (molecular biologist) + cat mom
3     Systems Analyst at a big, boring corporation. ...
4     I work in IT.  I wear many hats at my (small) ...
5                       I'm a professional desk jobber.
6     medical pot producer....pretty much your typic...
7     Research tech for the federal govt. Water leve...
8                                     Karate instructor
9     I own a Vape shop and an E-Liquid manufacturin...
10      Guidance counselor. If only my students knew...
11                         Graduate student and chemist
12    Regulatory Affairs for a medical device manufa...
13    restaurant manager (for the moment, looking to...
14    Logistics and technician manager for a radon m...
...
57    Technical Support for a big credit card proces...
58    Class action settlement administration. Been t...
59    IT Consultant here 8) Lot's of IT folk at EF i...
60    This'll be my first year, staying in the Back ...
61    Research assistant in the epidemiology departm...
62    IT undergrad and this will be my second time a...
63    Commercial construction foreman at a tiny company
64    I'm actually a web developer for a company tha...
65             Install cameras, tv's and phone systems.
66    Animation/design/anything creative. Graduated ...
67                                     Career bartender
68    I work in the Traveling Hospitality Business f...
69    Assisstant Manager at a major retail chain...t...
70                                          Barista :) 
71    Hi, I'm Pasquale Rotella (CEO, Insomniac Event...
Length: 72, dtype: object

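Your Series contains praw.objects.Comment objects, not strings. The .str methods return NaN for elements that are not strings, which is why every row comes back as NaN even though the Series prints as text. Extracting body from each comment should give you what you want: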

s = pd.Series(comment.body for comment in submissions.comments)
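
To see the difference without a Reddit connection, here is a minimal sketch. FakeComment is a hypothetical stand-in for praw.objects.Comment (it only mimics the body attribute), and the two sample strings are taken from the output in the question. It compares str.contains on a Series of objects with str.contains on a Series of the extracted text, and checks the latter against plain re.search.

```python
import re

import pandas as pd


class FakeComment(object):
    """Hypothetical stand-in for praw.objects.Comment; it only carries .body."""

    def __init__(self, body):
        self.body = body


comments = [
    FakeComment("Programmer/Project Lead for a railroad company"),
    FakeComment("Karate instructor"),
]

pattern = r'Programmer'

# Series of non-string objects: the regex cannot be applied to any element,
# so every entry of the result is NaN.
raw = pd.Series(comments)
raw_result = raw.str.contains(pattern)

# Series of the comment text: each element is a real string, so the
# pattern is searched for inside it.
bodies = pd.Series([comment.body for comment in comments])
body_result = bodies.str.contains(pattern)

# The same check done with the standard library, for comparison.
expected = [re.search(pattern, text) is not None for text in bodies]

print(raw_result.isnull().all())
print(body_result.tolist())
print(body_result.tolist() == expected)
```

So the regex matching itself is not different; the pattern was simply never applied to a string in the original Series.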
