Series regex extract producing a DataFrame

I am working through a regex task on Dataquest. The following code snippet runs correctly in the Dataquest IDE:

titles = hn["title"]
pattern = r'\[(\w+)\]'
tag_matches = titles.str.extract(pattern)
tag_freq = tag_matches.value_counts()
print(tag_freq, '\n')

However, on my PC running pandas 0.25.3, this exact same code block raises an error:

Traceback (most recent call last):
  File "C:/Users/Mark/PycharmProjects/main/main.py", line 63, in <module>
    tag_freq = tag_matches.value_counts()
  File "C:\Users\Mark\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 5179, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'value_counts' 

Why is tag_matches coming back as a DataFrame? I am calling extract on the Series titles.
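A minimal reproduction, assuming a small hypothetical titles Series in place of the Dataquest hn data (the titles below are made up), shows what the default call returns:

```python
import pandas as pd

# Hypothetical stand-in for hn["title"]; the real Dataquest data is not shown here.
titles = pd.Series([
    "[pdf] A paper about regex",
    "Show HN: my project",
    "[video] A talk on pandas",
    "[pdf] Another paper",
])
pattern = r'\[(\w+)\]'

# With the default expand=True, extract returns a DataFrame with one
# column per capture group, even though this pattern has only one group.
tag_matches = titles.str.extract(pattern)
print(type(tag_matches))             # <class 'pandas.core.frame.DataFrame'>
print(tag_matches.columns.tolist())  # [0]  (unnamed group -> column label 0)
```

Rows without a bracketed tag (the "Show HN" title here) come back as NaN.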

From the docs for pandas.Series.str.extract:

A pattern with one group will return a Series if expand=False.

>>> s = pd.Series(['a1', 'b2', 'c3'])
>>> s.str.extract(r'[ab](\d)', expand=False)
0      1
1      2
2    NaN
dtype: object
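The docs' example can be extended to compare return types side by side. This is a sketch assuming a pandas version where expand defaults to True (as in 0.25.3); the variable names are illustrative:

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# expand=True (the default): always a DataFrame, even for a single group.
one_group_df = s.str.extract(r'[ab](\d)')
# expand=False with exactly one group: a Series.
one_group_series = s.str.extract(r'[ab](\d)', expand=False)
# Two groups: a DataFrame regardless of expand.
two_groups = s.str.extract(r'([ab])(\d)', expand=False)

print(type(one_group_df).__name__)      # DataFrame
print(type(one_group_series).__name__)  # Series
print(type(two_groups).__name__)        # DataFrame
```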

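So you probably need to be explicit and pass expand=False to get a Series back. In pandas 0.25.3, extract defaults to expand=True, which returns a DataFrame with one column per capture group even when the pattern has only one group, and DataFrame has no value_counts method in that version (DataFrame.value_counts was added in pandas 1.1.0).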
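Applied to the original snippet, the fix could look like the sketch below. It again assumes made-up sample titles instead of the Dataquest hn DataFrame, and shows two equivalent options: passing expand=False, or keeping the default DataFrame and selecting its single column (labelled 0).

```python
import pandas as pd

# Hypothetical sample titles standing in for hn["title"].
titles = pd.Series([
    "[pdf] A paper about regex",
    "Show HN: my project",
    "[video] A talk on pandas",
    "[pdf] Another paper",
])
pattern = r'\[(\w+)\]'

# Option 1: ask extract for a Series directly.
tag_matches = titles.str.extract(pattern, expand=False)
tag_freq = tag_matches.value_counts()  # NaN (no tag) is dropped by default
print(tag_freq, '\n')

# Option 2: keep the default DataFrame and pick its only column, labelled 0.
tag_freq_alt = titles.str.extract(pattern)[0].value_counts()
```

Both options count pdf twice and video once here; option 1 is the more direct one when the pattern has a single group.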