'str' object has no attribute 'str'

Question

I am trying to implement a function that does the following:

Iterates through a column, df['input_str'], which contains strings such as 'disvt', 'disr5', 'disvt_r1', 'disr5/r6'.
If a string contains a pattern, extract the pattern using .extract() and append it to a list.
If the list is empty, return 0.
Otherwise, join the items in the list with _.

The goal is to create a new column that contains the matches to the patterns provided, i.e. vt, r5, vt_r1, r5/r6.

Input DataFrame

col1   col2  col3  col4   input_str    
  a      .     .     .       disvt          
  b      .     .     .       disr5          
  c      .     .     .       disvt_r1        
  d      .     .     .       disr5/r6

def parse_info(input_str):
    patterns = ["r\d{1}", "vt", "v\d{2}", "v\d{1}"]
    new_list = []
    for pattern in patterns:
        if input_str.contains(pattern):
            new_list.append(input_str.extract(pat=pattern, expand=False))
    if len(new_list) == 0:
        return np.nan
    else:
        return "_".join(new_list)

Applying the function to create a new column:

df["new_column"] = df.apply(
    lambda x: x(df["input_str"]), axis=1
)

Desired output:

input_str    new_column
disvt           vt
disr5           r5
disvt_r1        vt_r1
disr5/r6        r5_r6

This returns the following error: 'str' object has no attribute 'contains'

When I change .contains to .str.contains(), I now get: 'str' object has no attribute 'str'

I am a bit stuck at this point and not sure of the best way to resolve these problems.

Answer 1

EDIT (after the question was updated with input and expected output):

You can simply use str.extract(), but you need to fix your regex patterns. The key is to join the different patterns into a single string separated by the or operator | and to wrap the result in a capture group between two parentheses:

patterns = [r"r\d{1}", "vt", r"v\d{2}", r"v\d{1}"]
df['new_column'] = df['input_str'].str.extract('(' + '|'.join(patterns) + ')')
df
Out[1]: 
  col1 col2 col3 col4 input_str new_column
0    a    .    .    .     disvt         vt
1    b    .    .    .     disr5         r5
2    c    .    .    .  disvt_r1         vt
3    d    .    .    .  disr5/r6         r5
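
Note that str.extract keeps only the first match in each row, which is why disvt_r1 yields vt rather than vt_r1 in the output above. One possible way to collect every match while keeping the same pattern list is str.extractall followed by a groupby on the original row index. This is only a sketch: the sample frame is rebuilt here for self-containment, and the extra row 'abc' is a hypothetical value added to show what happens when nothing matches.

```python
import pandas as pd

# Illustrative data; "abc" is a made-up row that matches none of the patterns.
df = pd.DataFrame({"input_str": ["disvt", "disr5", "disvt_r1", "disr5/r6", "abc"]})

patterns = [r"r\d{1}", "vt", r"v\d{2}", r"v\d{1}"]
combined = "(" + "|".join(patterns) + ")"

# extractall returns one row per match, indexed by (original row, match number).
matches = df["input_str"].str.extractall(combined)[0]

# Join the matches of each original row; rows with no match are absent here,
# so they become NaN when the result is aligned back onto df.
df["new_column"] = matches.groupby(level=0).agg("_".join)
print(df)
```

With this approach a row without any match ends up as NaN, which is close to what the np.nan branch of the function in the question was aiming for.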

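The method str.contains is only available on a pandas.Series (through the .str accessor). The error shows that input_str is a plain Python string, so for a literal substring check you should use in, with the substring on the left:

if pattern in input_str:

instead of

if input_str.contains(pattern):

Note that in does not interpret regular expressions. For patterns such as r\d{1}, test with re.search(pattern, input_str) instead.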

Likewise, the method str.extract is only available on a pandas.Series. You can try re.search, re.findall, a list comprehension, or other alternatives that work on plain Python strings. Be aware that re.match only matches at the start of the string, so it would not find these patterns after the dis prefix.
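
As a rough illustration of the plain-string route, the function from the question could be rewritten with re.findall and applied to the column with Series.apply. This is a sketch under a few assumptions: the four patterns are merged into one alternation (the name PATTERN is introduced here), so matches come back in the order they appear in the string rather than in pattern-list order, and np.nan is returned when nothing matches, as in the original code.

```python
import re

import numpy as np
import pandas as pd

# The sub-patterns from the question, combined with the regex "or" operator.
# v\d{2} is listed before v\d so that two-digit forms are tried first.
PATTERN = re.compile(r"r\d|vt|v\d{2}|v\d")


def parse_info(input_str):
    """Return all pattern matches in a plain string joined by '_', or NaN."""
    matches = PATTERN.findall(input_str)
    if not matches:
        return np.nan
    return "_".join(matches)


df = pd.DataFrame({"input_str": ["disvt", "disr5", "disvt_r1", "disr5/r6"]})

# Apply the function to each string in the column, not to each row of df.
df["new_column"] = df["input_str"].apply(parse_info)
print(df)
```

Calling df["input_str"].apply(parse_info) passes each cell to the function as a str, whereas the lambda in the question tries to call the row itself as if it were a function.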

Answer 2

Instead of creating a list of patterns, you can use a single pattern with str.findall and join the results using apply.

Pattern:

v(?:t|\d{1,2})|r\d

Regex demo

For example:

import pandas as pd

items= [
    "disvt",
    "disr5",
    "disvt_r1",
    "disr5/r6"
]

df = pd.DataFrame(items, columns=["input_str"])

df['new_column'] = df['input_str'].str.findall(r"v(?:t|\d{1,2})|r\d").apply('_'.join)
print(df)

Output

  input_str new_column
0     disvt         vt
1     disr5         r5
2  disvt_r1      vt_r1
3  disr5/r6      r5_r6
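
Two edge cases may be worth handling, depending on the data. A row with no match produces an empty string rather than a missing value, and .apply('_'.join) raises a TypeError if the column contains missing values, because str.findall returns NaN for them. A minimal sketch, assuming NaN is the desired result in both cases; the rows 'abc' and None are hypothetical additions to the sample data.

```python
import numpy as np
import pandas as pd

# Sample data plus two made-up rows: one without a match, one missing value.
df = pd.DataFrame(
    {"input_str": ["disvt", "disr5", "disvt_r1", "disr5/r6", "abc", None]}
)

# findall gives a list of matches per row (NaN where the input is missing).
found = df["input_str"].str.findall(r"v(?:t|\d{1,2})|r\d")

# str.join concatenates each list and passes NaN through unchanged;
# empty strings (rows with no match) are then replaced by NaN.
df["new_column"] = found.str.join("_").replace("", np.nan)
print(df)
```

Using .str.join instead of .apply('_'.join) is a design choice here: it skips missing values instead of failing on them.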

2 answers

Answer 1
2 2020-12-02 18:56:47

Answer 2
0 2020-12-03 11:17:52
