'str' object has no attribute 'str'
I am trying to implement a function that does the following:
The goal is to create a new column that contains the matches to the provided patterns, i.e. vt, r5, vt_r1, r5/r6.
Input DataFrame:
col1 col2 col3 col4 input_str
a . . . disvt
b . . . disr5
c . . . disvt_r1
d . . . disr5/r6
def parse_info(input_str):
    patterns = ["r\d{1}", "vt", "v\d{2}", "v\d{1}"]
    new_list = []
    for pattern in patterns:
        if input_str.contains(pattern):
            new_list.append(input_str.extract(pat=pattern, expand=False))
    if len(new_list) == 0:
        return np.nan
    else:
        return "_".join(new_list)
Applying the function to create a new column:
df["new_column"] = df.apply(
    lambda x: x(df["input_str"]), axis=1
)
Desired output:
input_str new_column
disvt vt
disr5 r5
disvt_r1 vt_r1
disr5/r6 r5_r6
This returns the following error: 'str' object has no attribute 'contains'
When I change .contains to .str.contains(), I get: 'str' object has no attribute 'str'
I am a bit stuck at this point and not sure of the best way to resolve these problems.
EDIT (after the question was updated with input and expected output):
You can simply use str.extract(), but you need to fix your regex patterns. The key is to join the different patterns into a single string separated by the or operator | and wrap the result in a capture group between two parentheses:
patterns = [r"r\d{1}", "vt", r"v\d{2}", r"v\d{1}"]  # raw strings keep \d intact
df['new_column'] = df['input_str'].str.extract('(' + '|'.join(patterns) + ')')
df
Out[1]:
col1 col2 col3 col4 input_str new_column
0 a . . . disvt vt
1 b . . . disr5 r5
2 c . . . disvt_r1 vt
3 d . . . disr5/r6 r5
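Note that str.extract() only captures the first match in each string, which is why rows 2 and 3 above show vt and r5 rather than vt_r1 and r5_r6. One possible way to collect every match with the same joined pattern is str.extractall() followed by a per-row join. This is a minimal sketch, assuming the input column from the question; the extra "disxx" row is a made-up example of a string with no match.

```python
import pandas as pd

df = pd.DataFrame(
    {"input_str": ["disvt", "disr5", "disvt_r1", "disr5/r6", "disxx"]}
)
patterns = [r"r\d{1}", "vt", r"v\d{2}", r"v\d{1}"]
combined = "(" + "|".join(patterns) + ")"

# extractall returns one row per match, indexed by (original row, match number);
# column 0 holds the text captured by the single unnamed group.
matches = df["input_str"].str.extractall(combined)[0]

# Join the matches of each original row. Rows without any match are absent
# from the grouped result, so index alignment leaves them as NaN.
df["new_column"] = matches.groupby(level=0).agg("_".join)
print(df)
```

The longer alternative v\d{2} is listed before v\d{1} so that a two-digit match is tried first.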
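The method str.contains is only available on a pandas.Series. For a normal string, use in, with the substring on the left:
    if pattern in input_str:
instead of
    if input_str.contains(pattern):
Note that in only tests for a literal substring, so it works for "vt" but not for a regex pattern such as "r\d{1}"; for those, use re.search(pattern, input_str).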
Likewise, the method str.extract is only available on a pandas.Series. You can try re.match, re.findall, a list comprehension, or other alternatives that work on normal Python strings.
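As an illustration of the plain-string route, here is a minimal sketch of parse_info rewritten with re.findall and applied to the column rather than to the whole DataFrame. The compiled PATTERN name and the sample data are assumptions made for this example; the alternatives are the ones from the question, joined with |.

```python
import re

import numpy as np
import pandas as pd

# The question's patterns combined into one regex; v\d{2} comes before v\d
# so that a two-digit match is preferred over a one-digit match.
PATTERN = re.compile(r"r\d|vt|v\d{2}|v\d")


def parse_info(input_str):
    """Return all matches in a plain string joined by "_", or NaN if none."""
    found = PATTERN.findall(input_str)  # list of matched substrings
    return "_".join(found) if found else np.nan


df = pd.DataFrame({"input_str": ["disvt", "disr5", "disvt_r1", "disr5/r6"]})

# Series.apply passes each cell (a normal str) to parse_info,
# so no .str accessor is needed inside the function.
df["new_column"] = df["input_str"].apply(parse_info)
print(df)
```

re.findall is used here rather than re.match because re.match only matches at the start of the string, and every sample value begins with "dis".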
Instead of creating a list of patterns, you can use a single pattern with str.findall and join the results using apply.
Pattern:
v(?:t|\d{1,2})|r\d
For example:
import pandas as pd
items = [
    "disvt",
    "disr5",
    "disvt_r1",
    "disr5/r6"
]
df = pd.DataFrame(items, columns=["input_str"])
df['new_column'] = df['input_str'].str.findall(r"v(?:t|\d{1,2})|r\d").apply('_'.join)
print(df)
Output:
input_str new_column
0 disvt vt
1 disr5 r5
2 disvt_r1 vt_r1
3 disr5/r6 r5_r6
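With .apply('_'.join), a row with no match ends up as an empty string, and a missing value in input_str would raise a TypeError because NaN is not iterable. If NaN is wanted in both cases, as in the question's original function, one option is Series.str.join followed by replace. This is a sketch under that assumption; the "disxx" and None rows are made-up sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"input_str": ["disvt_r1", "disxx", None]})

# findall gives a list of matches per row (an empty list when nothing matches)
found = df["input_str"].str.findall(r"v(?:t|\d{1,2})|r\d")

# str.join concatenates each list and leaves missing values as NaN;
# empty strings from rows without a match are then turned into NaN too.
df["new_column"] = found.str.join("_").replace("", np.nan)
print(df)
```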