
Python loop to search multiple sets of keywords in all columns of dataframe

I've used the code below to search across all columns of my dataframe to see if each row has the word "pool" and the words "slide" or "waterslide".

import re

AR11_regex = r"""
(?=.*(?:slide|waterslide)).*pool
"""
f = lambda x: x.str.findall(AR11_regex, flags=re.VERBOSE | re.IGNORECASE)
d['AR']['AR11'] = d['AR'].astype(str).apply(f).any(axis=1).astype(int)

This has worked fine, but when I write a for loop to do this for more than one regex pattern (e.g., AR11, AR12, AR21) using the code below, the new columns are all zeros (i.e., the search is not finding any hits).

for i in AR_list:
    print(i)
    pat = i+"_regex"
    print(pat)
    f = lambda x: x.str.findall(i+"_regex", flags= re.VERBOSE|re.IGNORECASE)
    d['AR'][str(i)] = d['AR'].astype(str).apply(f).any(1).astype(int)

Any advice on why this loop didn't work would be much appreciated!

A small sample data frame would help in understanding your question. In any case, your code sample appears to have a number of problems.

  1. i+"_regex" is just the string "AR11_regex". It won't evaluate to the value of the variable with the identifier AR11_regex, so findall searches for the literal text "AR11_regex" instead of your pattern. Put your regex patterns in a dict.

  2. d['AR'] is the values in the AR column. It seems like you expect it to be a row.

  3. d['AR'][str(i)] is adding a new row. It seems like you want to add a new column.

  4. Lastly, this approach to setting a cell generally (always, for me) yields the following warning:

     /var/folders/zj/pnrcbb6n01z2qv1gmsk70b_m0000gn/T/ipykernel_13985/876572204.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

     See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

The suggested approach would be to use "at", as in d.at[str(i), 'AR'] or some such.
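To make the difference concrete, here is a minimal sketch of single-step assignment. The toy frame, its index labels, and the "flag" column are made up for illustration, since the real data was not posted. The point is that d['AR'][label] = value is two indexing operations in a row, while d.at[label, 'AR'] = value is a single one applied directly to d.

```python
import pandas as pd

# Hypothetical toy frame; the index labels and the 'AR' column are illustrative only.
d = pd.DataFrame({"AR": [0, 0, 0]}, index=["AR11", "AR12", "AR21"])

# One label-based call: the value is written into d itself,
# not into an intermediate object returned by d['AR'].
d.at["AR11", "AR"] = 1

# Adding a whole new column is also a single assignment on the frame.
d["flag"] = [1, 0, 0]
```

Note that .at addresses a single cell by row label and column label, so it suits scalar updates. For a whole column, plain d[name] = values is enough.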

Add a sample data frame and refine your question for more suggestions.
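In the meantime, the dict suggestion from point 1 might look something like the sketch below. Everything about the data is an assumption: the frame df, its columns, and the second pattern AR12 are invented for the example, and the frame is flat rather than nested under 'AR'. Only the AR11 pattern comes from the question. I've also used str.contains instead of findall, since only a yes/no per cell is needed.

```python
import re
import pandas as pd

# Hypothetical sample data; the real frame was not shown in the question.
df = pd.DataFrame({
    "desc": ["Pool with a waterslide", "Large garden", "indoor pool"],
    "notes": ["", "slide near the pool", "no extras"],
})

# Patterns keyed by the name of the column to create. AR11 is the pattern
# from the question; AR12 is an invented second pattern for illustration.
patterns = {
    "AR11": r"""
        (?=.*(?:slide|waterslide)).*pool
    """,
    "AR12": r"""
        garden
    """,
}

# Take the text to search once, so columns added in the loop are not searched.
text = df.astype(str)

for name, pat in patterns.items():
    # pat is the pattern itself, looked up from the dict, not a string
    # such as "AR11_regex" built from the loop variable.
    hits = text.apply(
        lambda col: col.str.contains(pat, flags=re.VERBOSE | re.IGNORECASE, regex=True)
    ).any(axis=1)  # True for rows where at least one cell matches
    df[name] = hits.astype(int)
```

With this made-up data, the first two rows are flagged for AR11 and only the second row for AR12. As in the original pattern, "pool" and "slide" have to appear in the same cell to count as a hit.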
