Create a loop that selects rows starting with a specific string in pandas

I'm still a Python beginner. From df, I would like to extract only the records whose short_desc starts with a specific prefix such as 'Wrong Data' and whose group is 'Specific Group'.

I'm trying to create a loop, please see below:

names_list = []
for name in df['short_desc']:
    if 'Specifc Group' in df['group']:
        if name.startswith("Wrong Data"):
            names_list.append(name)

But this loop doesn't extract what I would like to have. I'm not sure what went wrong. Could you please help?

The cool thing about pandas is that you don't have to do these things in a loop.

import pandas as pd
data = [
    ['Closed', 'j.snow', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'j.doe', 'General Issue', 'Master Group'],
    ['Closed', 'j.snow', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'm.smith', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'a.richards', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'a.blecha', 'General Issue', 'Master Group'],
    ['Closed', 'r.kipling', 'Wrong Data.  Contact your admin', 'First Group']
]

df = pd.DataFrame(data, columns=['status', 'created', 'short_desc', 'group'])
print(df)
# Pick only those rows where short_desc starts with "Wrong".
df1 = df[df['short_desc'].str.startswith('Wrong')]
# Pick only those rows where group is "Specific Group".
df1 = df1[df1['group']=='Specific Group'] 
# Print the "short_desc" column.
print(df1['short_desc'])

Or, in a single expression:

df1 = df[
        (df['short_desc'].str.startswith('Wrong')) &
        (df['group']=='Specific Group')
    ] 

This is pandas' "magic indexing", more formally known as boolean indexing. Each comparison returns a boolean Series that is True where the condition holds. Passing that Series to df[...] returns only the rows where the element is True.
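To make the boolean indexing described above more concrete, here is a minimal sketch that builds each mask separately before combining them. The three-row frame and the variable names `starts_wrong`, `in_group`, and `mask` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Small illustrative frame (assumed data, modeled on the answer's example).
df = pd.DataFrame(
    {
        "short_desc": [
            "Wrong Data.  Contact your admin",
            "General Issue",
            "Wrong Data.  Contact your admin",
        ],
        "group": ["Specific Group", "Master Group", "First Group"],
    }
)

# Each comparison yields a boolean Series aligned with df's index.
starts_wrong = df["short_desc"].str.startswith("Wrong Data")
in_group = df["group"] == "Specific Group"

# `&` combines the two masks element-wise.
mask = starts_wrong & in_group
print(mask.tolist())  # [True, False, False]

# Indexing with the mask keeps only the rows where it is True.
subset = df[mask]
print(subset["short_desc"].tolist())
```

Naming the masks this way can make a longer filter easier to read and debug than one nested expression.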

You need to use .str.startswith to find rows where a column starts with a particular value:

subset = df[df['short_desc'].str.startswith('Wrong Data') & df['group'].eq('Specific Group')]
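As for why the loop in the question collects nothing: the `in` operator applied to a pandas Series tests the index labels, not the values, so a check like `'Specific Group' in df['group']` is False for a default integer index (the question's code also misspells the group as 'Specifc'). If a loop is still wanted, the sketch below pairs each description with its own row's group using `zip`. The sample frame is an assumption made for illustration.

```python
import pandas as pd

# Small illustrative frame (assumed data, modeled on the answer's example).
df = pd.DataFrame(
    {
        "short_desc": [
            "Wrong Data.  Contact your admin",
            "General Issue",
            "Wrong Data.  Contact your admin",
        ],
        "group": ["Specific Group", "Master Group", "First Group"],
    }
)

# `in` on a Series looks at the index labels (0, 1, 2), not the values.
print("Specific Group" in df["group"])         # False
print("Specific Group" in df["group"].values)  # True

# Loop version: compare each row's own group, not the whole column.
names_list = []
for name, group in zip(df["short_desc"], df["group"]):
    if group == "Specific Group" and name.startswith("Wrong Data"):
        names_list.append(name)

print(names_list)
```

The vectorized filter remains the more idiomatic choice; the loop is shown only to pinpoint what the original attempt was missing.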
