
Slicing using list comprehension on a subset of a dataframe

I want to use a list comprehension to split the names in one column on '-', but only within a subset of rows defined by the values in other columns.

For example, with this DataFrame:

    category   product_type   name 
0   pc         unit           hero-dominator
1   print      unit           md-ffx605
2   pc         option         keyboard1.x-963

I'm only interested in rows with category 'pc' and product_type 'unit', so the list comprehension should change only the first row of the 'name' column, giving:

    category   product_type   name 
0   pc         unit           dominator
1   print      unit           md-ffx605
2   pc         option         keyboard1.x-963
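A minimal sketch of how that subset can be selected: combine both conditions into one boolean mask with `&`, wrapping each comparison in parentheses. The DataFrame reproduces the example table; the name `mask` is only for illustration.

```python
import pandas as pd

# Rebuild the example table from the question
df = pd.DataFrame(
    {
        "category": ["pc", "print", "pc"],
        "product_type": ["unit", "unit", "option"],
        "name": ["hero-dominator", "md-ffx605", "keyboard1.x-963"],
    }
)

# Boolean mask: True only where BOTH conditions hold
mask = (df["category"] == "pc") & (df["product_type"] == "unit")

print(mask.tolist())                  # [True, False, False]
print(df.loc[mask, "name"].tolist())  # ['hero-dominator']
```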

I tried this:

df['name'].loc[df['product_type']=='unit'] = [x.split('-')[1] for x in df['name'].loc[df['product_type']=='unit']]

But I get "IndexError: list index out of range".
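A sketch of where the IndexError most likely comes from, assuming the real data contains names without a '-': for such a name, `split('-')` returns a one-element list, so `[1]` does not exist. The attempt above also filters only on product_type (not category) and assigns through `df['name'].loc[...]`, which is chained assignment; pandas warns about it, and with Copy-on-Write enabled it does not update `df`. The version below uses a single `df.loc[mask, "name"]` and leaves names without '-' unchanged. The row "monitor" is a hypothetical name added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "category": ["pc", "print", "pc", "pc"],
        "product_type": ["unit", "unit", "option", "unit"],
        # "monitor" is a hypothetical name without "-" added for illustration
        "name": ["hero-dominator", "md-ffx605", "keyboard1.x-963", "monitor"],
    }
)

# Without a "-", split() yields a one-element list, so index 1 does not exist
print("monitor".split("-"))  # ['monitor']

mask = (df["category"] == "pc") & (df["product_type"] == "unit")

# Assign through one .loc[row_mask, column] call (no chained indexing);
# only strings containing "-" are split, everything else is kept as-is
df.loc[mask, "name"] = [
    x.split("-", 1)[1] if isinstance(x, str) and "-" in x else x
    for x in df.loc[mask, "name"]
]
print(df["name"].tolist())
# ['dominator', 'md-ffx605', 'keyboard1.x-963', 'monitor']
```

With `maxsplit=1`, everything after the first '-' is kept (e.g. 'a-b-c' becomes 'b-c'); this differs slightly from the answer below, which only changes names containing exactly one '-'.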

Any help is much appreciated.

You can solve the problem as follows; see the comments in the code and feel free to ask questions.

Edit: the version below also handles non-string values (NaN, None, numbers) in the "name" column:

import pandas as pd
import numpy as np


def change(row):
    # Only touch rows in the "pc" category with product type "unit"
    if row["category"] == "pc" and row["product_type"] == "unit":
        # split() only exists on strings, so skip NaN, None and numbers
        if isinstance(row["name"], str):
            name_split = row["name"].split("-")
            # a name without "-" (or with several) is returned unchanged
            if len(name_split) == 2:
                return name_split[1]  # keep the part after "-"

    # every other row keeps its original name
    return row["name"]


# create data frame:
df = pd.DataFrame(
    {
        "category": ["pc", "print", "pc", "pc", "pc", "pc"],
        "product_type": ["unit", "unit", "option", "unit", "unit", "unit"],
        "name": ["hero-dominator", "md-ffx605", "keyboard1.x-963", np.nan, 10.24, None]
    }
)


df["name"] = df.apply(change, axis=1)  # apply change() to each row
print(df)
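As a possible vectorized alternative to the row-wise `apply`, here is a sketch using a boolean mask and `Series.str.extract`. It assumes the same rule as `change()`: only string names in the 'pc'/'unit' subset that contain exactly one '-' are replaced. The helper names `is_str`, `target`, `extracted` and `matched` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "category": ["pc", "print", "pc", "pc", "pc", "pc"],
        "product_type": ["unit", "unit", "option", "unit", "unit", "unit"],
        "name": ["hero-dominator", "md-ffx605", "keyboard1.x-963", np.nan, 10.24, None],
    }
)

# Rows to change: right category, right product type, and a string name
is_str = df["name"].map(lambda v: isinstance(v, str))
target = (df["category"] == "pc") & (df["product_type"] == "unit") & is_str

# Capture the text after "-" only when the name contains exactly one "-";
# other names give NaN, are dropped, and therefore stay unchanged
extracted = df.loc[target, "name"].str.extract(r"^[^-]*-([^-]*)$", expand=False)
matched = extracted.dropna()

# Write back only the matched rows (assignment aligns on the index)
df.loc[matched.index, "name"] = matched
print(df)
```

Restricting the mask to string values first keeps `.str` away from the NaN/None/float entries, so no per-row Python function call is needed.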

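Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.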

 