I want to use a list comprehension to split the names in one column on '-', but only for the subset of rows selected by values in other columns.
So in this case:
  category product_type             name
0       pc         unit   hero-dominator
1    print         unit        md-ffx605
2       pc       option  keyboard1.x-963
I'm interested in the 'pc' category and 'unit' product type, so I want the list comprehension to only change the first row of the 'name' column to this form:
  category product_type             name
0       pc         unit        dominator
1    print         unit        md-ffx605
2       pc       option  keyboard1.x-963
I tried this:
df['name'].loc[df['product_type']=='unit'] = [x.split('-')[1] for x in df['name'].loc[df['product_type']=='unit']]
But I'm getting the 'list index out of range' IndexError.
Any help much appreciated.
You can solve the problem as follows. The comments explain each step; feel free to ask questions.
Edit: the code now also handles elements in the "name" column that are not strings:
import pandas as pd
import numpy as np


def change(row):
    if row["category"] == "pc" and row["product_type"] == "unit":
        if type(row["name"]) is str:  # check if element is string before split()
            name_split = row["name"].split("-")  # split element
            if len(name_split) == 2:  # it could be name which does not have "-" in it, check it here
                return name_split[1]  # if "-" was in name return second part of split result
        return row["name"]  # else return name without changes
    return row["name"]


# create data frame:
df = pd.DataFrame(
    {
        "category": ["pc", "print", "pc", "pc", "pc", "pc"],
        "product_type": ["unit", "unit", "option", "unit", "unit", "unit"],
        "name": ["hero-dominator", "md-ffx605", "keyboard1.x-963", np.nan, 10.24, None]
    }
)

df["name"] = df.apply(change, axis=1)  # change data frame here
print(df)
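If you would rather keep the list comprehension from the question, here is a minimal sketch of that variant. It assumes the IndexError comes from names that contain no '-' (so `split('-')[1]` has no second element), which the question does not confirm. The "nodash" row is made-up sample data and `second_part` is a hypothetical helper name. It writes through a single `df.loc[mask, "name"]` instead of the chained `df['name'].loc[...]`.

```python
import numpy as np
import pandas as pd

# Sample data; the "nodash" row is invented to reproduce the suspected cause.
df = pd.DataFrame(
    {
        "category": ["pc", "print", "pc", "pc", "pc", "pc"],
        "product_type": ["unit", "unit", "option", "unit", "unit", "unit"],
        "name": ["hero-dominator", "md-ffx605", "keyboard1.x-963", "nodash", np.nan, 10.24],
    }
)


def second_part(value):
    """Return the text after the first '-' or the value unchanged."""
    # Non-strings (NaN, numbers, None) and names without '-' pass through.
    if isinstance(value, str) and "-" in value:
        return value.split("-", 1)[1]
    return value


# Select only rows where category is "pc" and product_type is "unit".
mask = (df["category"] == "pc") & (df["product_type"] == "unit")

# Rebuild just the selected names and assign them back in one .loc call.
df.loc[mask, "name"] = [second_part(x) for x in df.loc[mask, "name"]]
print(df)
```

One difference from `change` above: this sketch splits on the first '-' only, so a name such as "a-b-c" becomes "b-c", whereas `change` leaves names with more than one '-' untouched. Pick whichever rule matches your data.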