简体   繁体   中英

Change pandas data frame column values inplace

I have a pandas data frame.

keyword                     adGroup     goal6Value   adCost
aaaa                        (not set)   0            0.0
+bbbb                       (not set)   0            0.0
+cccc                       (not set)   2072         0.0
dddd                        (not set)   0            0.0

I changed the values in the first column, to add brackets to the keywords based on some conditions (if there's no "+" symbol, add brackets).

keyword                     adGroup     goal6Value   adCost
[aaaa]                      (not set)   0            0.0
+bbbb                       (not set)   0            0.0
+cccc                       (not set)   2072         0.0
[dddd]                      (not set)   0            0.0

This is the function created to add bracket:

def add_bracket(df):

    df["keyword"] = df["keyword"].astype('str')
    keyword_list = list()

    for index, row in df.iterrows():
       keyword = row["keyword"]
       if keyword.find("+") < 0:
         keyword = "[" + keyword + "]"
       keyword_list.append(keyword)

    kw = pd.DataFrame(keyword_list, columns = ['Keyword2'])
    df2 = pd.concat([df, kw], axis=1).drop(columns["keyword"]).rename(columns={'Keyword2': 'keyword'})
    df2 = df2[['keyword', 'adGroup', 'goal6Value', 'adCost']]
    return df2

The function produced the result I want, but is there a neater way in pandas so that I don't need to create df2 to add the output of column 1 (basically doing the changes inplace)?

Solution: Based on @Inder's suggested answer, this whole function can be written in one line.

df["keyword"] = df.keyword.apply(lambda x: "[" + x + "]" if x.find("+") < 0 else x)

Based on @RafaelC's answer.

mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"

Just sum

mask = df.keyword.str.contains('+', regex=False)
df.loc[~mask, 'keyword'] = "[" + df.loc[~mask, 'keyword'] + "]"

    keyword 
0   [aaaa]  
1   [bbbb]  
2   [cccc]  
3   [dddd]  

Why is this better than apply ?

Take a look at the timings :

%timeit "[" + df.loc[mask, 'keyword'] + "]"
348 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.keyword.apply(lambda x:[x])
112 µs ± 3.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Whoa, so apply is faster?

Not quite. Maybe in a very very small df , but take a look at the same operation on a bigger df with 100,000 times more rows :

df = pd.concat([df]*100000)

%timeit "[" + df.loc[mask, 'keyword'] + "]"
4.54 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.keyword.apply(lambda x:[x])
129 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So apply gets very very slow very fast, but vectorized operations don't

you can use apply for this purpose :

df["keyword"]=df.keyword.apply(lambda x:[x])

so its dataframe.name_of_column.apply("operation")

the output will be :

keyword                     adGroup     goal6Value   adCost
[aaaa]                      (not set)   0            0.0
[bbbb]                      (not set)   0            0.0
[cccc]                      (not set)   2072         0.0
[dddd]                      (not set)   0            0.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM