
Replace first n elements in pandas dataframe column

I want to replace the first n elements of a column in my DataFrame with another pd.Series I have saved. For example:

        category   price    store  testscore
0       Cleaning   11.42  Walmart        NaN
1       Cleaning   23.50      Dia        NaN
2  Entertainment   19.99  Walmart        NaN
3  Entertainment   15.95     Fnac        NaN
4           Tech   55.75      Dia        NaN
5           Tech  111.55  Walmart        NaN

Here I want to replace the first three NaNs in testscore with a new set of strings.

Imagine I have a variable:

cats = pd.Series(df['category'][0:3])  # first three values; [0:2] would give only two

How can I place this in the testscore column so that I get the following?

        category   price    store      testscore
0       Cleaning   11.42  Walmart       Cleaning
1       Cleaning   23.50      Dia       Cleaning
2  Entertainment   19.99  Walmart  Entertainment
3  Entertainment   15.95     Fnac            NaN
4           Tech   55.75      Dia            NaN
5           Tech  111.55  Walmart            NaN

But whenever I try to do this, it doesn't work.

Code to create this fake dataset:

import pandas as pd
import numpy as np

df = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                   'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia', 'Walmart'],
                   'price': [11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                   'testscore': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

print(df)

df2 = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                    'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia', 'Walmart'],
                    'price': [11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                    'testscore': ['Cleaning', 'Cleaning', 'Entertainment', np.nan, np.nan, np.nan]})

print(df2)

Simply use df.loc:

import pandas as pd
import numpy as np

df = pd.DataFrame({'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
                   'store': ['Walmart', 'Dia', 'Walmart', 'Fnac', 'Dia', 'Walmart'],
                   'price': [11.42, 23.50, 19.99, 15.95, 55.75, 111.55],
                   'testscore': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})

cats = df['category'][:3]  # first 3 elements (index labels 0, 1, 2)

# .loc slices by label and includes the end label, so :2 selects rows 0, 1 and 2.
# (:3 would select four rows; row 3 would stay NaN only because cats has no label 3.)
df.loc[:2, 'testscore'] = cats

print(df)

And you get:

        category   price    store      testscore
0       Cleaning   11.42  Walmart       Cleaning
1       Cleaning   23.50      Dia       Cleaning
2  Entertainment   19.99  Walmart  Entertainment
3  Entertainment   15.95     Fnac            NaN
4           Tech   55.75      Dia            NaN
5           Tech  111.55  Walmart            NaN
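
Since .loc slices by label and aligns the assigned Series on its index, the result depends on the DataFrame having the default 0, 1, 2, ... index. A minimal sketch of a purely positional alternative using .iloc follows. The name n is hypothetical, and creating testscore with object dtype (so it can hold strings from the start) is an assumption of this sketch rather than something the code above does:

```python
import numpy as np
import pandas as pd

# Same categories as above; testscore uses object dtype (an assumption of this sketch).
df = pd.DataFrame({
    'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
    'testscore': pd.Series([np.nan] * 6, dtype=object),
})

n = 3  # hypothetical name: number of leading rows to overwrite
cats = df['category'].iloc[:n]

# .loc[:n] is label-based and inclusive, so on a default index it covers n + 1 rows.
rows_selected_by_loc = len(df.loc[:n, 'testscore'])

# Positional assignment: exactly the first n rows, whatever the index labels are.
# to_numpy() drops the index, so no label alignment takes place.
col = df.columns.get_loc('testscore')
df.iloc[:n, col] = cats.to_numpy()

print(df)
```

With .iloc the slice end is exclusive, as in ordinary Python slicing, so :n means exactly n rows.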

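Use fillna with the limit parameter. Note that this fills the first 3 NaN values rather than the first 3 rows; the two coincide here because testscore is entirely NaN:

df['testscore'] = df['testscore'].fillna(df['category'], limit=3)
print(df)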

Output:

        category   price    store      testscore
0       Cleaning   11.42  Walmart       Cleaning
1       Cleaning   23.50      Dia       Cleaning
2  Entertainment   19.99  Walmart  Entertainment
3  Entertainment   15.95     Fnac            NaN
4           Tech   55.75      Dia            NaN
5           Tech  111.55  Walmart            NaN
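
The fillna approach and a positional assignment only agree when the leading rows are all NaN. The sketch below illustrates the difference on a made-up variant of the data in which row 0 already holds a value ('existing' is a placeholder, not part of the original dataset); the object dtype is again an assumption of the sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': ['Cleaning', 'Cleaning', 'Entertainment', 'Entertainment', 'Tech', 'Tech'],
    # Hypothetical variant: row 0 is already populated.
    'testscore': pd.Series(['existing', np.nan, np.nan, np.nan, np.nan, np.nan], dtype=object),
})

# fillna skips the non-NaN row 0 and fills the next three NaNs (rows 1, 2, 3).
filled = df['testscore'].fillna(df['category'], limit=3)

# Positional assignment overwrites rows 0, 1, 2 regardless of their current content.
positional = df['testscore'].copy()
positional.iloc[:3] = df['category'].iloc[:3].to_numpy()

print(pd.DataFrame({'fillna': filled, 'positional': positional}))
```

Which behaviour is wanted depends on whether existing values in the first n rows should be preserved or replaced.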
