
Add a column in a pandas df with a repeated sequence of values

I want to add a column to a pandas DataFrame that repeats a short sequence of int (or even str) values.

This is the pandas DataFrame:

import pandas as pd

records = [
    {"us": "t1"},
    {"us": "t2"},
    {"us": "t3"},
    {"us": "t4"},
    {"us": "t5"},
    {"us": "t6"},
    {"us": "t7"},
    {"us": "t8"},
    {"us": "t9"},
    {"us": "t10"},
    {"us": "t11"},
    {"us": "t12"},
]
df = pd.DataFrame(records)
df

I just want to add columns built from lists of int or str like these:

list_int = [1, 2, 3, 4]

list_str = ['one','two','three','four']

Of course df['list_int'] = list_int does not work: it raises a ValueError because the list has 4 values and the DataFrame has 12 rows.

The output should be this:

    us   list_int  list_str
0   t1      1        one
1   t2      2        two
2   t3      3        three
3   t4      4        four
4   t5      1        one
5   t6      2        two
6   t7      3        three
7   t8      4        four
8   t9      1        one
9   t10     2        two
10  t11     3        three
11  t12     4        four

You can use np.tile:

import numpy as np

# tile one extra copy, then cut back to the number of rows
df['list_int'] = np.tile(list_int, len(df)//len(list_int) + 1)[:len(df)]

or simply

df['list_int'] = np.tile(list_int, len(df)//len(list_int))

if len(df) is divisible by len(list_int).
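To see the np.tile approach on a length that is not a multiple of 4, here is a minimal self-contained sketch. It assumes a 10-row frame, and the helper name tile_to_length is hypothetical (it is not part of NumPy or pandas).

```python
import numpy as np
import pandas as pd


def tile_to_length(values, n):
    """Repeat `values` cyclically and cut the result to exactly n items."""
    reps = -(-n // len(values))  # ceiling division: enough full copies to cover n
    return np.tile(values, reps)[:n]


# Assumed sample data: 10 rows, so the last repetition is only partial.
df = pd.DataFrame({"us": [f"t{i}" for i in range(1, 11)]})
df["list_int"] = tile_to_length([1, 2, 3, 4], len(df))
df["list_str"] = tile_to_length(["one", "two", "three", "four"], len(df))
print(df)
```

The ceiling division avoids tiling an unnecessary extra copy when the length divides evenly, but the slice at the end keeps the result correct either way.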

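Let us do something new: np.put. It fills an array in place and repeats the values when there are fewer of them than target positions. np.put needs a NumPy array to write into, so fill the arrays first and then assign them as columns:

import numpy as np

idx = np.arange(len(df))
ints = np.empty(len(df), dtype=int)
strs = np.empty(len(df), dtype=object)
np.put(ints, idx, list_int)  # list_int is repeated to cover every position in idx
np.put(strs, idx, list_str)
df['list_int'] = ints
df['list_str'] = strs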


df
Out[83]: 
     us list_int list_str
0    t1        1      one
1    t2        2      two
2    t3        3    three
3    t4        4     four
4    t5        1      one
5    t6        2      two
6    t7        3    three
7    t8        4     four
8    t9        1      one
9   t10        2      two
10  t11        3    three
11  t12        4     four
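The repetition that np.put performs can be checked on a plain array, without pandas. This is a small sketch; the array name target and its length of 10 are assumptions chosen for illustration.

```python
import numpy as np

list_int = [1, 2, 3, 4]

target = np.zeros(10, dtype=int)  # array that np.put fills in place
# Fewer values than indices: np.put repeats the values as necessary.
np.put(target, np.arange(target.size), list_int)
print(target.tolist())
```

Because np.put modifies its first argument in place and returns None, the result has to be read from target rather than from the return value.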

Multiplying a list by a number n repeats the list n times.

2 * ['one', 'two', 'three']

equals

['one', 'two', 'three', 'one', 'two', 'three']

So if your data frame is df and your snippet is s = ['one', 'two', 'three'], you can build your column like this:

col = (len(df) // len(s)) * s + s[:len(df) % len(s)]

The first term holds the full repetitions, and the slice adds the leftover items when len(df) is not a multiple of len(s).
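The list-multiplication idea can be wrapped in a small function that handles lengths which are not a multiple of the snippet length. This is only a sketch: the name cycle_to_length is hypothetical, and the target length of 8 is an assumed example.

```python
def cycle_to_length(s, n):
    """Return a list of length n that repeats the items of s in order."""
    full, rest = divmod(n, len(s))  # number of whole copies and leftover items
    return full * s + s[:rest]


s = ["one", "two", "three"]
col = cycle_to_length(s, 8)  # two full copies plus the first two items
print(col)
```

The resulting list can be assigned directly, for example df["list_str"] = cycle_to_length(list_str, len(df)).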

Try this. It might not be the best universal option, but it works as long as df has the default integer index (0, 1, 2, ...):

df["list_str"] = np.array(list_str)[df.index.values % len(list_str)]
df["list_int"] = np.array(list_int)[df.index.values % len(list_int)]

Output:


    us  list_str    list_int
0   t1  one         1
1   t2  two         2
2   t3  three       3
3   t4  four        4
4   t5  one         1
5   t6  two         2
6   t7  three       3
7   t8  four        4
8   t9  one         1
9   t10 two         2
10  t11 three       3
11  t12 four        4
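Modulo indexing on the index values only works when those values are the integers 0, 1, 2, and so on. A variant that uses row positions instead is sketched below; the string index and the 6-row frame are assumptions made to show the difference.

```python
import numpy as np
import pandas as pd

list_str = ["one", "two", "three", "four"]
list_int = [1, 2, 3, 4]

# Assumed sample data with a non-integer index, where index-based modulo would fail.
df = pd.DataFrame({"us": ["t1", "t2", "t3", "t4", "t5", "t6"]}, index=list("abcdef"))

positions = np.arange(len(df))  # row positions 0..n-1, independent of the index labels
df["list_str"] = np.array(list_str)[positions % len(list_str)]
df["list_int"] = np.array(list_int)[positions % len(list_int)]
print(df)
```

Using len(list_str) rather than a hard-coded 4 keeps the expression valid if the repeated list changes length.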
