
Efficiently integrate a series into a pandas dataframe

I have a pandas DataFrame with index [0, 1, 2, ...] and a list like this: [1, 2, 2, 0, 1, ...].

I'd like to add a 'count' column to the DataFrame that reflects the number of times each index value is referenced in the list.

Given the example index and list above, the 'count' column would have the value 2 at index 2, because 2 occurred twice (so far). Is there a more efficient way to do this than iterating over the list?
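If the index is the default RangeIndex 0..n-1 and the list holds only non-negative integers smaller than len(df) (both are assumptions), NumPy's np.bincount can do the counting in one vectorized call instead of a Python loop. This is a swapped-in technique, not one of the answers below; a minimal sketch:

```python
import numpy as np
import pandas as pd

# Example data from the question: a default 0..9 index and a list of labels.
df = pd.DataFrame({'a': np.arange(10)})
l = [1, 2, 2, 0, 1]

# np.bincount counts how often each non-negative integer occurs in l.
# minlength pads the result with zeros so there is one entry per row of df.
# Assumption: every value in l is in 0..len(df)-1; a larger value would make
# the result longer than df and the assignment would raise a ValueError.
df['count'] = np.bincount(l, minlength=len(df))
print(df)
```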

Here is one way of doing it: first load the list into a DataFrame, then add a count column using value_counts, and then merge it into your original DataFrame:

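In [61]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.arange(10)})
l=[1,2,2,0,1]
# one row per list element; map each value to how often it occurs in the list
df1 = pd.DataFrame(l, columns=['data'])
df1['count'] = df1['data'].map(df1['data'].value_counts())
df1

Out[61]:
   data  count
0     1      2
1     2      2
2     2      2
3     0      1
4     1      2

In [65]:
# left-join the counts onto df's index; unmatched labels get NaN, filled with 0,
# and drop_duplicates removes the repeated rows for values listed more than once
df.merge(df1, left_index=True, right_on='data', how='left').fillna(0).drop_duplicates().reset_index(drop=True)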

Out[65]:
   a  data  count
0  0     0      1
1  1     1      2
2  2     2      2
3  3     3      0
4  4     4      0
5  5     5      0
6  6     6      0
7  7     7      0
8  8     8      0
9  9     9      0
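A more compact sketch of the same value_counts idea, assuming the index labels are exactly the values being counted: reindexing the counts against df.index avoids the intermediate DataFrame and the drop_duplicates step, and fill_value=0 keeps the column integer-typed instead of the float that fillna produces after a left merge.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(10)})
l = [1, 2, 2, 0, 1]

# One entry per distinct value in l with its frequency, so there are
# no duplicate rows to clean up afterwards.
counts = pd.Series(l).value_counts()

# Align the counts to df's index; labels that never appear in l get 0.
# Using fill_value here (rather than NaN + fillna) keeps an integer dtype.
df['count'] = counts.reindex(df.index, fill_value=0)
print(df)
```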

Counting occurrences of numbers in a DataFrame is easy in pandas

You just use the Series.value_counts method.

Then you join the resulting counts with the original DataFrame using the pandas.merge function.
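To see what the counts look like before the merge, here is a small sketch on the list from the question; value_counts returns a Series whose index is the distinct values and whose values are their frequencies:

```python
import pandas as pd

l = [1, 2, 2, 0, 1]

# Series indexed by the distinct values of l, holding how often each occurs
# (ordered by frequency by default).
counts = pd.Series(l).value_counts()

# sort_index orders it by value instead, which makes the value-to-count
# mapping easier to read; this index is what gets joined to df's index.
print(counts.sort_index())
```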

Setting up a DataFrame like the one you have:

import numpy as np
import pandas as pd
df = pd.DataFrame({'nomnom':np.random.choice(['cookies', 'biscuits', 'cake', 'lie'], 10)})

df is now a DataFrame with some arbitrary data in it (since you said you had more data in there).

     nomnom
0  biscuits
1       lie
2  biscuits
3      cake
4       lie
5   cookies
6      cake
7      cake
8      cake
9      cake

Setting up a list like the one you have:

yourlist = np.random.choice(10, 10)

yourlist is now:

array([2, 9, 2, 3, 4, 8, 5, 8, 6, 8])

The actual code you need (TL;DR):

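# count each value in the list; to_frame gives the column an explicit name
counts = pd.Series(yourlist).value_counts().to_frame('count')
# left join on the index: labels missing from the list get NaN, then 0
pd.merge(left=df, left_index=True,
         right=counts, right_index=True,
         how='left').fillna(0)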
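For a self-contained run of the TL;DR, here is a sketch that seeds the random data (the seed and the use of np.random.default_rng are arbitrary choices, not from the answer) and casts the count back to integers, since the NaN introduced by the left join turns that column into floats:

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is reproducible; the data itself is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({'nomnom': rng.choice(['cookies', 'biscuits', 'cake', 'lie'], 10)})
yourlist = rng.choice(10, 10)  # 10 integers drawn from 0..9

# One row per distinct value in yourlist, in a column named 'count'.
counts = pd.Series(yourlist).value_counts().to_frame('count')

# Left join on the index keeps every row of df; labels that never occur in
# yourlist get NaN, which fillna(0) replaces.
result = pd.merge(left=df, left_index=True,
                  right=counts, right_index=True,
                  how='left').fillna(0)

# Cast back to integers after the NaN-induced float upcast.
result['count'] = result['count'].astype(int)
print(result)
```

Because every drawn value falls in 0..9 and df's index is also 0..9, the counts in this example add up to the length of the list.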
