Efficiently integrate a series into a pandas dataframe

I have a pandas dataframe with index [0, 1, 2...], and a list something like this: [1, 2, 2, 0, 1...].

I'd like to add a 'count' column to the dataframe that reflects the number of times each index value is referenced in the list.

Given the example list above, the 'count' column would have the value 2 at index 2, because 2 occurred twice (so far). Is there a more efficient way to do this than iterating over the list?

Well, here is a way of doing it: first load the list into a df, then add the 'occurrence' column using value_counts, and then merge this into your original df:

In [61]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(10)})
l = [1, 2, 2, 0, 1]
df1 = pd.DataFrame(l, columns=['data'])
# map each value to the number of times it occurs in the list
df1['occurrence'] = df1['data'].map(df1['data'].value_counts())
df1

Out[61]:
   data  occurrence
0     1           2
1     2           2
2     2           2
3     0           1
4     1           2

In [65]:
# df1 holds the per-value counts; rows of df with no match in the list get 0
df.merge(df1, left_index=True, right_on='data', how='left').fillna(0).drop_duplicates().reset_index(drop=True)

Out[65]:
   a  data  occurrence
0  0     0           1
1  1     1           2
2  2     2           2
3  3     3           0
4  4     4           0
5  5     5           0
6  6     6           0
7  7     7           0
8  8     8           0
9  9     9           0
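
For comparison, the same column can be sketched without the intermediate dataframe and merge: count the list once, then align the counts to the dataframe's index. This is only an illustrative alternative, not the method used in the answer above. The column name `count` is taken from the question, and filling unreferenced indices with 0 via `reindex(..., fill_value=0)` is an assumption about the desired result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(10)})
l = [1, 2, 2, 0, 1]

# value_counts() returns a Series indexed by the distinct values in l
counts = pd.Series(l).value_counts()

# Align the counts to df's index; indices that never occur in l get 0
df['count'] = counts.reindex(df.index, fill_value=0)

print(df)
```

Because no NaN values are introduced along the way, the `count` column should stay an integer column here.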

Counting occurrences of numbers in a dataframe is easy in pandas.

You just use the Series.value_counts method.

Then you join the resulting counts with the original dataframe using the pandas.merge function.
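
As a rough illustration of the first step, the sketch below shows what `value_counts` returns for the example list from the question. The variable name `values` is made up for this example. The point to notice is that the result is indexed by the values themselves, which is why it can later be merged against the dataframe's index, and that values which never occur are simply missing.

```python
import pandas as pd

values = [1, 2, 2, 0, 1]  # the example list from the question

# value_counts() returns a Series: index = distinct values, data = occurrences
counts = pd.Series(values).value_counts()

print(counts.sort_index())

# Values that never occur are absent from the result, not reported as 0
print(3 in counts.index)
```

That absence is the reason a left merge followed by `fillna(0)` is needed to get zeros for unreferenced rows.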

Setting up a DataFrame like the one you have:

df = pd.DataFrame({'nomnom':np.random.choice(['cookies', 'biscuits', 'cake', 'lie'], 10)})

df is now a DataFrame with some arbitrary data in it (since you said you had more data in there).

     nomnom
0  biscuits
1       lie
2  biscuits
3      cake
4       lie
5   cookies
6      cake
7      cake
8      cake
9      cake

Setting up a list like the one you have:

yourlist = np.random.choice(10, 10)

yourlist is now:

array([2, 9, 2, 3, 4, 8, 5, 8, 6, 8])

The actual code you need (TL;DR):

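# Count how often each value occurs in yourlist; the result is indexed by value
# (the top-level pd.value_counts is deprecated in recent pandas versions)
counts = pd.Series(yourlist).value_counts().rename('count').to_frame()
# Left-join on the index so every row of df is kept; unreferenced rows get 0
pd.merge(left=df, left_index=True,
         right=counts, right_index=True,
         how='left').fillna(0)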
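
The answer does not show the merged result, so here is a reproducible sketch that hard-codes the sample `df` and `yourlist` values printed above instead of drawing them at random. The column name `count` and the final cast back to integers are choices made for this sketch, not something stated in the answer.

```python
import numpy as np
import pandas as pd

# Fixed copies of the sample data shown above, so the result is reproducible
df = pd.DataFrame({'nomnom': ['biscuits', 'lie', 'biscuits', 'cake', 'lie',
                              'cookies', 'cake', 'cake', 'cake', 'cake']})
yourlist = np.array([2, 9, 2, 3, 4, 8, 5, 8, 6, 8])

# Count occurrences per value; 'count' is a column name chosen for this sketch
counts = pd.Series(yourlist).value_counts().rename('count').to_frame()

# Left merge on the index keeps every row of df, matched or not
result = pd.merge(left=df, left_index=True,
                  right=counts, right_index=True,
                  how='left')

# Unmatched rows come back as NaN; replace them with 0 and restore integers
result['count'] = result['count'].fillna(0).astype(int)

print(result)
```

Filling only the `count` column, rather than calling `fillna(0)` on the whole frame, avoids overwriting genuine missing values in other columns.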
