
Is there a more efficient way to aggregate a dataset and calculate frequency in Python or R?

I have a dataset [0, 1, 1, 2] and I want to aggregate it. To do this, I have to compute the 'frequency' 1/4 myself and put it into a DataFrame manually. Here is the code.

>>> import pandas as pd
>>> df = pd.DataFrame({'value':[0, 1, 1, 2],
...             'frequency':1/4})
>>> df.groupby('value').sum()
       frequency
value           
0           0.25
1           0.50
2           0.25

Is there a more efficient way to aggregate the dataset and calculate the frequency automatically in Python or R?

df['value'].value_counts(normalize=True,sort=False)

Maybe you could try this.

Reference:

  1. pandas.Series.value_counts()
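
As a minimal sketch of how the value_counts() suggestion could reproduce the table from the question, assuming pandas is installed: the names `freq` and `result` are illustrative only, and the sort_index() call is added here so the row order does not depend on the pandas version.

```python
import pandas as pd

# Same data as in the question, without the hand-written 'frequency' column.
df = pd.DataFrame({'value': [0, 1, 1, 2]})

# normalize=True returns relative frequencies instead of raw counts.
freq = df['value'].value_counts(normalize=True, sort=False).sort_index()

# Optional: shape the result like the groupby output in the question.
result = freq.rename('frequency').rename_axis('value').to_frame()
print(result)
```

The rename and rename_axis calls are only cosmetic; `freq` on its own already holds the frequencies.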

In R:

prop.table(table(dat$value))

   0    1    2 
0.25 0.50 0.25 

In Python, with NumPy:

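import numpy as np
import pandas as pd
# df is the DataFrame from the question, with a 'value' column
u, c = np.unique(df.value, return_counts=True)  # distinct values and their counts
pd.Series(c / c.sum(), index=u)                 # counts divided by the total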
0    0.25
1    0.50
2    0.25
dtype: float64

In R you could do something like this:

library(data.table)
dt <- data.table(sample(0:2,100,replace=TRUE))
dt[,.N/nrow(dt),V1]

## > dt[,.N/nrow(dt),V1]

##    V1   V1
## 1:  1 0.33
## 2:  2 0.32
## 3:  0 0.35

Without using pandas, you could use Counter:

from collections import Counter
z = [0,1,1,2]
Counter(z)
Counter({1: 2, 0: 1, 2: 1})

and then convert it to a DataFrame:

import pandas as pd
x = Counter(z)
# one row per distinct value; reset_index() moves the values into a column
df = pd.DataFrame.from_dict(x, orient='index').reset_index()

and then divide the counts by 4 (the number of observations) to get your desired frequency.
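
The division step is not shown in this answer, so here is one possible sketch using only the standard library. The names `counts`, `total`, and `freq` are illustrative, and the total is computed from the data rather than hard-coded as 4.

```python
from collections import Counter

z = [0, 1, 1, 2]

counts = Counter(z)           # raw count of each distinct value
total = sum(counts.values())  # number of observations (4 here)

# Divide each count by the total to get the relative frequency.
freq = {value: n / total for value, n in sorted(counts.items())}
print(freq)  # {0: 0.25, 1: 0.5, 2: 0.25}
```

Computing `total` from the counts keeps the snippet valid if the list changes length.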

import pandas as pd
pd.Series([0, 1, 1, 2]).value_counts(normalize=True, sort=False)
