Pandas - How to get the number of times a row occurs for each corresponding value in another column
I have quite a complicated problem that I need help figuring out.
To begin, I have a dataframe:
one two three four Date
comedy a asad 123 2013-01-18 10:00:00
romantic b fas 563 2015-01-28 12:00:00
comedy c ewf 134 2014-01-22 09:00:00
action a qef 561 2013-02-18 18:00:00
action z adwq 1323 2016-01-23 16:00:00
...
I am trying to find the best way to count the number of occurrences (frequency) of each unique value in column 'one', for each week in the Date column. I then want to be able to somehow compare whether a higher frequency of occurrences in a given week results in a higher or lower number in column 'four'.
My desired output is something like this, but I am open to better solutions:
ones 2013-01-00 2013-01-07 2013-01-14..... Total_frequency
comedy 4 5 6 15
romantic 1 2 0 3
action 0 0 0 0
....
Each unique value from column 'one' is under 'ones', and its total number of occurrences for each week is under each week column. (The week columns will begin at a specified week, e.g. in the above case 2013-01-00.)
However, I am having trouble thinking of the best way to relate the total frequency to column 'four' across the dataframe.
If anyone has any idea of the best way I could go about doing this, it'd be very much appreciated. If you need any more information, please let me know.
Edit:
ones 2013-01-00 2013-01-07 2013-01-14..... Total_frequency
comedy 4 5 6 15
romantic 1 2 0 3
action NaN 1 0 1
Thanks.
Use:
#changed data sample for better verify output
print (df)
one two three four Date
0 comedy a asad 123 2013-01-18 10:00:00
1 romantic b fas 563 2013-01-28 12:00:00
2 comedy c ewf 134 2013-01-22 09:00:00
3 action a qef 561 2013-02-18 18:00:00
4 action z adwq 1323 2013-01-23 16:00:00
Use Grouper with DataFrameGroupBy.size and unstack:
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
df = (df.groupby(['one',pd.Grouper(freq='W-MON', key='Date')])
.size()
.unstack(fill_value=0)
.sort_index(axis=1))
df.columns = df.columns.date
df['Total_frequency'] = df.sum(axis=1)
print (df)
2013-01-21 2013-01-28 2013-02-18 Total_frequency
one
action 0 1 1 2
comedy 1 1 0 2
romantic 0 1 0 1
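The question also asks how the weekly frequency relates to column 'four', which the counting step above does not cover. One possible approach (a suggestion of mine, not part of the original answer) is to build a matching table of weekly means of 'four' with the same groupby, then correlate the two tables cell by cell. A minimal sketch, using made-up sample values:

```python
import pandas as pd

# sample frame similar to the one above, with an extra comedy row (value 200
# is invented) so the weekly counts actually vary
df = pd.DataFrame({
    'one': ['comedy', 'romantic', 'comedy', 'comedy', 'action', 'action'],
    'four': [123, 563, 134, 200, 561, 1323],
    'Date': pd.to_datetime(['2013-01-18 10:00:00', '2013-01-28 12:00:00',
                            '2013-01-22 09:00:00', '2013-01-23 11:00:00',
                            '2013-02-18 18:00:00', '2013-01-23 16:00:00']),
})

grp = df.groupby(['one', pd.Grouper(freq='W-MON', key='Date')])
counts = grp.size().unstack(fill_value=0)   # weekly frequency per category
means = grp['four'].mean().unstack()        # weekly mean of column 'four'

# align both tables week by week and compute a single correlation;
# NaN weeks in `means` are dropped pairwise by Series.corr
corr = counts.stack().corr(means.stack())
```

Whether a plain Pearson correlation is the right comparison depends on what "higher or lower" should mean here; the sketch only shows how to line the two weekly tables up.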
EDIT: Create a boolean mask for values after the first NaN and replace missing values only by this mask:
print (df)
2013-01-00 2013-01-07 2013-01-14
ones
comedy 4.0 5 6.0
romantic 1.0 2 NaN
action NaN 1 NaN
mask = df.notnull().cumsum(axis=1).ne(0)
#another solution
#mask = df.ffill(axis=1).notnull()
df = df.mask(mask, df.fillna(0))
print (df)
2013-01-00 2013-01-07 2013-01-14
ones
comedy 4.0 5 6.0
romantic 1.0 2 0.0
action NaN 1 0.0
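The leading-NaN logic can be checked end to end on a small frame; a self-contained sketch (the 'w1'/'w2'/'w3' column names are just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'w1': [4.0, 1.0, np.nan],
                   'w2': [5, 2, 1],
                   'w3': [6.0, np.nan, np.nan]},
                  index=['comedy', 'romantic', 'action'])

# cumsum of the non-null flags is 0 only before a row's first value,
# so the mask is True from the first non-null cell onwards
mask = df.notnull().cumsum(axis=1).ne(0)

# fill NaNs with 0 only where the mask is True; leading NaNs survive
out = df.mask(mask, df.fillna(0))
```

Here `out.loc['action', 'w1']` stays NaN (nothing observed yet), while the trailing NaNs in 'romantic' and 'action' become 0.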
You could try this:
import random
import pandas as pd

df = pd.DataFrame({'one': [random.choice(['comedy', 'action', 'romantic']) for i in range(1000)],
'Date': pd.date_range(start = '2013-01-01', periods = 1000)})
df.head()
one Date
0 romantic 2013-01-01
1 romantic 2013-01-02
2 romantic 2013-01-03
3 action 2013-01-04
4 romantic 2013-01-05
df.groupby([pd.Grouper(key = 'Date', freq = 'W'), 'one'])['one'].count().unstack(level = 0)
Date 2013-01-06 2013-01-13 2013-01-20.....
one
comedy 2 2 2
romantic NaN 2 2
action 4 3 3
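To match the desired output more closely, the same groupby can be extended with zeros instead of NaN and a Total_frequency column. A sketch assuming a small fixed sample in place of the random one above:

```python
import pandas as pd

df = pd.DataFrame({
    'one': ['romantic', 'romantic', 'comedy', 'action', 'romantic'],
    'Date': pd.date_range(start='2013-01-01', periods=5),
})

# count per (week, category), pivot weeks into columns, fill gaps with 0
out = (df.groupby([pd.Grouper(key='Date', freq='W'), 'one'])['one']
         .count()
         .unstack(level=0, fill_value=0))

# row sums give the per-category total across all weeks
out['Total_frequency'] = out.sum(axis=1)
```

`fill_value=0` in unstack removes the NaNs seen in the output above, and the row sum reproduces the Total_frequency column from the question.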
Note: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please cite this site's URL or the original source. For any questions contact: yoyou2525@163.com.