
Pandas: How to get the number of times a row occurs for each corresponding value in another column

I have a fairly complicated problem that I need help figuring out.

To begin, I have a dataframe:

 one       two     three     four      Date
comedy      a       asad      123      2013-01-18 10:00:00  
romantic    b       fas       563      2015-01-28 12:00:00
comedy      c       ewf       134      2014-01-22 09:00:00
action      a       qef       561      2013-02-18 18:00:00
action      z       adwq      1323     2016-01-23 16:00:00
...

I am trying to find the best way to count the number of occurrences (frequency) of each unique value in column 'one' for each week in the 'Date' column. I then want to be able to compare whether a higher weekly frequency of a value goes with a higher or lower number in column 'four'.

My desired output is something like this, but I am open to better solutions:

 ones       2013-01-00  2013-01-07  2013-01-14.....    Total_frequency
 comedy         4          5           6                15
 romantic       1          2           0                3 
 action         0          0           0                0 
 ....

Each unique value from column 'one' is listed under 'ones', and its number of occurrences in each week is under that week's column. (The week columns will begin at a specified week, e.g. 2013-01-00 in the case above.)

However, I am having trouble thinking of the best way to relate the total frequency to column 'four' across the dataframe.

If anyone has an idea of the best way to go about this, it would be very much appreciated.

If you need any more information, please let me know.

Edit:

  ones       2013-01-00  2013-01-07  2013-01-14.....    Total_frequency
 comedy         4          5           6                15
 romantic       1          2           0                3 
 action       NaN          1           0                1 

Thanks.

Use:

# sample data changed to make the output easier to verify
print (df)
        one two three  four                 Date
0    comedy   a  asad   123  2013-01-18 10:00:00
1  romantic   b   fas   563  2013-01-28 12:00:00
2    comedy   c   ewf   134  2013-01-22 09:00:00
3    action   a   qef   561  2013-02-18 18:00:00
4    action   z  adwq  1323  2013-01-23 16:00:00

Use Grouper with DataFrameGroupBy.size and unstack:

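import pandas as pd

# parse the strings so Grouper can bin them by week
df['Date'] = pd.to_datetime(df['Date'])
# count rows per (category, week ending on Monday), then pivot the weeks into columns
df = (df.groupby(['one', pd.Grouper(freq='W-MON', key='Date')])
        .size()
        .unstack(fill_value=0)
        .sort_index(axis=1))

# show plain dates as column labels and add the row totals
df.columns = df.columns.date
df['Total_frequency'] = df.sum(axis=1)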
print (df)
          2013-01-21  2013-01-28  2013-02-18  Total_frequency
one                                                          
action             0           1           1                2
comedy             1           1           0                2
romantic           0           1           0                1
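
To relate the weekly frequency to column 'four', as the question asks, one option is to keep the data in long form and aggregate both quantities per category and week. This is only a sketch under assumptions: the names `raw`, `weekly`, `frequency` and `mean_four` are made up for illustration, and taking the mean of 'four' is one possible summary, not something the question specifies.

```python
import pandas as pd

# Sample rows copied from the answer's data sample (only the columns needed here).
raw = pd.DataFrame({
    'one': ['comedy', 'romantic', 'comedy', 'action', 'action'],
    'four': [123, 563, 134, 561, 1323],
    'Date': pd.to_datetime(['2013-01-18 10:00:00', '2013-01-28 12:00:00',
                            '2013-01-22 09:00:00', '2013-02-18 18:00:00',
                            '2013-01-23 16:00:00']),
})

# One row per (category, week ending on Monday):
# how many rows fell in that week and the mean of 'four' over those rows.
weekly = (raw.groupby(['one', pd.Grouper(key='Date', freq='W-MON')])
             .agg(frequency=('four', 'size'), mean_four=('four', 'mean'))
             .reset_index())

print(weekly)
```

With only five sample rows every weekly frequency is 1, so this shows the shape of the table rather than a real relationship. On real data, `weekly['frequency'].corr(weekly['mean_four'])` or a scatter plot of the two columns would be a natural next step.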

EDIT: Create a boolean mask for the values after the leading NaNs in each row, and replace missing values only where this mask is True:

print (df)
          2013-01-00  2013-01-07  2013-01-14
ones                                        
comedy           4.0           5         6.0
romantic         1.0           2         NaN
action           NaN           1         NaN

mask = df.notnull().cumsum(axis=1).ne(0)
#another solution
#mask = df.ffill(axis=1).notnull()

df = df.mask(mask, df.fillna(0))
print (df)
          2013-01-00  2013-01-07  2013-01-14
ones                                        
comedy           4.0           5         6.0
romantic         1.0           2         0.0
action           NaN           1         0.0
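
For anyone who wants to run the masking step on its own, here is a minimal self-contained sketch. The frame is rebuilt by hand from the printed example, and the names `alt` and `filled` are introduced here only for illustration. It also computes the commented-out alternative mask so the two can be compared.

```python
import numpy as np
import pandas as pd

# Frame rebuilt by hand from the printed example above.
df = pd.DataFrame(
    {'2013-01-00': [4, 1, np.nan],
     '2013-01-07': [5, 2, 1],
     '2013-01-14': [6, np.nan, np.nan]},
    index=pd.Index(['comedy', 'romantic', 'action'], name='ones'))

# True from the first non-missing value onward in each row.
mask = df.notnull().cumsum(axis=1).ne(0)
# Alternative from the answer: forward-fill along the row, then test for non-missing.
alt = df.ffill(axis=1).notnull()

# Fill NaN with 0 only where the mask is True; leading NaNs are kept.
filled = df.mask(mask, df.fillna(0))
print(filled)
```

The leading NaN for 'action' survives because the mask is False there, while the trailing NaNs become 0.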

You could try this:

import random
import pandas as pd

# random categories, one row per day, to have something to count
df = pd.DataFrame({'one': [random.choice(['comedy', 'action', 'romantic']) for i in range(1000)],
                  'Date': pd.date_range(start = '2013-01-01', periods = 1000)})
df.head()

        one       Date
0  romantic 2013-01-01
1  romantic 2013-01-02
2  romantic 2013-01-03
3    action 2013-01-04
4  romantic 2013-01-05

df.groupby([pd.Grouper(key = 'Date', freq = 'W'), 'one'])['one'].count().unstack(level = 0)

Date          2013-01-06  2013-01-13  2013-01-20.....
one           
comedy         2         2           2              
romantic       NaN       2           2               
action         4         3           3    
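
The question also wants the week columns to begin at a specified week, and a pivot like the one above only contains weeks that actually occur in the data. One way to handle both, sketched here under assumptions, is to reindex the columns against an explicit weekly range. The start week `2013-01-07` and the names `raw`, `counts` and `all_weeks` are hypothetical choices for illustration, and the sample rows are borrowed from the other answer.

```python
import pandas as pd

# Sample rows borrowed from the other answer's data sample.
raw = pd.DataFrame({
    'one': ['comedy', 'romantic', 'comedy', 'action', 'action'],
    'Date': pd.to_datetime(['2013-01-18 10:00:00', '2013-01-28 12:00:00',
                            '2013-01-22 09:00:00', '2013-02-18 18:00:00',
                            '2013-01-23 16:00:00']),
})

# Weekly counts per category; columns are labelled by the Monday that closes each week.
counts = (raw.groupby(['one', pd.Grouper(key='Date', freq='W-MON')])
             .size()
             .unstack(fill_value=0))

# Hypothetical start week (a Monday), chosen only for this example.
start = pd.Timestamp('2013-01-07')
all_weeks = pd.date_range(start=start, end=counts.columns.max(), freq='W-MON')

# Add the missing weeks as zero-filled columns, in chronological order.
counts = counts.reindex(columns=all_weeks, fill_value=0)
counts['Total_frequency'] = counts.sum(axis=1)
print(counts)
```

Note that reindexing also drops any week earlier than `start`, so the start week should be chosen at or before the first week of interest.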
