After counting occurrences, how to make entries for each day in list

I have two columns of dates in two separate CSV files. I am reading them into Python and plan to plot them with matplotlib.

One is for invoices:

5/1/2015
5/1/2015
5/1/2015
5/2/2015
5/2/2015
5/2/2015
5/2/2015
5/3/2015
5/3/2015
5/3/2015
5/3/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/4/2015
5/5/2015
5/5/2015
5/5/2015
5/5/2015
5/7/2015

And the other is for disputes:

5/1/2015
5/1/2015
5/2/2015
5/2/2015
5/3/2015
5/5/2015
5/5/2015

I want to make a list with the number of disputes divided by the number of invoices for each day. So for May 1, 2015 the percent is 2/3, for May 2 it is 2/4, for May 3 it is 1/4, for May 4 it is 0, and for May 5 it is 2/4. There are no invoices or disputes on May 6, 2015, so the percent should be zero. May 7 has one invoice and no disputes, so it is zero as well. Thus the list should be [.66, .5, .25, 0, .5, 0, 0].

Then I am going to plot the percents on the y-axis against the dates on the x-axis.

I have tried df.index.day, but that groups together the first day of every month, the second day of every month, and so on. I was also using value_counts to count the occurrences of each date and then dividing one result by the other, but values were missing for the days on which I had no invoices or no disputes, and I want a value for every day.

Does anyone know a simple way to do this?

One easy way to get the counts is to use collections.Counter:

from collections import Counter
with open('invoice_dates') as f:
    invoice_count = Counter(line.strip() for line in f)

and similarly for dispute_count. You can then get a dictionary mapping dates to dispute percentages with

from __future__ import division # in case you are on Python 2.x
dispute_percentage = {date: dispute_count.get(date, 0) / invoices
                      for date, invoices in invoice_count.items()}

Use iteritems() instead of items() in the last line if you are on Python 2.x.
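The dictionary above only has entries for dates that appear in the invoice file, so a day such as 5/6/2015 is still missing. A minimal sketch of one way to get a value for every calendar day follows. It assumes Python 3, parses the strings into date objects, and uses the sample data from the question as inline lists instead of reading files. The names parse, days and ratios are illustrative, not part of the original answer.

```python
from collections import Counter
from datetime import datetime, timedelta

# Sample data from the question, written inline instead of read from files.
invoices = (["5/1/2015"] * 3 + ["5/2/2015"] * 4 + ["5/3/2015"] * 4
            + ["5/4/2015"] * 6 + ["5/5/2015"] * 4 + ["5/7/2015"])
disputes = ["5/1/2015"] * 2 + ["5/2/2015"] * 2 + ["5/3/2015"] + ["5/5/2015"] * 2


def parse(text):
    """Turn a month/day/year string into a datetime.date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


invoice_count = Counter(parse(s) for s in invoices)
dispute_count = Counter(parse(s) for s in disputes)

# Every calendar day between the earliest and latest date seen in either file.
seen = set(invoice_count) | set(dispute_count)
first, last = min(seen), max(seen)
days = [first + timedelta(days=i) for i in range((last - first).days + 1)]

# A Counter returns 0 for missing keys, so days without invoices become 0.0.
ratios = [dispute_count[d] / invoice_count[d] if invoice_count[d] else 0.0
          for d in days]
```

Here days and ratios have the same length, so they can be passed directly as the x and y values of a plot.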

You should have made it clearer that you were using pandas; it has built-in tools to help you do what you want. In this case, you can use value_counts on your f and f2 (judging by the output below, f holds the dispute dates and f2 the invoice dates). With your example data:

>>> import pandas as pd
>>> f = pd.to_datetime(f)
>>> f2 = pd.to_datetime(f2)
>>> f.value_counts()/f2.value_counts()
2015-05-01    0.666667
2015-05-02    0.500000
2015-05-03    0.250000
2015-05-04         NaN
2015-05-05    0.500000
2015-05-07         NaN
dtype: float64
>>> (f.value_counts()/f2.value_counts()).fillna(0.0)
2015-05-01    0.666667
2015-05-02    0.500000
2015-05-03    0.250000
2015-05-04    0.000000
2015-05-05    0.500000
2015-05-07    0.000000
dtype: float64
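The result above still has no row for 2015-05-06, because that date appears in neither file. One possible way to get a row for every day is to reindex both counts over a full daily date range before dividing. This is a sketch rather than part of the original answer: it assumes pandas is installed, uses the question's sample data inline, and the names all_days, invoice_daily, dispute_daily and ratio are illustrative.

```python
import pandas as pd

# Sample data from the question, parsed as month/day/year.
invoices = pd.to_datetime(
    ["5/1/2015"] * 3 + ["5/2/2015"] * 4 + ["5/3/2015"] * 4
    + ["5/4/2015"] * 6 + ["5/5/2015"] * 4 + ["5/7/2015"],
    format="%m/%d/%Y",
)
disputes = pd.to_datetime(
    ["5/1/2015"] * 2 + ["5/2/2015"] * 2 + ["5/3/2015"] + ["5/5/2015"] * 2,
    format="%m/%d/%Y",
)

# One row per calendar day, from the first to the last date in either column.
start = min(invoices.min(), disputes.min())
end = max(invoices.max(), disputes.max())
all_days = pd.date_range(start, end, freq="D")

# Count occurrences per date, then insert zero counts for the missing days.
invoice_daily = pd.Series(invoices).value_counts().reindex(all_days, fill_value=0)
dispute_daily = pd.Series(disputes).value_counts().reindex(all_days, fill_value=0)

# Mask zero invoice counts so the division yields NaN, then report those as 0.
ratio = (dispute_daily / invoice_daily.where(invoice_daily > 0)).fillna(0.0)
```

Since ratio is a Series indexed by date, ratio.tolist() gives the requested list, and ratio.plot() should put the dates on the x-axis if matplotlib is available.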
