Pandas: Iterating over sorted rows in a df, implementing a counter

Question

I tried this in Stata and failed. Now I'm trying it in Python/pandas, which I'm less familiar with.

I've got a dataframe of attendance data, with each row being a timestamped entry or exit. It looks like this:

What I want is to calculate how many people are in the office at any given time, on any given day. I'd like to set up a counter that adds 1 for every entry (type=="O") and subtracts 1 for every exit (type=="C").

My Python attempt is this:

            df = pd.read_stata("some-data.dta")

            sort = df.sort(['date', 'att_time'])

            for i, day in enumerate(sort['date']):
                sort['counter'][i] = 0
                if type=="O":
                    sort['counter'][i] = sort['counter'][i-1] + 1
                elif type=="C":
                    sort['counter'][i] = sort['counter'][i-1] - 1

Which throws this error:

__main__:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

From reading other SO posts, I tried setting the copy flag to False (sort.is_copy==False), but the error message still pops up. Also, worryingly, I noticed that it's possibly not iterating over the sorted list:

                for i, day in enumerate(sorted(sort['date'])):
                    print i, day, sort['date'][i]

The day and sort['date'][i], which should be the same date, aren't. So my i index seemingly can't be relied on, even if I got around the SettingWithCopyWarning. Halp?
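
A likely reason the two dates differ: sorting keeps each row's original index label, so sort['date'][i] looks up the row labelled i rather than the i-th row of the sorted frame. A minimal sketch of that behaviour, using made-up rows with the question's column names (df.sort has since been removed from pandas, so sort_values is used here instead):

```python
import pandas as pd

# Hypothetical stand-in for the attendance data; only the column names come from the question.
df = pd.DataFrame({
    "date": ["2015-08-31", "2015-08-30", "2015-08-31"],
    "att_time": ["09:00", "08:30", "08:00"],
    "type": ["C", "O", "O"],
})

# Sorting reorders the rows, but each row keeps its original index label.
sorted_df = df.sort_values(["date", "att_time"])
index_after_sort = list(sorted_df.index)

# Label-based lookup: the row that was first BEFORE sorting.
first_by_label = sorted_df["date"].loc[0]
# Position-based lookup: the row that is first AFTER sorting.
first_by_position = sorted_df["date"].iloc[0]

# Resetting the index makes labels and positions agree again.
renumbered = sorted_df.reset_index(drop=True)

print(index_after_sort, first_by_label, first_by_position)
```

Using .iloc, or calling reset_index(drop=True) right after sorting, should make a positional counter such as i line up with the sorted order.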

Answer 1

You can use cumsum to simplify the process, which is much faster than manually looping over all rows.

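import pandas as pd

# artificial data
# (this sample marks entries with the digit '0'; the question uses the letter 'O')
# =========================
df = pd.DataFrame('0 0 0 0 C 0 C 0 0 C 0 C'.split(), index=pd.date_range('2015-08-31 08:00:00', periods=12, freq='5min'), columns=['type'])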
df

                    type
2015-08-31 08:00:00    0
2015-08-31 08:05:00    0
2015-08-31 08:10:00    0
2015-08-31 08:15:00    0
2015-08-31 08:20:00    C
2015-08-31 08:25:00    0
2015-08-31 08:30:00    C
2015-08-31 08:35:00    0
2015-08-31 08:40:00    0
2015-08-31 08:45:00    C
2015-08-31 08:50:00    0
2015-08-31 08:55:00    C


# processing
# ===================================
df['counter'] = df['type'].map({'0': 1, 'C': -1}).cumsum()
df

                    type  counter
2015-08-31 08:00:00    0        1
2015-08-31 08:05:00    0        2
2015-08-31 08:10:00    0        3
2015-08-31 08:15:00    0        4
2015-08-31 08:20:00    C        3
2015-08-31 08:25:00    0        4
2015-08-31 08:30:00    C        3
2015-08-31 08:35:00    0        4
2015-08-31 08:40:00    0        5
2015-08-31 08:45:00    C        4
2015-08-31 08:50:00    0        5
2015-08-31 08:55:00    C        4
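
The same idea applied to the layout described in the question might look like the sketch below. The sample rows are invented, att_time is assumed to be a zero-padded HH:MM string so that it sorts chronologically, and restarting the count each day with groupby is an assumption (it presumes the office is empty overnight).

```python
import pandas as pd

# Hypothetical sample; the real data would come from pd.read_stata("some-data.dta").
df = pd.DataFrame({
    "date": ["2015-09-01", "2015-08-31", "2015-08-31", "2015-08-31", "2015-09-01"],
    "att_time": ["08:10", "08:05", "08:00", "17:00", "12:00"],
    "type": ["O", "O", "O", "C", "C"],
})

# Sort chronologically and renumber so the index matches the sorted order.
df = df.sort_values(["date", "att_time"]).reset_index(drop=True)

# +1 for every entry ("O"), -1 for every exit ("C").
df["delta"] = df["type"].map({"O": 1, "C": -1})

# Running total within each day (assumption: the count restarts at 0 every day).
df["counter"] = df.groupby("date")["delta"].cumsum()

print(df)
```

If the count should carry over from one day to the next, df["delta"].cumsum() on the sorted frame would do that without the groupby.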

Answer 1
3, accepted, 2015-08-31 18:17:27
