
Finding patterns in Pandas data frame

I have a sizeable Pandas data frame that looks like this:

                                           id  rssi location         day      time
0        2a463296-bd84-512a-8484-9d79649922ae    58     G-19  2016-01-27  12:35:23
1        c6a18d27-63ba-5457-99c1-4c08f6410e33    74     G-19  2016-01-27  12:35:26
2        ee75fa2d-66d9-52e5-9198-a886288ba044    74     G-19  2016-01-27  12:35:28
3        3dc1f5f5-eab3-541c-97f8-e57f32bdf53d    82     G-19  2016-01-27  12:35:28
4        6c1b9019-a6bc-5ed6-82e6-879b7c120991    62     G-19  2016-01-27  12:35:33
26       2a463296-bd84-512a-8484-9d79649922ae    38     G-20  2016-01-27  12:36:58
27       c6a18d27-63ba-5457-99c1-4c08f6410e33    70     G-20  2016-01-27  12:36:59
28       7edb5047-62b8-58bf-89f4-4151d7b694f4    70     G-20  2016-01-27  12:37:01
29       f4c906a8-7680-5bac-b7a0-be408364a268    58     G-20  2016-01-27  12:37:07

...

1546516  c6a18d27-63ba-5457-99c1-4c08f6410e33    58     G-59  2016-01-27  13:53:44
1546517  2a463296-bd84-512a-8484-9d79649922ae    50     G-59  2016-01-27  13:53:48
1546518  10baa504-7eec-522f-990b-61b3c215352d    50     G-59  2016-01-27  13:53:49
1546519  15ce7c62-3014-5734-9025-b658278cd33a    42     G-59  2016-01-27  13:53:51
1546520  54b281f5-e532-5fd8-b681-e5bffcd4d6bb    62     G-59  2016-01-27  13:53:53
1546521  1300368f-c823-5fa7-8241-0b245f601859    46     G-59  2016-01-27  13:53:55
1546522  79f64138-d332-51c8-a583-686f30eb65f9    70     G-59  2016-01-27  13:53:56

Each id is the id of a WiFi device. I am trying to build up a picture of each device's movements. For example, I want to model how 2a463296-bd84-512a-8484-9d79649922ae went from G-19 to G-59 but spent over 1 hour in G-20, so we presume the device just passed G-19.

From tests I carried out, most devices will be seen at least once every 5 minutes. Obviously, a device may not be seen as it passes one of the detectors, since it is only detected when it broadcasts a WiFi beacon.

I want to be able to show that device A was in location 1 for x amount of time and then went to location 2 for y amount of time, or that it passed location c by the exit door and wasn't seen again for a number of hours.

I am only concerned with what a device is doing on a particular day, not on any other day.

What's the best way of going about solving this?

Use boolean selection to grab the rows you want by ID. Select further using the date as a criterion. Do the math using the time column. If you want to see when the device wasn't visible at all, that should be apparent when you look at a specific device after selecting by ID.

Please see the pandas docs for details: pandas Indexing and Selecting.
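As a rough illustration of the selection and time math described above, here is a minimal sketch. The sample rows, the short device ids ("dev-a", "dev-b"), the added `timestamp` and `gap` columns, and the 5-minute threshold are all assumptions made for the example; only the column names `id`, `rssi`, `location`, `day` and `time` come from the question.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data frame (values made up).
df = pd.DataFrame({
    "id": ["dev-a", "dev-b", "dev-a", "dev-a", "dev-a"],
    "rssi": [58, 74, 38, 40, 50],
    "location": ["G-19", "G-19", "G-20", "G-20", "G-59"],
    "day": ["2016-01-27"] * 5,
    "time": ["12:35:23", "12:35:26", "12:36:58", "13:40:00", "13:53:48"],
})

# Combine day and time into one datetime column so arithmetic works.
df["timestamp"] = pd.to_datetime(df["day"] + " " + df["time"])

# Boolean selection: one device on one day.
mask = (df["id"] == "dev-a") & (df["day"] == "2016-01-27")
device = df[mask].sort_values("timestamp").copy()

# Time elapsed since the previous sighting of this device.
device["gap"] = device["timestamp"].diff()

# Sightings preceded by a gap longer than the assumed 5-minute threshold.
long_gaps = device[device["gap"] > pd.Timedelta(minutes=5)]

print(device[["location", "timestamp", "gap"]])
print(long_gaps[["location", "timestamp", "gap"]])
```

Rows in `long_gaps` mark the moments a device reappeared after not being seen for a while, which is one way to spot the periods when it wasn't visible.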

Edit:

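# df is the data frame shown in the question.
for day in df["day"].unique():
    # Restrict to the rows recorded on this day.
    date_df = df[df["day"] == day]
    ids_seen = set()
    for index, row in date_df.iterrows():
        if row["id"] not in ids_seen:
            # First time this id appears on this day: enter a nested loop
            # here and do some stuff for each unique id.
            # Add the id to the set so we can keep track of it.
            ids_seen.add(row["id"])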
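If the row-by-row loop turns out to be slow on a frame of this size, a groupby-based alternative may be worth trying. The sketch below is not part of the approach above; it is one possible way to collapse consecutive sightings of a device at the same location into "visits" with a first-seen time, last-seen time and duration. The sample rows and the `timestamp`, `new_visit`, `visit`, `first_seen`, `last_seen`, `sightings` and `duration` names are assumptions introduced for the example.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data frame (values made up).
df = pd.DataFrame({
    "id": ["dev-a", "dev-b", "dev-a", "dev-a", "dev-b", "dev-a"],
    "location": ["G-19", "G-19", "G-20", "G-20", "G-19", "G-59"],
    "day": ["2016-01-27"] * 6,
    "time": ["12:35:23", "12:35:26", "12:36:58",
             "13:40:00", "12:50:00", "13:53:48"],
})
df["timestamp"] = pd.to_datetime(df["day"] + " " + df["time"])
df = df.sort_values(["id", "day", "timestamp"]).reset_index(drop=True)

# A new visit starts whenever the location differs from the previous
# sighting of the same device on the same day (the first sighting
# always starts a visit because there is no previous location).
prev_location = df.groupby(["id", "day"])["location"].shift()
df["new_visit"] = (df["location"] != prev_location).astype(int)

# Number the visits 1, 2, 3, ... within each device and day.
df["visit"] = df.groupby(["id", "day"])["new_visit"].cumsum()

# Summarise each visit: when it started, when it ended, how many sightings.
visits = (
    df.groupby(["id", "day", "visit", "location"], sort=False)
      .agg(first_seen=("timestamp", "min"),
           last_seen=("timestamp", "max"),
           sightings=("timestamp", "count"))
      .reset_index()
)
visits["duration"] = visits["last_seen"] - visits["first_seen"]

print(visits)
```

A visit with a single sighting and zero duration would suggest the device only passed that location, while a long duration suggests it stayed there. Note that this treats two sightings at the same location as one visit however far apart they are; splitting visits on long gaps would need an extra condition.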
