Filtering a CSV file with pandas

Question

I have a CSV file in which each row holds data about a particular patient, and a single patient can have multiple rows.

The file contains thousands of patient records. I want to randomly select 100 patients, get all records associated with them, and save those records to another CSV file.

For example, the file could look like this:

patient_id   Date          Diagnosis   Comments
001-001      23.12.2008    Normal      Normal
001-001      23.12.2009    Normal      Normal
001-002      08.11.2007    Normal      Normal
001-003
....

I can load the file and pick the patients like this:

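import numpy as np
import pandas as pd

frame = pd.read_csv('file.csv')
# Get the unique subjects
unique_subjects = frame['patient_id'].unique()
# Use numpy to randomly select 100 distinct patients
# (replace=False prevents the same patient being drawn twice)
random_us = np.random.choice(unique_subjects, 100, replace=False)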

I could then go through the CSV row by row and select which rows to write back to a new CSV file.

I have a feeling pandas might provide something more direct, and I wonder if there is a way to chain all these operations with it.

Answer 1

You can use isin to filter for the IDs you want:

random_records = frame[frame['patient_id'].isin(random_us)]
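For completeness, here is a minimal end-to-end sketch of the whole task: pick the patients, keep their rows with isin, and write the result out. It makes a few assumptions that are not in the question: the inline sample data stands in for file.csv, the helper name `sample_patients` and the output name `random_patients.csv` are made up for illustration, and pandas' own `Series.sample` is swapped in for `np.random.choice` (it samples without replacement by default).

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv
csv_text = """patient_id,Date,Diagnosis,Comments
001-001,23.12.2008,Normal,Normal
001-001,23.12.2009,Normal,Normal
001-002,08.11.2007,Normal,Normal
001-003,01.02.2010,Normal,Normal
"""


def sample_patients(frame, n, seed=None):
    """Return every row belonging to n randomly chosen distinct patients."""
    # One entry per patient, then draw n of them without replacement
    chosen = frame['patient_id'].drop_duplicates().sample(n=n, random_state=seed)
    # Keep all rows whose patient_id is among the chosen ones
    return frame[frame['patient_id'].isin(chosen)]


frame = pd.read_csv(io.StringIO(csv_text))
subset = sample_patients(frame, n=2, seed=0)

# Save the selected records to another CSV file (hypothetical file name)
subset.to_csv('random_patients.csv', index=False)
```

With the real data, this would be `pd.read_csv('file.csv')` and `n=100`. Passing a seed is optional and only makes the selection reproducible.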

Answer 1: accepted, 2019-09-15 23:09:48
