Calculate the number of rows containing n values in a pandas dataframe
I am working with a table in which each row represents a patient and the columns hold the procedures performed on that patient. I need to count how many patients were given the same combination of procedures, that is, how many rows contain both A and B, or all of A, B and Z. The order doesn't matter.
Assuming the example table below, I tried to use the .isin() method:
import pandas as pd
d = {'col1': ['A', 'A', 'B',], 'col2': ['B', 'D', 'C'], 'col3': ['C', '','X',]}
df = pd.DataFrame(data=d)
print(df)
  col1 col2 col3
0    A    B    C
1    A    D
2    B    C    X
Given two procedures, I want a count of how many times each procedure is performed in the rows that contain both of them:
dx1 = ['A', 'B']
df[df.isin(dx1).any(axis=1)].apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False)
But this counts the procedures in rows that contain either A or B and adds the results together; the condition acts as an "or" instead of an "and":
C 2.0
H 1.0
D 1.0
A 1.0
1.0
dtype: float64
What I need is a list of how many times each procedure other than A and B is performed in the rows that contain both A and B. In this case it should be:
C 1.0
dtype: float64
Thank you very much in advance.
Since you do not care about order, sets should solve your problem:
import pandas as pd
d = {'col1': ['A', 'A', 'B',], 'col2': ['B', 'D', 'C'], 'col3': ['C', '','X',]}
df = pd.DataFrame(data=d)
dx1 = ['A', 'B']
# Collect each row's procedures into a list
df["procedures"] = df.apply(lambda x: [x.col1, x.col2, x.col3], axis=1)
# True where the row contains every procedure in dx1, in any order
df["contains_dx1"] = df.procedures.apply(lambda x: set(dx1).issubset(set(x)))
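The snippet above only flags the matching rows; it does not yet produce the counts asked for. A minimal sketch of the remaining step follows. It assumes that an empty string means "no procedure", and the names `proc_cols`, `matching` and `others` are introduced here for illustration only.

```python
import pandas as pd

d = {'col1': ['A', 'A', 'B'], 'col2': ['B', 'D', 'C'], 'col3': ['C', '', 'X']}
df = pd.DataFrame(data=d)
dx1 = ['A', 'B']
proc_cols = ['col1', 'col2', 'col3']  # hypothetical name for the procedure columns

# Flag rows whose procedures include every item of dx1 (order-independent).
contains_dx1 = df[proc_cols].apply(lambda row: set(dx1).issubset(set(row)), axis=1)

# Flatten the matching rows into a single Series of procedure codes.
matching = pd.Series(df.loc[contains_dx1, proc_cols].to_numpy().ravel())

# Drop dx1 itself and empty cells (assumed to mean "no procedure"), then count.
others = matching[~matching.isin(dx1 + [''])].value_counts()
print(others)
```

With the example table, only row 0 contains both A and B, so the only remaining procedure counted is C.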
Try this bit of code using functools.reduce, melt, isin, and value_counts:
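from functools import reduce
import pandas as pd
d = {'col1': ['A', 'A', 'B',], 'col2': ['B', 'D', 'C'], 'col3': ['C', '','X',]}
df = pd.DataFrame(data=d)
dx1 = ['A', 'B']
# Boolean frame: True where a cell equals any procedure in dx1
df_bool = reduce(lambda a,b: a | b, [df == i for i in dx1])
# Keep rows with more than one match, flatten them, and count each value
s = df[df_bool.sum(axis=1).gt(1)].melt()['value'].value_counts()
# Exclude the dx1 procedures themselves from the result
s[~s.index.isin(dx1)]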
Output:
C 1
Name: value, dtype: int64
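The row test above counts matching cells, so it fits the case of exactly two target procedures that each appear at most once per row. A hedged sketch of one way to generalize to any number of procedures is to test each procedure separately and combine the results with "and". The function name `count_co_procedures` and its `empty` parameter are hypothetical, and an empty string is assumed to mean "no procedure".

```python
from functools import reduce
import pandas as pd

def count_co_procedures(df, wanted, empty=''):
    """Count the other procedures in rows that contain every item of `wanted`."""
    # One boolean Series per wanted procedure: does the row contain it anywhere?
    per_proc = [df.eq(p).any(axis=1) for p in wanted]
    # AND the Series together so a row must contain all wanted procedures.
    has_all = reduce(lambda a, b: a & b, per_proc)
    # Flatten the matching rows into one column of procedure codes.
    values = df[has_all].melt()['value']
    # Drop the wanted procedures and empty cells, then count what is left.
    return values[~values.isin(list(wanted) + [empty])].value_counts()

d = {'col1': ['A', 'A', 'B'], 'col2': ['B', 'D', 'C'], 'col3': ['C', '', 'X']}
df = pd.DataFrame(data=d)
print(count_co_procedures(df, ['A', 'B']))
```

Note that `wanted` must contain at least one procedure, since `reduce` raises a `TypeError` on an empty list when no initial value is given.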