Calculate the number of rows containing n values in a pandas dataframe

I am working with a table in which each row represents a patient and the columns hold the procedures performed on that patient. What I need to do is calculate how many patients were given the same combination of procedures. For example, a row may contain the procedures [A, B] or [A, B, Z]. The order doesn't matter.

So assuming this example table, I have tried to use the .isin() method in the following way:

import pandas as pd

d = {'col1': ['A', 'A', 'B'], 'col2': ['B', 'D', 'C'], 'col3': ['C', '', 'X']}
df = pd.DataFrame(data=d)
print(df)
  col1 col2 col3
0    A    B    C
1    A    D     
2    B    C    X

Given two procedures, I want to get a list of how many times each other procedure is performed on the patients who received both:

dx1 = ['A', 'B']
df[df.isin(dx1).any(axis=1)].apply(pd.Series.value_counts).sum(axis=1).sort_values(ascending=False)

but I get a list of how many times each procedure is performed given each procedure separately, added together (the condition behaves as an "or" instead of an "and"):

C    2.0
H    1.0
D    1.0
A    1.0
     1.0
dtype: float64
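
The "or" behaviour can be checked directly on the example table. The following is a minimal sketch, not part of the original attempt; the names mask and either are only illustrative. It shows that .any(axis=1) keeps every row holding at least one of the two procedures:

```python
import pandas as pd

d = {'col1': ['A', 'A', 'B'], 'col2': ['B', 'D', 'C'], 'col3': ['C', '', 'X']}
df = pd.DataFrame(data=d)
dx1 = ['A', 'B']

# True for every cell whose value is one of the procedures in dx1
mask = df.isin(dx1)

# .any(axis=1) is True as soon as a row holds A *or* B
either = mask.any(axis=1)
print(either.tolist())

# Number of matching cells per row; only the first row has both procedures
print(mask.sum(axis=1).tolist())
```

Since every row of the example contains A or B, the filter keeps the whole table.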

What I need is a list of how many times each procedure other than A and B is performed on the patients who received both A and B. In this case it should be:

C    1.0
dtype: float64

Thank you very much in advance.

Since you do not care about order, sets should solve your problem:

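import pandas as pd

d = {'col1': ['A', 'A', 'B'], 'col2': ['B', 'D', 'C'], 'col3': ['C', '', 'X']}
df = pd.DataFrame(data=d)
dx1 = ['A', 'B']
# Collect each patient's procedures into one list per row
df["procedures"] = df.apply(lambda x: [x.col1, x.col2, x.col3], axis=1)
# True where the row contains every procedure in dx1, in any order
df["contains_dx1"] = df.procedures.apply(lambda x: set(dx1).issubset(set(x)))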
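
To get from the subset check to the counts asked for in the question, one possible continuation is sketched below. It is only a sketch under two assumptions: empty strings mean "no procedure", and the helper names procedure_sets, matching and other_counts are made up for illustration.

```python
from collections import Counter

import pandas as pd

d = {'col1': ['A', 'A', 'B'], 'col2': ['B', 'D', 'C'], 'col3': ['C', '', 'X']}
df = pd.DataFrame(data=d)
dx1 = ['A', 'B']
wanted = set(dx1)

# One set of procedures per patient; '' is treated as "no procedure" (assumption)
procedure_sets = pd.Series(
    [set(row) - {''} for row in df.to_numpy()], index=df.index
)

# Keep only the patients whose set contains every procedure in dx1
matching = procedure_sets[procedure_sets.apply(wanted.issubset)]

# Tally the procedures other than those in dx1 across the matching patients
counts = Counter()
for procedures in matching:
    counts.update(procedures - wanted)

other_counts = pd.Series(counts, dtype="int64")
print(other_counts)
```

Because each patient is reduced to a set, a procedure repeated in several columns of the same row is counted once for that patient.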

Try this bit of code using functools.reduce, melt, isin, and value_counts:

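from functools import reduce
import pandas as pd

d = {'col1': ['A', 'A', 'B'], 'col2': ['B', 'D', 'C'], 'col3': ['C', '', 'X']}
df = pd.DataFrame(data=d)
dx1 = ['A', 'B']
# Boolean frame: True where a cell holds any of the procedures in dx1
df_bool = reduce(lambda a, b: a | b, [df == i for i in dx1])
# Keep rows with more than one matching cell, then count every value in those rows
s = df[df_bool.sum(axis=1).gt(1)].melt()['value'].value_counts()

# Drop the procedures in dx1 themselves from the counts
s[~s.index.isin(dx1)]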

Output:

C    1
Name: value, dtype: int64
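
The row filter above counts matching cells, which works for two distinct procedures. If dx1 may hold more than two procedures, or a procedure can repeat within a row, a variation that checks each procedure separately may be safer. This is a sketch, not the answer's original method; has_all and others are illustrative names, and dropping empty strings is an assumption about the data.

```python
from functools import reduce

import pandas as pd

d = {'col1': ['A', 'A', 'B'], 'col2': ['B', 'D', 'C'], 'col3': ['C', '', 'X']}
df = pd.DataFrame(data=d)
dx1 = ['A', 'B']

# For each procedure, flag the rows where it appears in any column,
# then AND the flags so a row must contain every procedure in dx1
has_all = reduce(lambda a, b: a & b, [df.eq(p).any(axis=1) for p in dx1])

# Flatten the matching rows into a single column of values
values = df[has_all].melt()['value']

# Drop the procedures in dx1 and empty cells, then count what is left
others = values[~values.isin(dx1) & values.ne('')]
s = others.value_counts()
print(s)
```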
