Locating column values in a pandas DataFrame with conditions

We have a dataframe (df_source):

Unnamed: 0  DATETIME    DEVICE_ID   COD_1   DAT_1   COD_2   DAT_2   COD_3   DAT_3   COD_4   DAT_4   COD_5   DAT_5   COD_6   DAT_6   COD_7   DAT_7
0   0   200520160941    002222111188    35  200408100500.0  12  200408100400    16  200408100300    11  200408100200    19  200408100100    35  200408100000    43  
1   19  200507173541    000049000110    00  190904192701.0  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
2   20  200507173547    000049000110    00  190908185501.0  08  190908185501    NaN NaN NaN NaN NaN NaN NaN NaN NaN 
3   21  200507173547    000049000110    00  190908205601.0  08  190908205601    NaN NaN NaN NaN NaN NaN NaN NaN NaN 
4   22  200507173547    000049000110    00  190909005800.0  08  190909005800    NaN NaN NaN NaN NaN NaN NaN NaN NaN 
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
159 775 200529000843    000049768051    40  200529000601.0  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
160 776 200529000843    000049015792    00  200529000701.0  33  200529000701    NaN NaN NaN NaN NaN NaN NaN NaN NaN 
161 779 200529000843    000049180500    00  200529000601.0  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 
162 784 200529000843    000049089310    00  200529000201.0  03  200529000201    61  200529000201    NaN NaN NaN NaN NaN NaN NaN 
163 786 200529000843    000049768051    40  200529000401.0  NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 

We calculated values_cont, a pandas Series of code counts (not a dict), for a subset of columns:

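import pandas as pd
v_subset = ['COD_1', 'COD_2', 'COD_3', 'COD_4', 'COD_5', 'COD_6', 'COD_7']
# Series.value_counts replaces the top-level pd.value_counts, which is deprecated in recent pandas
values_cont = pd.Series(df_source[v_subset].to_numpy().ravel()).value_counts()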

The result (value, counter) was:

00    134
08     37
42     12
40     12
33      3
11      3
03      2
35      2
43      2
44      1
61      1
04      1
12      1
60      1
05      1
19      1
34      1
16      1
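Because values_cont is a pandas Series indexed by code rather than a dict, each (value, counter) pair can be read with Series.items() and a single counter with values_cont['00']. A minimal sketch on a hypothetical toy frame (not the question's data), assuming the codes are stored as strings and empty slots as NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: codes are strings, empty code slots are NaN
df_source = pd.DataFrame({
    "DEVICE_ID": ["000049000110", "000049000110", "002222111188"],
    "COD_1": ["00", "00", "35"],
    "COD_2": [np.nan, "08", "12"],
})
v_subset = ["COD_1", "COD_2"]

# value_counts drops NaN by default, so empty slots are not counted
values_cont = pd.Series(df_source[v_subset].to_numpy().ravel()).value_counts()

# values_cont is a Series: index = code (value), values = counter
for code, counter in values_cont.items():
    print(code, counter)

print(values_cont["00"])  # counter for a single code
```

Since the counts come from the flattened COD_n columns, they no longer carry the row a code came from; finding the matching DEVICE_ID needs the row context, which is what the answers below keep.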

Now, the question is:

How can we locate the values in other columns (for example DEVICE_ID) that correspond to each code and its counter? For instance, how do we get:

 df['DEVICE_ID']  # corresponding to value '00' (counter 134)
 df['DEVICE_ID']  # corresponding to value '08' (counter 37)

 ...

 df['DEVICE_ID']  # corresponding to value '16' (counter 1)
  • I believe you need DataFrame.melt, then a GroupBy aggregation that joins the IDs and uses size for the counts.
  • This produces a dataframe with one row per code: a value column holding the code, a DEVICE_ID column with all associated device IDs joined by commas, and a count column with the number of times each code occurs.
    • This is an alternative to values_cont in the question.
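import pandas as pd

v_subset = ['COD_1', 'COD_2', 'COD_3', 'COD_4', 'COD_5', 'COD_6', 'COD_7']

# one row per (DEVICE_ID, code), empty code slots dropped, then grouped by code
# (DEVICE_ID must hold strings for ','.join to work)
df = (df_source.melt(id_vars='DEVICE_ID', value_vars=v_subset)
               .dropna(subset=['value'])
               .groupby('value')
               .agg(DEVICE_ID=('DEVICE_ID', ','.join), count=('value', 'size'))
               .reset_index())
print(df)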
   value                                          DEVICE_ID  count
0     00  000049000110,000049000110,000049000110,0000490...      7
1     03                                       000049089310      1
2     08             000049000110,000049000110,000049000110      3
3     11                                       002222111188      1
4     12                                       002222111188      1
5     16                                       002222111188      1
6     19                                       002222111188      1
7     33                                       000049015792      1
8     35                          002222111188,002222111188      2
9     40                          000049768051,000049768051      2
10    43                                       002222111188      1
11    61                                       000049089310      1


# print DEVICE_ID for CODES == '03'
print(df.DEVICE_ID[df.value == '03'])

[out]:
1    000049089310
Name: DEVICE_ID, dtype: object
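A self-contained variant of the melt approach, sketched on a hypothetical toy frame: it lists each device only once per code (via Series.unique) and keeps the code as the index, so one code is a single .loc lookup. The column name DEVICE_IDS is an illustrative choice, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame in the question's layout (IDs and codes stored as strings)
df_source = pd.DataFrame({
    "DEVICE_ID": ["000049000110", "000049000110", "002222111188", "000049089310"],
    "COD_1": ["00", "00", "35", "00"],
    "COD_2": [np.nan, "08", "12", "03"],
})
v_subset = ["COD_1", "COD_2"]

# One row per (DEVICE_ID, code), with the empty code slots removed
long = (df_source.melt(id_vars="DEVICE_ID", value_vars=v_subset)
                 .dropna(subset=["value"]))

# Per code: each reporting device listed once, plus how often the code occurs
summary = long.groupby("value").agg(
    DEVICE_IDS=("DEVICE_ID", lambda s: ",".join(sorted(s.unique()))),
    count=("DEVICE_ID", "size"),
)

# The code is the index, so one code is a single .loc lookup
print(summary.loc["00", "count"])       # how often '00' occurs
print(summary.loc["00", "DEVICE_IDS"])  # which devices reported '00'
```

Here count still counts every occurrence of the code, so a device that reports '00' twice contributes 2 to count but appears once in DEVICE_IDS.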
# to return all rows where COD_1 is '00'
df_source[df_source.COD_1 == '00']

# to return only the DEVICE_ID column where COD_1 is '00'
df_source['DEVICE_ID'][df_source.COD_1 == '00']
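The two lines above only look at COD_1. To find a code in any of the COD_n columns, one option is a row mask built with DataFrame.eq and any(axis=1). A minimal sketch; the helper name devices_with_code and the toy frame are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the question's layout
df_source = pd.DataFrame({
    "DEVICE_ID": ["000049000110", "000049015792", "000049089310"],
    "COD_1": ["00", "00", "00"],
    "COD_2": [np.nan, "33", "03"],
    "COD_3": [np.nan, np.nan, "61"],
})
v_subset = ["COD_1", "COD_2", "COD_3"]

def devices_with_code(df, code, cols):
    """Hypothetical helper: distinct DEVICE_IDs whose code columns contain `code`."""
    mask = df[cols].eq(code).any(axis=1)  # True if any COD_n column equals code (NaN never matches)
    return df.loc[mask, "DEVICE_ID"].drop_duplicates()

print(devices_with_code(df_source, "03", v_subset).tolist())
print(devices_with_code(df_source, "00", v_subset).tolist())
```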

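You can use df.loc with a boolean mask to select the rows whose column values match a condition, then select the column of interest from those rows (df.iloc is position-based and does not accept a boolean Series). There may be a more pythonic way to do this.

# rows whose COD_1 is the string '00' (the codes are strings, so the integer 00 would never match)
df2 = df.loc[df['COD_1'] == '00']

# 134 is how often '00' occurs, not a DAT_1 value, so there is no DAT_1 filter here

# select the column of interest from the matching rows
df_out = df2['DEVICE_ID']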
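.iloc is position-based: it raises an error when given a boolean Series as a row mask, but accepts a plain boolean NumPy array together with integer column positions, while .loc accepts the boolean Series and column labels directly. A minimal sketch on a hypothetical toy frame, which also shows that comparing the string codes with the integer 00 matches nothing:

```python
import pandas as pd

# Hypothetical toy frame; the codes are strings, so compare with '00', not 00
df = pd.DataFrame({
    "DEVICE_ID": ["000049000110", "002222111188", "000049089310"],
    "COD_1": ["00", "35", "00"],
})

mask = df["COD_1"] == "00"  # boolean Series aligned on df.index

by_label = df.loc[mask, "DEVICE_ID"]        # .loc: boolean Series + column label
by_position = df.iloc[mask.to_numpy(), 0]   # .iloc: boolean array + column position (0 = DEVICE_ID)

print(by_label.tolist())
print(by_position.tolist())
print((df["COD_1"] == 0).any())  # integer 0 (written 00) never equals a string code
```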

Here's more info on .loc and .iloc: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
