Pandas: split a dataframe based on whether a column value appears in another dataframe

I have two dataframes. Here is dwpjp.head() :

jp_number
0 25146315052147720191
1 57225427599900052634
2 86076681691411639833
3 50491824499499656478
4 95588382889227620465

and ct_data.head() :

imjp_number imct_id
0 23605308039805192764 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
1 57225427599900052634 aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
2 53733358271401869469 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
3 50491824499499656478 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3
4 82143248133286027306 __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c

I want to split ct_data into two new dataframes, cct_data and dct_data. If a row's imjp_number is present in the dwpjp dataframe, the row should go into cct_data; otherwise it should go into dct_data.

For the rows whose jp_number is present in dwpjp, I tried:

cct_data = ct_data[ct_data.isin(dwpjp).any(1).values]

and for the other I negated the condition as follows:

dct_data = ct_data[~[ct_data.isin(dwpjp).any(1).values]]

but I am not getting the expected results, which are shown below.

cct_data

imjp_number imct_id
0 57225427599900052634 aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
1 50491824499499656478 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3

and dct_data :

imjp_number imct_id
0 23605308039805192764 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
1 53733358271401869469 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
2 82143248133286027306 __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c

Note: jp_number in dwpjp corresponds to imjp_number in ct_data.

Note the following:

  1. isin wants values, but it was given the whole dataframe. With a dataframe argument, DataFrame.isin only matches where both the index and column labels agree, and with a Series argument it aligns on the index, so pass a plain list of values instead: change .isin(dwpjp) to .isin(dwpjp.jp_number.tolist())
  2. In the pre-edited question, each row of dwpjp was actually a list of 1 value, not just 1 value. If that is indeed the case, then the values need to be exploded first: .isin(dwpjp.jp_number.explode().tolist())
  3. Your negation was being incorrectly applied to a list, which raises a TypeError: change ~[ct_data...] to ~ct_data...

With those fixes, it's working on my end (the axis is passed as any(axis=1) because recent pandas versions accept it only as a keyword):

cct_data = ct_data[ct_data.isin(dwpjp.jp_number.explode().tolist()).any(axis=1).values]

imjp_number imct_id
1 57225427599900052634 aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
3 50491824499499656478 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3

dct_data = ct_data[~ct_data.isin(dwpjp.jp_number.explode().tolist()).any(axis=1).values]

imjp_number imct_id
0 23605308039805192764 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
2 53733358271401869469 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
4 82143248133286027306 __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c
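
One thing worth checking with DataFrame.isin is what kind of object holds the lookup values: a Series is compared row label by row label, while a plain list is tested for membership regardless of position. The sketch below illustrates the difference. The shuffled row order of dwpjp and the placeholder imct_id values are assumptions made for illustration, and the numbers are stored as strings (also an assumption, since 20-digit values do not fit in int64).

```python
import pandas as pd

# Lookup table, deliberately in a different row order than ct_data (assumption).
dwpjp = pd.DataFrame({"jp_number": [
    "50491824499499656478",
    "25146315052147720191",
    "57225427599900052634",
]})

ct_data = pd.DataFrame({
    "imjp_number": [
        "23605308039805192764",
        "57225427599900052634",
        "53733358271401869469",
        "50491824499499656478",
    ],
    "imct_id": ["id_a", "id_b", "id_c", "id_d"],  # placeholder IDs
})

# A Series argument is aligned on the index: row 0 is compared with row 0, etc.
aligned = ct_data.isin(dwpjp["jp_number"]).any(axis=1)

# A plain list is a membership test, independent of row position.
by_value = ct_data.isin(dwpjp["jp_number"].tolist()).any(axis=1)

print(aligned.tolist())   # [False, False, False, False]
print(by_value.tolist())  # [False, True, False, True]
```

In the question's sample data the matching numbers happen to sit at the same row labels in both dataframes, so an index-aligned comparison would find them there too; that will not hold in general.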

I modified your expression as follows, so that only the imjp_number column is compared against the jp_number values:

cct_data = ct_data[ct_data.imjp_number.isin(dwpjp.jp_number)]

and

dct_data = ct_data[~ct_data.imjp_number.isin(dwpjp.jp_number)]
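
Put together on the sample rows from the question, this approach can be sketched as follows. The numbers are stored as strings here (an assumption, since 20-digit values do not fit in int64), and reset_index(drop=True) is an optional extra step, not part of the expressions above, that renumbers the rows from 0 as in the expected output.

```python
import pandas as pd

dwpjp = pd.DataFrame({"jp_number": [
    "25146315052147720191",
    "57225427599900052634",
    "86076681691411639833",
    "50491824499499656478",
    "95588382889227620465",
]})

ct_data = pd.DataFrame({
    "imjp_number": [
        "23605308039805192764",
        "57225427599900052634",
        "53733358271401869469",
        "50491824499499656478",
        "82143248133286027306",
    ],
    "imct_id": [
        "x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87",
        "aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed",
        "6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26",
        "__gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3",
        "__g1114a30c6ea548a2a83d5a51718ff0fd|773840905c",
    ],
})

# Series.isin is a membership test on values; it does not align on the index.
mask = ct_data["imjp_number"].isin(dwpjp["jp_number"])

# Rows whose number appears in dwpjp, and the remaining rows.
cct_data = ct_data[mask].reset_index(drop=True)
dct_data = ct_data[~mask].reset_index(drop=True)

print(cct_data)
print(dct_data)
```

Computing the mask once and negating it keeps the two halves guaranteed to be complementary.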
