I have two dataframes. Here is dwpjp.head():
| | jp_number |
|---|---|
| 0 | 25146315052147720191 |
| 1 | 57225427599900052634 |
| 2 | 86076681691411639833 |
| 3 | 50491824499499656478 |
| 4 | 95588382889227620465 |
and ct_data.head():
| | imjp_number | imct_id |
|---|---|---|
| 0 | 23605308039805192764 | x1E5e3ukRyEFRT6SUAF6lg\|d543d3d064da465b8576d87 |
| 1 | 57225427599900052634 | aa0d2dac654d4154bf7c09f73faeaf62\|-vf6738ee3bed |
| 2 | 53733358271401869469 | 6FfHZRoiWs2VO02Pruk07A\|__g3d877adf9d154637be26 |
| 3 | 50491824499499656478 | __gbe204670ca784a01b7207b42a7e5a5d3\|54e2c39cd3 |
| 4 | 82143248133286027306 | __g1114a30c6ea548a2a83d5a51718ff0fd\|773840905c |
I want two new dataframes, cct_data and dct_data, built from ct_data. The ct_data dataframe should be split on this condition: if a row's jp_number is present in the dwpjp dataframe, the row goes into cct_data; otherwise it goes into dct_data.
I tried this for the jp_number values that are also present in dwpjp:
cct_data = ct_data[ct_data.isin(dwpjp).any(1).values]
and for the rest I negated the condition as follows:
dct_data = ct_data[~[ct_data.isin(dwpjp).any(1).values]]
but I am not getting the expected results, which are shown below.
cct_data:
| | imjp_number | imct_id |
|---|---|---|
| 0 | 57225427599900052634 | aa0d2dac654d4154bf7c09f73faeaf62\|-vf6738ee3bed |
| 1 | 50491824499499656478 | __gbe204670ca784a01b7207b42a7e5a5d3\|54e2c39cd3 |
and dct_data:
| | imjp_number | imct_id |
|---|---|---|
| 0 | 23605308039805192764 | x1E5e3ukRyEFRT6SUAF6lg\|d543d3d064da465b8576d87 |
| 1 | 53733358271401869469 | 6FfHZRoiWs2VO02Pruk07A\|__g3d877adf9d154637be26 |
| 2 | 82143248133286027306 | __g1114a30c6ea548a2a83d5a51718ff0fd\|773840905c |
Note: jp_number in dwpjp corresponds to imjp_number in ct_data.
Note the following:
isin wants values, but it was given the whole dataframe: change .isin(dwpjp) to .isin(dwpjp.jp_number).
In dwpjp, each jp_number was actually a list of 1 value, not just 1 value. If that is indeed the case, then .isin(dwpjp.jp_number) needs another step: explode the values as .isin(dwpjp.jp_number.explode()).
The negation was applied to a Python list instead of the boolean mask: change ~[ct_data...] to ~ct_data...
With those fixes, it's working on my end:
cct_data = ct_data[ct_data.isin(dwpjp.jp_number.explode()).any(axis=1).values]
| | imjp_number | imct_id |
|---|---|---|
| 1 | 57225427599900052634 | aa0d2dac654d4154bf7c09f73faeaf62\|-vf6738ee3bed |
| 3 | 50491824499499656478 | __gbe204670ca784a01b7207b42a7e5a5d3\|54e2c39cd3 |
dct_data = ct_data[~ct_data.isin(dwpjp.jp_number.explode()).any(axis=1).values]
| | imjp_number | imct_id |
|---|---|---|
| 0 | 23605308039805192764 | x1E5e3ukRyEFRT6SUAF6lg\|d543d3d064da465b8576d87 |
| 2 | 53733358271401869469 | 6FfHZRoiWs2VO02Pruk07A\|__g3d877adf9d154637be26 |
| 4 | 82143248133286027306 | __g1114a30c6ea548a2a83d5a51718ff0fd\|773840905c |
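For anyone wondering why the original attempt selected nothing, here is a minimal sketch. It assumes the numbers are stored as strings and that each jp_number cell is a one-element list, as described above; the short imct_id values are placeholders, not the real data. When isin is given a DataFrame, pandas matches on both index and column labels, and since ct_data has no jp_number column, every cell compares as False.

```python
import pandas as pd

# Assumed shape of the data: jp_number cells are one-element lists.
dwpjp = pd.DataFrame(
    {"jp_number": [["57225427599900052634"], ["50491824499499656478"]]}
)
ct_data = pd.DataFrame(
    {
        "imjp_number": [
            "23605308039805192764",
            "57225427599900052634",
            "50491824499499656478",
        ],
        "imct_id": ["id_a", "id_b", "id_c"],  # placeholder ids
    }
)

# Passing the whole DataFrame: labels must match, and the column names
# differ (jp_number vs imjp_number), so no row is ever selected.
whole_frame_mask = ct_data.isin(dwpjp).any(axis=1)

# Flatten the one-element lists into scalars, then test a single column.
wanted = dwpjp["jp_number"].explode()
mask = ct_data["imjp_number"].isin(wanted)

cct_data = ct_data[mask]
dct_data = ct_data[~mask]
```

If the jp_number cells are already plain scalars, the explode() call is harmless: it returns the values unchanged.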
I modified your expression as follows:
cct_data = ct_data[ct_data.imjp_number.isin(dwpjp.jp_number)]
and
dct_data = ct_data[~ct_data.imjp_number.isin(dwpjp.jp_number)]
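A self-contained sketch of the same idea, using the sample rows from the question. It assumes the numbers are stored as strings (they are too long for a 64-bit integer) and that jp_number holds plain scalars rather than lists. The mask is computed once and reused, and reset_index(drop=True) is my own addition to get the 0, 1, 2... row labels shown in the expected output.

```python
import pandas as pd

dwpjp = pd.DataFrame(
    {
        "jp_number": [
            "25146315052147720191",
            "57225427599900052634",
            "86076681691411639833",
            "50491824499499656478",
            "95588382889227620465",
        ]
    }
)
ct_data = pd.DataFrame(
    {
        "imjp_number": [
            "23605308039805192764",
            "57225427599900052634",
            "53733358271401869469",
            "50491824499499656478",
            "82143248133286027306",
        ],
        "imct_id": [
            "x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87",
            "aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed",
            "6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26",
            "__gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3",
            "__g1114a30c6ea548a2a83d5a51718ff0fd|773840905c",
        ],
    }
)

# One boolean Series: True where imjp_number also appears in dwpjp.
in_dwpjp = ct_data["imjp_number"].isin(dwpjp["jp_number"])

# Split on the mask and renumber the rows from 0 in each result.
cct_data = ct_data[in_dwpjp].reset_index(drop=True)
dct_data = ct_data[~in_dwpjp].reset_index(drop=True)
```

Drop the reset_index calls if you would rather keep the original row labels from ct_data.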