I have two DataFrames, one of which is very large, as follows:
import pandas as pd
import numpy as np
import string, random
siz = int(1e10)  # size of the large frame; use a smaller value (e.g. int(1e6)) to try this locally
random.seed(1234)
a1 = pd.Series((random.choice(string.ascii_uppercase) for _ in range(siz)), name='CatA')
a2 = pd.Series((random.choice(string.ascii_lowercase) for _ in range(siz)), name='CatB')
val1 = pd.Series(np.random.randint(2, high=10, size=siz), name='Value')
df_a = pd.DataFrame([a1, a2, val1]).T.set_index(['CatA', 'CatB'])
siz = 1000
random.seed(4321)
b1 = pd.Series((random.choice(string.ascii_uppercase) for _ in range(siz)), name='CatA')
b2 = pd.Series((random.choice(string.ascii_lowercase) for _ in range(siz)), name='CatB')
val2 = pd.Series(np.random.randint(2, high=10, size=siz), name='Value')
df_b = pd.DataFrame([b1, b2, val2]).T.set_index(['CatA', 'CatB'])
Every row of df_a whose (CatA, CatB) index also appears in df_b should be eliminated from df_a. The Value column of df_a should be preserved intact for the remaining rows; the Value column of df_b is dropped. I tried df_a.sub(df_b.drop('Value', axis=1)), which doesn't work.
Is there a vectorized way to do this?
I believe you need Index.isin with the mask inverted by ~:
df = df_a[~df_a.index.isin(df_b.index)]
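To see what the mask does, here is a minimal sketch on two tiny frames. The data below is made up for illustration and is not the random data from the question; only the index names CatA/CatB and the Value column match.

```python
import pandas as pd

# Small stand-in for the large frame (hypothetical data, same layout as df_a).
df_a = pd.DataFrame(
    {"CatA": ["A", "A", "B", "C", "C"],
     "CatB": ["x", "y", "x", "z", "z"],
     "Value": [2, 3, 4, 5, 6]}
).set_index(["CatA", "CatB"])

# Small stand-in for df_b; its Value column plays no role in the filter.
df_b = pd.DataFrame(
    {"CatA": ["A", "C", "D"],
     "CatB": ["y", "z", "x"],
     "Value": [9, 9, 9]}
).set_index(["CatA", "CatB"])

# True for each row of df_a whose (CatA, CatB) pair occurs in df_b's index.
mask = df_a.index.isin(df_b.index)

# Invert the mask to keep only the rows that are NOT in df_b.
result = df_a[~mask]
print(result)
```

Only the (A, x) and (B, x) rows remain, with their original Value from df_a. Duplicate index entries in df_a, such as the two (C, z) rows here, are all removed when the pair occurs in df_b.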