
Pandas and Sets - ValueError: Length of values does not match length of index

I am trying to create a new column in my dataframe that contains the intersection of two sets (each contained in two separate columns). The columns themselves hold sets.

dfc['INTERSECTION'] =  set(dfc.TABS1).intersection(set(dfc.TABS2))

I get a ValueError. However, I was able to do

dfc['LEFT'] = set(dfc.TABS1) - set(dfc.TABS2)

no problem. TABS1 and TABS2 have values.

Any thoughts? Thanks.

I am adding example data below.

GROUP TABS1               TABS2 
A     {'T1','T2','T3'}   {'T2','T3','T4'} 
B     {'T5', 'T6'}       {'T6'}

Chris gave an example, but it uses a very different data set. I am looking for the intersection of TABS1 and TABS2 in a third column, 'INTERSECTION'. As mentioned above, I have no problems with

dfc['LEFT'] = set(dfc.TABS1) - set(dfc.TABS2)

This looks like it should be so straightforward...

set removes duplicates, so you end up with a set whose length is less than the length of your dataframe. You need to make sure the length of the array you assign to a new column is equal to the length of the dataframe. If you want, you can replace the non-intersecting values with NaN using a list comprehension:

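import numpy as np
import pandas as pd

# sample data
df = pd.DataFrame([[1,2,3], [1,2,3], [2,3,4], [3,4,5]], columns=list('abc'))
# build the lookup set once, then keep values of 'a' that also appear in 'b'
b_values = set(df['b'])
df['intersection'] = [a if a in b_values else np.nan for a in df['a']]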

   a  b  c  intersection
0  1  2  3           NaN
1  1  2  3           NaN
2  2  3  4           2.0
3  3  4  5           3.0
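
The example above works on columns of scalar values. For data shaped like the question's, where each cell already holds a set, one possible approach is to compute the intersection row by row, which always yields exactly one result per row. This is only a minimal sketch: it assumes the cells in TABS1 and TABS2 are real Python set objects (not strings that look like sets), and the sample frame is rebuilt here from the two rows shown in the question.

```python
import pandas as pd

# Sample data modelled on the question: each cell holds a Python set.
dfc = pd.DataFrame({
    "GROUP": ["A", "B"],
    "TABS1": [{"T1", "T2", "T3"}, {"T5", "T6"}],
    "TABS2": [{"T2", "T3", "T4"}, {"T6"}],
})

# Pair up the two columns row by row and apply the set operation to each pair.
# The resulting list has one entry per row, so its length matches the index.
dfc["INTERSECTION"] = [a & b for a, b in zip(dfc["TABS1"], dfc["TABS2"])]
dfc["LEFT"] = [a - b for a, b in zip(dfc["TABS1"], dfc["TABS2"])]

print(dfc[["GROUP", "INTERSECTION", "LEFT"]])
```

By contrast, set(dfc.TABS1) treats the whole column as a single collection, so whatever it produces is not tied to the number of rows in the dataframe.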

