
How to “multiply” python pandas dataframes (as if they were vectors)?

I'm learning pandas. I have two dataframes:

df1 = 
quality1  value
A         1
B         2
C         3

df2 = 
quality2  value
D         1
E         10
F         100

I want to multiply them (as I might do with vectors to get a matrix). The answer should be:

df3 = 
quality1    quality2  value
A           D         1
            E         10
            F         100
B           D         2
            E         20
            F         200
C           D         3
            E         30
            F         300

How can I achieve this?

It's not the prettiest, but it would work:

>>> df1["dummy"] = 1
>>> df2["dummy"] = 1
>>> dfm = df1.merge(df2, on="dummy")
>>> dfm["value"] = dfm.pop("value_x") * dfm.pop("value_y")
>>> del dfm["dummy"]
>>> dfm
  quality1 quality2  value
0        A        D      1
1        A        E     10
2        A        F    100
3        B        D      2
4        B        E     20
5        B        F    200
6        C        D      3
7        C        E     30
8        C        F    300

Until we get native support for a Cartesian join (whistles and looks away...), merging on a dummy column is an easy way to get the same effect. The intermediate frame, right after the merge and before the two value columns are multiplied, looks like this:

>>> dfm
  quality1  value_x  dummy quality2  value_y
0        A        1      1        D        1
1        A        1      1        E       10
2        A        1      1        F      100
3        B        2      1        D        1
4        B        2      1        E       10
5        B        2      1        F      100
6        C        3      1        D        1
7        C        3      1        E       10
8        C        3      1        F      100
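
For readers on pandas 1.2 or later, the dummy column is no longer needed, because merge accepts how="cross". The following is a minimal sketch of the same approach under that version assumption, not part of the original answer; the frames are rebuilt from the question's data.

```python
import pandas as pd

df1 = pd.DataFrame({"quality1": list("ABC"), "value": [1, 2, 3]})
df2 = pd.DataFrame({"quality2": list("DEF"), "value": [1, 10, 100]})

# how="cross" (assumes pandas >= 1.2) pairs every row of df1 with every row of df2.
# The shared "value" column is suffixed to value_x / value_y, as in the dummy merge.
dfm = df1.merge(df2, how="cross")

# Multiply the two value columns and drop them in one step.
dfm["value"] = dfm.pop("value_x") * dfm.pop("value_y")
print(dfm)
```

Unlike the dummy-column version, this leaves df1 and df2 unmodified.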

You could also use the cartesian function from scikit-learn:

import numpy as np
import pandas as pd
from sklearn.utils.extmath import cartesian

# Your data:
df1 = pd.DataFrame({'quality1':list('ABC'), 'value':[1,2,3]})
df2 = pd.DataFrame({'quality2':list('DEF'), 'value':[1,10,100]})

# Make the matrix of labels (every quality1 paired with every quality2):
dfm = pd.DataFrame(cartesian((df1.quality1.values, df2.quality2.values)),
                   columns=['quality1', 'quality2'])

# Multiply values: repeat each df1 value once per df2 row,
# and tile the df2 values once per df1 row, so the two line up pairwise.
dfm['value'] = df1.value.values.repeat(df2.value.size) * np.tile(df2.value.values, df1.value.size)

print(dfm.set_index(['quality1', 'quality2']))

Which yields:

                   value
quality1 quality2       
A        D             1
         E            10
         F           100
B        D             2
         E            20
         F           200
C        D             3
         E            30
         F           300
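
If scikit-learn is not available, the same "vectors to a matrix" idea can probably be expressed with NumPy and pandas alone. The sketch below is an alternative to the answer above, not a restatement of it: it takes the outer product of the two value columns with np.outer and builds the label pairs with pd.MultiIndex.from_product. The name df3 is borrowed from the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"quality1": list("ABC"), "value": [1, 2, 3]})
df2 = pd.DataFrame({"quality2": list("DEF"), "value": [1, 10, 100]})

# Outer product of the two value vectors: a 3x3 matrix with one cell
# per (quality1, quality2) pair.
outer = np.outer(df1["value"].to_numpy(), df2["value"].to_numpy())

# from_product enumerates the label pairs in the same row-major order
# that ravel() uses to flatten the matrix.
index = pd.MultiIndex.from_product(
    [df1["quality1"], df2["quality2"]], names=["quality1", "quality2"]
)
df3 = pd.DataFrame({"value": outer.ravel()}, index=index)
print(df3)
```

A single product can then be looked up by its label pair, for example df3.loc[("B", "E"), "value"].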
