
Create a Pandas DataFrame from series without duplicating their names?

Is it possible to create a DataFrame from a list of series without duplicating their names?

For example, creating the same DataFrame as:

>>> pd.DataFrame({ "foo": data["foo"], "bar": other_data["bar"] })

But without needing to name the columns explicitly?

Try pandas.concat, which takes a list of objects to combine as its first argument:

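import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(100, 4), columns=list('abcd'))
df2 = pd.DataFrame(np.random.randn(100, 3), columns=list('xyz'))

# axis=1 places the two Series side by side as columns 'a' and 'y'
df3 = pd.concat([df1['a'], df2['y']], axis=1)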

Note that you need axis=1 to place the Series side by side as columns, and axis=0 (the default) to stack them one over the other. The resulting columns are labelled with each Series' name, so no name has to be repeated.
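A small sketch of the difference between the two axes, using two hypothetical Series named foo and bar (chosen to mirror the question). Selecting a column such as data["foo"] from a DataFrame already returns a Series named "foo", so in practice the names come for free.

```python
import pandas as pd

# Two hypothetical Series; their .name attributes become the column labels.
foo = pd.Series([1, 2, 3], name="foo")
bar = pd.Series([10, 20, 30], name="bar")

# axis=1: place the Series side by side, aligned on their index.
side_by_side = pd.concat([foo, bar], axis=1)
print(side_by_side)

# axis=0 (the default): stack the values one over the other into one Series.
stacked = pd.concat([foo, bar])
print(stacked.shape)
```

With axis=1 the result has the columns foo and bar; with axis=0 the result is a single Series of length 6 whose index repeats 0, 1, 2.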

It seems like you want to join the DataFrames (this works much like a SQL join):

import numpy as np
import pandas

df1 = pandas.DataFrame(
    # random_integers is deprecated; randint excludes high, so high=11 gives 0..10
    np.random.randint(low=0, high=11, size=(10, 2)),
    columns = ['foo', 'bar'],
    index=list('ABCDEFHIJK')
)

df2 = pandas.DataFrame(
    np.random.randint(low=0, high=11, size=(10, 2)),
    columns = ['bar', 'bax'],
    index=list('DEFHIJKLMN')
)

df1[['foo']].join(df2['bar'], how='outer')

The on keyword argument of join takes a column name, a list of columns, or None. If it is None (the default), the two DataFrames are joined on their indexes. You just need to make sure that you are using a DataFrame on the left side, hence the double brackets that force df1[['foo']] to be a DataFrame (df1['foo'] returns a Series).

This gives me:

   foo  bar
A    4  NaN
B    0  NaN
C   10  NaN
D    8    3
E    2    0
F    3    3
H    9   10
I    0    9
J    5    6
K    2    9
L  NaN    3
M  NaN    1
N  NaN    1

You can also do inner, left, and right joins by passing how='inner', how='left', or how='right' (the default is 'left').
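To see how the how parameter changes which index labels survive, here is a sketch with small deterministic frames (hypothetical data, not the random values above). It also shows Series.to_frame() as an alternative to the double brackets for turning a single column into a one-column DataFrame.

```python
import pandas as pd

# Hypothetical frames whose indexes only partly overlap (B and C are shared).
df1 = pd.DataFrame({"foo": [1, 2, 3]}, index=list("ABC"))
df2 = pd.DataFrame({"bar": [20, 30, 40]}, index=list("BCD"))

# df1["foo"].to_frame() gives the same one-column DataFrame as df1[["foo"]].
left = df1["foo"].to_frame()

results = {}
for how in ["left", "right", "inner", "outer"]:
    # Joining a Series works because df2["bar"] has its name set to "bar".
    results[how] = left.join(df2["bar"], how=how)
    print(how, list(results[how].index))
```

Rows that exist on only one side get NaN in the other side's column, exactly as in the outer-join output above.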

I prefer the explicit way, as presented in your original post, but if you really want to write certain names once, you could try this:

import pandas as pd
import numpy as np

def dictify(*args):
    # Each arg is a (name, mapping) pair; look the name up in its own mapping.
    return dict((name, mapping[name]) for name, mapping in args)

data = { 'foo': np.random.randn(5) }
other_data = { 'bar': np.random.randn(5) }

print(pd.DataFrame(dictify(('foo', data), ('bar', other_data))))

The output is as expected:

        bar       foo
0  0.533973 -0.477521
1  0.027354  0.974038
2 -0.725991  0.350420
3  1.921215  0.648210
4  0.547640  1.652310

[5 rows x 2 columns]
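If the inputs are already Series with their name attribute set (as they are when selected from a DataFrame, e.g. data["foo"]), a dict comprehension keyed on each Series' .name avoids writing any name at all. A sketch, assuming hypothetical DataFrames data and other_data:

```python
import numpy as np
import pandas as pd

# Hypothetical inputs; selecting a column returns a Series named after it.
data = pd.DataFrame({"foo": np.arange(5)})
other_data = pd.DataFrame({"bar": np.arange(5) * 10})

series_list = [data["foo"], other_data["bar"]]

# Use each Series' own name as its column label.
df = pd.DataFrame({s.name: s for s in series_list})
print(df)
```

One caveat of this design: if two Series share a name, the later one silently overwrites the earlier in the dict, whereas pd.concat(series_list, axis=1) would keep both columns.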
