
Splitting Dataframe based on corresponding numpy array values

I have a pandas dataframe A that looks like this:

    2007-12-31    50230.62
    2008-01-02    48646.84
    2008-01-03    48748.04
    2008-01-04    46992.22
    2008-01-07    46491.28
    2008-01-08    45347.72
    2008-01-09    45681.68
    2008-01-10    46430.5

The date column is the index. I also have a numpy array B of the same length whose elements are -1, 0 and 1. What is the cleanest way to split dataframe A into 3 dataframes so that rows whose corresponding B elements are equal end up grouped together? For example, if B = numpy.array([0, 0, 0, 1, 1, -1, -1, 0]), the dataframe should be split into:

    X
    2007-12-31    50230.62
    2008-01-02    48646.84
    2008-01-03    48748.04
    2008-01-10    46430.5

    Y
    2008-01-04    46992.22
    2008-01-07    46491.28

    Z
    2008-01-08    45347.72
    2008-01-09    45681.68

This is easy with pandas' groupby. You can keep the rows grouped so that you are not duplicating your data, but you can always assign each group to its own variable:

import numpy as np
import pandas as pd
import io

data = """    2007-12-31    50230.62
    2008-01-02    48646.84
    2008-01-03    48748.04
    2008-01-04    46992.22
    2008-01-07    46491.28
    2008-01-08    45347.72
    2008-01-09    45681.68
    2008-01-10    46430.5"""

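# Raw string avoids the invalid-escape warning for '\s'; index_col=0 makes the date column the index, as in the question
df = pd.read_csv(io.StringIO(data), sep=r'\s+', header=None, index_col=0)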
B = np.array([0, 0, 0, 1, 1, -1, -1, 0])

df['B'] = B

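# Group by the column name itself (not a one-element list) so the group keys are plain scalars
df_groups = df.groupby('B')

# x, y, z follow the X, Y, Z order from the question: B == 0, B == 1, B == -1
x = df_groups.get_group(0)
y = df_groups.get_group(1)
z = df_groups.get_group(-1)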

The values 0, 1 and -1 passed to get_group are the group names, which are simply the distinct values found in B.
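If you would rather not add B as a column, boolean masking on the array itself should also work, since B has the same length as A. The following is only a sketch under that assumption; the column name "value" and the dict name parts are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: dates as the index, one column of numbers.
dates = pd.to_datetime([
    "2007-12-31", "2008-01-02", "2008-01-03", "2008-01-04",
    "2008-01-07", "2008-01-08", "2008-01-09", "2008-01-10",
])
values = [50230.62, 48646.84, 48748.04, 46992.22,
          46491.28, 45347.72, 45681.68, 46430.5]
A = pd.DataFrame({"value": values}, index=dates)  # "value" is a hypothetical column name
B = np.array([0, 0, 0, 1, 1, -1, -1, 0])

# A boolean array of the same length as A selects rows by position.
X = A[B == 0]
Y = A[B == 1]
Z = A[B == -1]

# Or build all pieces at once, keyed by the label found in B.
parts = {int(label): A[B == label] for label in np.unique(B)}
```

Here parts[0], parts[1] and parts[-1] hold the same rows as X, Y and Z. Unlike the groupby approach, A itself is left unchanged because no extra column is added.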


 