
Splitting a dataframe in Python

I have a dataframe df=

    Type   ID      QTY_1   QTY_2  RES_1   RES_2
    X       1       10      15      y       N
    X       2       12      25      N       N
    X       3       25      16      Y       Y
    X       4       14      62      N       Y
    X       5       21      75      Y       Y
    Y       1       10      15      y       N
    Y       2       12      25      N       N
    Y       3       25      16      Y       Y
    Y       4       14      62      N       N
    Y       5       21      75      Y       Y

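I want to split it into two data frames, one per QTY column. Each should keep only the rows where the matching RES column is Y, ignoring case (ID 1 is flagged with a lowercase y but should still be included). Below is my expected result: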

df1 =

    Type   ID   QTY_1
    X      1    10
    X      3    25
    X      5    21
    Y      1    10
    Y      3    25
    Y      5    21

df2 =

    Type   ID   QTY_2
    X      3    16
    X      4    62
    X      5    75
    Y      3    16
    Y      5    75
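The answers below can be tried by rebuilding the frame from the table. This sketch copies the values from the question and assumes the columns are plain strings and integers. It also states the selection rule as a case-insensitive boolean mask, because ID 1 is flagged with a lowercase y and still appears in df1.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Type":  ["X"] * 5 + ["Y"] * 5,
    "ID":    [1, 2, 3, 4, 5] * 2,
    "QTY_1": [10, 12, 25, 14, 21] * 2,
    "QTY_2": [15, 25, 16, 62, 75] * 2,
    "RES_1": ["y", "N", "Y", "N", "Y"] * 2,
    "RES_2": ["N", "N", "Y", "Y", "Y", "N", "N", "Y", "N", "Y"],
})

# Selection rule: keep a row when its RES flag is "Y", ignoring case.
mask_1 = df["RES_1"].str.upper().eq("Y")
mask_2 = df["RES_2"].str.upper().eq("Y")

print(df.loc[mask_1, "ID"].tolist())  # [1, 3, 5, 1, 3, 5] -> rows for df1
print(df.loc[mask_2, "ID"].tolist())  # [3, 4, 5, 3, 5] -> rows for df2
```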

You can do this:

df1 = df[['Type', 'ID', 'QTY_1']].loc[df.RES_1.isin(['Y', 'y'])]

df2 = df[['Type', 'ID', 'QTY_2']].loc[df.RES_2.isin(['Y', 'y'])]

or

df1 = df[['Type', 'ID', 'QTY_1']].loc[df.RES_1.str.lower() == 'y']

df2 = df[['Type', 'ID', 'QTY_2']].loc[df.RES_2.str.lower() == 'y']

Output:

>>> df1
  Type  ID  QTY_1
0    X   1     10
2    X   3     25
4    X   5     21
5    Y   1     10
7    Y   3     25
9    Y   5     21
>>> df2
  Type  ID  QTY_2
2    X   3     16
3    X   4     62
4    X   5     75
7    Y   3     16
9    Y   5     75
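The same selection can also be written as one .loc[rows, columns] call instead of the chained df[cols].loc[mask]. The pandas indexing guide generally recommends the single-call form, mainly because chained indexing can be ambiguous if you later assign into the result. Here is a sketch on the question's data. It produces the same frames as the output shown:

```python
import pandas as pd

df = pd.DataFrame({
    "Type":  ["X"] * 5 + ["Y"] * 5,
    "ID":    [1, 2, 3, 4, 5] * 2,
    "QTY_1": [10, 12, 25, 14, 21] * 2,
    "QTY_2": [15, 25, 16, 62, 75] * 2,
    "RES_1": ["y", "N", "Y", "N", "Y"] * 2,
    "RES_2": ["N", "N", "Y", "Y", "Y", "N", "N", "Y", "N", "Y"],
})

cols_1 = ["Type", "ID", "QTY_1"]
cols_2 = ["Type", "ID", "QTY_2"]

# Rows whose flag is "y"/"Y" and the wanted columns, chosen in a single .loc call.
df1 = df.loc[df["RES_1"].str.lower().eq("y"), cols_1]
df2 = df.loc[df["RES_2"].str.lower().eq("y"), cols_2]

print(df1)
print(df2)
```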

Use a dictionary

Using a dictionary is good practice when you have a variable number of variables. Even though there are only two categories here, it keeps the data organized. For example, the rows selected by RES_1 are available as dfs[1].

dfs = {i: df.loc[df[f'RES_{i}'].str.lower() == 'y', ['Type', 'ID', f'QTY_{i}']]
       for i in range(1, 3)}

print(dfs)

{1:   Type  ID  QTY_1
0    X   1     10
2    X   3     25
4    X   5     21
5    Y   1     10
7    Y   3     25
9    Y   5     21,
 2:   Type  ID  QTY_2
2    X   3     16
3    X   4     62
4    X   5     75
7    Y   3     16
9    Y   5     75}
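If more RES_n/QTY_n pairs might appear later, you can derive the suffixes from the column names instead of hard-coding range(1, 3). This sketch assumes every RES_n column has a matching QTY_n column, which the question does not state. The regex-based suffix discovery is an illustrative addition, not part of the answer:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Type":  ["X"] * 5 + ["Y"] * 5,
    "ID":    [1, 2, 3, 4, 5] * 2,
    "QTY_1": [10, 12, 25, 14, 21] * 2,
    "QTY_2": [15, 25, 16, 62, 75] * 2,
    "RES_1": ["y", "N", "Y", "N", "Y"] * 2,
    "RES_2": ["N", "N", "Y", "Y", "Y", "N", "N", "Y", "N", "Y"],
})

# Collect the numeric suffix of every column named exactly RES_<digits>.
suffixes = sorted(
    int(col[len("RES_"):])
    for col in df.columns
    if re.fullmatch(r"RES_\d+", col)
)

# One filtered frame per suffix (assumes QTY_<n> exists for each RES_<n>).
dfs = {
    i: df.loc[df[f"RES_{i}"].str.lower().eq("y"), ["Type", "ID", f"QTY_{i}"]]
    for i in suffixes
}

for i, frame in dfs.items():
    print(f"RES_{i}:")
    print(frame)
```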

You need:

df1 = df.loc[(df['RES_1']=='Y') | (df['RES_1']=='y')].drop(['QTY_2', 'RES_1', 'RES_2'], axis=1)

df2 = df.loc[(df['RES_2']=='Y') | (df['RES_2']=='y')].drop(['QTY_1', 'RES_1', 'RES_2'], axis=1)
print(df1)
print(df2)

Output:

  Type  ID  QTY_1
0    X   1     10
2    X   3     25
4    X   5     21
5    Y   1     10
7    Y   3     25
9    Y   5     21

  Type  ID  QTY_2
2    X   3     16
3    X   4     62
4    X   5     75
7    Y   3     16
9    Y   5     75
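The outputs keep the original row labels (0, 2, 4, ...), but the question's expected result shows none. If you want the rows renumbered, you can chain reset_index(drop=True). This optional addition is not part of the answers. The sketch also uses the drop(columns=...) keyword, which is equivalent to drop([...], axis=1):

```python
import pandas as pd

df = pd.DataFrame({
    "Type":  ["X"] * 5 + ["Y"] * 5,
    "ID":    [1, 2, 3, 4, 5] * 2,
    "QTY_1": [10, 12, 25, 14, 21] * 2,
    "QTY_2": [15, 25, 16, 62, 75] * 2,
    "RES_1": ["y", "N", "Y", "N", "Y"] * 2,
    "RES_2": ["N", "N", "Y", "Y", "Y", "N", "N", "Y", "N", "Y"],
})

# Filter rows, drop the unneeded columns, then renumber rows from 0.
df1 = (
    df.loc[df["RES_1"].str.lower().eq("y")]
    .drop(columns=["QTY_2", "RES_1", "RES_2"])
    .reset_index(drop=True)
)
df2 = (
    df.loc[df["RES_2"].str.lower().eq("y")]
    .drop(columns=["QTY_1", "RES_1", "RES_2"])
    .reset_index(drop=True)
)

print(df1)
print(df2)
```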

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.