简体   繁体   中英

How do I create new columns from existing column values?

I have a dataframe in the form:

date         user  type
01/01/2021   Bob    A
02/01/2021   Lina   test
02/01/2021   Bob    test
06/01/2020   Bob    C 
08/01/2020   Lina   A
08/01/2020   Lina   A
08/01/2020   Bob    test
09/01/2020   Lina   test

I want to mutate the said dataframe so that it contains only test type rows moving all the other type values as a new columns with the count before the current test row so that I would getting something like:

date         user    type  A   C 
02/01/2021   Lina    test  0   0 
02/01/2021   Bob     test  1   0 
08/01/2020   Bob     test  1   1 
09/01/2020   Lina    test  2   0 

I'm trying to use pd.crosstab :

pd.crosstab([df['user'],df['date']], df['type'])

But I get this:

  User   date          A   C  test
  Bob    01/01/2021    1   0   0
         02/01/2021    0   0   1
         06/01/2020    0   1   0
         08/01/2020    0   0   1
  Lina   02/01/2021    0   0   1
         08/01/2020    1   0   0
         08/01/2020    1   0   0
         09/01/2020    0   0   1

How can I sum A and B value before each 1 in the test column?

To achieve something along the lines of what you are looking for, you need to group by user on your crosstab and then apply the cumsum aggregation function.

ct = pd.crosstab([df.user, df.date], df.type)
ctgb = ct.groupby("user").cumsum()

      User   date          A   C  test
      Bob    01/01/2021    1   0   0
             02/01/2021    1   0   1
             06/01/2020    1   1   0
             08/01/2020    1   1   1
      Lina   02/01/2021    0   0   1
             08/01/2020    2   0   0
             09/01/2020    2   0   2

Then to find out the sum of type A and C up until a test day, apply a mask filtering out days in which test == 1 in the crosstab frame.

cumulative_test_scores = ctgb.loc[ct.test == 1, ["A", "C"]]

      User   date          A   C
      Bob    02/01/2021    1   0
             08/01/2020    1   1
      Lina   02/01/2021    0   0
             09/01/2020    2   0

It's not the exact same format you desired however I believe it makes more sense than having the redundant type column and illustrates the results are for cumulative scores are for each user.

Edit

To get your desire format:

cumulative_test_scores.reset_index(level=0, inplace=True)
cumulative_test_scores.columns.name = None
cumulative_test_scores.insert(1, 'type', 'test')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM