简体   繁体   中英

Create a dummy column and input value based on condition

I have data containing a list of topics (topics 1-6; and 0 meaning no topic is assigned) and their value. I want to create a new column for each topic and each label, and fill the column with the value. Here's what the table looks like...

reviewId   topic   value     label
      01       2      -4  negative
      02       2       9  positive
      03       0      -7  negative
      04       5      -1  negative
      05       1      38  positive

What should I do to create a table looking like this?

reviewId topic value label t1p t1n t2p t2n t3p t3n t4p t4n t5p t5n
01 2 -4 negative 0 0 0 -4 0 0 0 0 0 0
02 2 9 positive 0 0 9 0 0 0 0 0 0 0
03 0 -7 negative 0 0 0 0 0 0 0 0 0 0
04 5 -1 negative 0 0 0 0 0 0 0 0 0 -1
05 1 38 positive 38 0 0 0 0 0 0 0 0 0

Here's reproducible data to work on.

raw_df = pd.DataFrame({
    'reviewId': ['01', '02', '03', '04', '05'],
    'topic': [2, 2, 0, 5, 1],
    'value': ['-4', '9', '-7', '-1', '38']})

Here's a link similar topic on this problem..

  • Problem about inputting value: link
  • Problem about creating columns based on label: link
from itertools import product

(df.join(pd.get_dummies(zip(df["topic"], df["label"]))
           .reindex(columns=product(range(1, 5+1), ["positive", "negative"]),
                    fill_value=0)
           .mul(df["value"], axis="rows")
           .rename(columns=lambda c: f"t{c[0]}{c[1][0]}")))
  • dummify the "topic"-"label" pairs
    • reindex with the entire (1...5) X (p, n) possible values to ensure all are here
    • and initiate the missings, eg, "4"-"negative" pair in the sample data with 0
  • given the 1/0 frame at this point, multiply it with "values" row-wise
    • 1/0s will act as a selector of values
  • column names are like "(2, positive)" etc.; rename them to "t2p" etc.

and join it with the original frame to get

   reviewId  topic  value     label  t1p  t1n  t2p  t2n  t3p  t3n  t4p  t4n  t5p  t5n
0         1      2     -4  negative    0    0    0   -4    0    0    0    0    0    0
1         2      2      9  positive    0    0    9    0    0    0    0    0    0    0
2         3      0     -7  negative    0    0    0    0    0    0    0    0    0    0
3         4      5     -1  negative    0    0    0    0    0    0    0    0    0   -1
4         5      1     38  positive   38    0    0    0    0    0    0    0    0    0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM