Creating a new column for each potential value in a column to create a truth array matching the row value

I have a dataset in this general format:

group_id   item_id  zone   time 
       1         1     1      1
       1         3     4      1
       2         6     3      1
       2         2     4      1
       3         1     1      1
       1         2     3      2
       1         6     2      2
       2         8     1      2
       2         5     2      2
       3         1     1      2

I am looking to create a new column for each potential value in the zone column, then assign it a truth value if it represents that row. Say there were four potential zones, listed in names = ['zone_1', 'zone_2', 'zone_3', 'zone_4'] . The output should be something like this:

group_id   item_id  time  zone_1  zone_2  zone_3  zone_4
       1         1     1       1       0       0       0
       1         3     1       0       0       0       1
       2         6     1       0       0       1       0
       2         2     1       0       0       0       1
       3         1     1       1       0       0       0
       1         2     2       0       0       1       0
       1         6     2       0       1       0       0
       2         8     2       1       0       0       0
       2         5     2       0       1       0       0
       3         1     2       1       0       0       0

I'm not sure how best to achieve this. I can create the new columns directly from the name list. I could also replace each zone value in place with an array (e.g. zone value 3 becomes [0,0,1,0]), then explode on 'zone' and pivot, but replacing values row by row is expensive and not very pandas-like. I would appreciate any other suggestions.

It may be important to note that group_id + item_id + time together identify a unique item (no two rows share the same values for all three).

Thank you!

EDIT: To clarify, as I was unclear: there may be more potential zones than actually appear in the column. For example, the above dataset could be paired with names = ['zone_1', 'zone_2', 'zone_3', 'zone_4', 'zone_5'] . Even though zone 5 never appears in the dataframe, a solution should ideally still create a column for it (for later visualization purposes). The output in this case would be:

group_id   item_id  time  zone_1  zone_2  zone_3  zone_4  zone_5
       1         1     1       1       0       0       0       0
       1         3     1       0       0       0       1       0
       2         6     1       0       0       1       0       0
       2         2     1       0       0       0       1       0
       3         1     1       1       0       0       0       0
       1         2     2       0       0       1       0       0
       1         6     2       0       1       0       0       0
       2         8     2       1       0       0       0       0
       2         5     2       0       1       0       0       0
       3         1     2       1       0       0       0       0

Let us use pd.crosstab. Note that the original zone column is kept in the result:

df = df.join(pd.crosstab(df.index, df.zone).add_prefix('zone_'))
df
   group_id  item_id  zone  time  zone_1  zone_2  zone_3  zone_4
0         1        1     1     1       1       0       0       0
1         1        3     4     1       0       0       0       1
2         2        6     3     1       0       0       1       0
3         2        2     4     1       0       0       0       1
4         3        1     1     1       1       0       0       0
5         1        2     3     2       0       0       1       0
6         1        6     2     2       0       1       0       0
7         2        8     1     2       1       0       0       0
8         2        5     2     2       0       1       0       0
9         3        1     1     2       1       0       0       0
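
The crosstab above only yields columns for zones that actually occur in the data. To cover the case from the EDIT in the question, one possible sketch is to reindex the crosstab columns against the full list of names. The sample frame is rebuilt here from the question's data, `names` is the list from the question, and the variable names `flags` and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "group_id": [1, 1, 2, 2, 3, 1, 1, 2, 2, 3],
    "item_id":  [1, 3, 6, 2, 1, 2, 6, 8, 5, 1],
    "zone":     [1, 4, 3, 4, 1, 3, 2, 1, 2, 1],
    "time":     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
})

# Full list of potential zones, including one that never appears.
names = ["zone_1", "zone_2", "zone_3", "zone_4", "zone_5"]

flags = (
    pd.crosstab(df.index, df["zone"])      # one row per original row, one column per observed zone
    .add_prefix("zone_")                   # 1 -> "zone_1", etc.
    .reindex(columns=names, fill_value=0)  # add missing zones as all-zero columns
)

# Drop the original zone column and attach the indicator columns by index.
result = df.drop(columns="zone").join(flags)
print(result)
```

Reindexing also fixes the column order to that of `names`, which may be convenient for the later visualization mentioned in the question.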

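If I understand correctly, you can use pandas.get_dummies:

import pandas as pd

zones = df.pop("zone")  # removes the zone column from df and returns it
# dtype=int keeps 0/1 output on newer pandas; axis must be passed by keyword
new_df = pd.concat([df, pd.get_dummies(zones, prefix="zone", dtype=int)], axis=1)
print(new_df)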

Output:

   group_id  item_id  time  zone_1  zone_2  zone_3  zone_4
0         1        1     1       1       0       0       0
1         1        3     1       0       0       0       1
2         2        6     1       0       0       1       0
3         2        2     1       0       0       0       1
4         3        1     1       1       0       0       0
5         1        2     2       0       0       1       0
6         1        6     2       0       1       0       0
7         2        8     2       1       0       0       0
8         2        5     2       0       1       0       0
9         3        1     2       1       0       0       0
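
As with crosstab, get_dummies on a plain integer column only creates columns for values that are present. A minimal sketch for the EDIT case, assuming the zone numbers can be parsed from the suffix of each entry in `names` (an assumption about the naming scheme, not something the question guarantees), is to convert the column to a categorical with the full set of categories first:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "group_id": [1, 1, 2, 2, 3, 1, 1, 2, 2, 3],
    "item_id":  [1, 3, 6, 2, 1, 2, 6, 8, 5, 1],
    "zone":     [1, 4, 3, 4, 1, 3, 2, 1, 2, 1],
    "time":     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
})
names = ["zone_1", "zone_2", "zone_3", "zone_4", "zone_5"]

# Assumption: every name looks like "zone_<number>".
categories = [int(name.split("_")[1]) for name in names]

# A categorical column remembers all declared categories, used or not.
zones = df["zone"].astype(pd.CategoricalDtype(categories=categories))

# get_dummies emits one column per category, so zone_5 appears as all zeros.
dummies = pd.get_dummies(zones, prefix="zone", dtype=int)

new_df = pd.concat([df.drop(columns="zone"), dummies], axis=1)
print(new_df)
```

Unlike `df.pop`, this version leaves the original `df` unchanged, which is a design choice rather than a requirement.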
