Python pandas: how to add a unique identifier for similar grouped data
This is my DataFrame:
product_title variation_list
Chauvet DJ GigBar Move Effect Light System ['Black', 'White']
Rane Twelve MKII DJ Controller ['New', 'Blemished']
My expected DataFrame would look like this:
group_id product_title variation_list unique_id
FAT-1301 Chauvet DJ GigBar Move Effect Light System Black FAT-01
FAT-1301 Chauvet DJ GigBar Move Effect Light System White FAT-02
FAT-1302 Rane Twelve MKII DJ Controller New FAT-03
FAT-1302 Rane Twelve MKII DJ Controller Blemished FAT-04
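Basically, I want to add two extra columns: group_id, which assigns one shared id to all rows belonging to the same group, and unique_id, which assigns a distinct value to every row.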
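# Keep the original row number in an 'index' column, then emit one row per variation.
df2 = df.reset_index().explode('variation_list')
# Rows exploded from the same original row share that row's number.
df2['group_id'] = 'FAT' + df2['index'].add(1).astype(str)
# A fresh 0..n-1 index over the exploded rows yields one id per row.
df2['unique_id'] = 'FAT' + (df2.reset_index(drop = True).index+1).astype(str)
df2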
index product_title ... group_id unique_id
0 0 Chauvet DJ GigBar Move Effect Light System ... FAT1 FAT1
0 0 Chauvet DJ GigBar Move Effect Light System ... FAT1 FAT2
1 1 Chauvet DJ GigBar Move Effect Light System ... FAT2 FAT3
1 1 Chauvet DJ GigBar Move Effect Light System ... FAT2 FAT4
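The ids above come out as FAT1, FAT2, and so on, while the expected frame in the question uses FAT-1301 for groups and zero-padded FAT-01 for rows. A minimal sketch of one way to get that formatting follows. The starting number 1301 and the padding width 2 are assumptions read off the expected output, and the names GROUP_START and UNIQUE_WIDTH are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "product_title": [
        "Chauvet DJ GigBar Move Effect Light System",
        "Rane Twelve MKII DJ Controller",
    ],
    "variation_list": [["Black", "White"], ["New", "Blemished"]],
})

GROUP_START = 1301  # assumption: first group number, taken from the expected output
UNIQUE_WIDTH = 2    # assumption: zero-pad width for unique_id

# One row per variation; the original row number survives in the 'index' column.
df2 = df.reset_index().explode("variation_list", ignore_index=True)

# Rows that came from the same original row share a group number.
df2["group_id"] = "FAT-" + (df2["index"] + GROUP_START).astype(str)

# Running counter over the exploded rows, zero-padded to a fixed width.
counter = pd.Series(range(1, len(df2) + 1), index=df2.index)
df2["unique_id"] = "FAT-" + counter.astype(str).str.zfill(UNIQUE_WIDTH)

df2 = df2[["group_id", "product_title", "variation_list", "unique_id"]]
print(df2)
```

Note that zfill only pads. Once there are more than 99 rows the ids simply grow to three digits, so the width would need to be raised if fixed-length ids matter.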
Using explode:
import pandas as pd

d = {'product_title': ['Chauvet DJ GigBar Move Effect Light System', 'Chauvet DJ GigBar Move Effect Light System'],
     'variation_list': [['Black', 'White'], ['New', 'Blemished']]}
df = pd.DataFrame(d)
# Number the original rows before exploding so every variation inherits its group id.
df.insert(0, 'group_id', 'FAT-' + (df.index + 1).astype(str))
# One row per variation, with a fresh 0..n-1 index.
df = df.explode('variation_list').reset_index(drop=True)
# Number the exploded rows to get one id per variation.
df['unique_id'] = 'FAT-' + (df.index + 1).astype(str)
print(df)
Output:
group_id product_title variation_list unique_id
0 FAT-1 Chauvet DJ GigBar Move Effect Light System Black FAT-1
1 FAT-1 Chauvet DJ GigBar Move Effect Light System White FAT-2
2 FAT-2 Chauvet DJ GigBar Move Effect Light System New FAT-3
3 FAT-2 Chauvet DJ GigBar Move Effect Light System Blemished FAT-4
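Both answers number groups by row position, so two rows carrying the same product_title end up in different groups (as in the output above). If the group is meant to follow the title instead, one option is pd.factorize. The sketch below assumes that reading of "same group"; the third input row and its 'Used' variation are hypothetical data added only to show a repeated title.

```python
import pandas as pd

# Hypothetical input: the first product appears again in a third row.
df = pd.DataFrame({
    "product_title": [
        "Chauvet DJ GigBar Move Effect Light System",
        "Rane Twelve MKII DJ Controller",
        "Chauvet DJ GigBar Move Effect Light System",
    ],
    "variation_list": [["Black", "White"], ["New", "Blemished"], ["Used"]],
})

# One row per variation with a fresh 0..n-1 index.
df = df.explode("variation_list", ignore_index=True)

# factorize numbers the distinct titles 0, 1, 2, ... in order of first appearance;
# strip() guards against stray leading or trailing spaces in a title.
codes, _ = pd.factorize(df["product_title"].str.strip())
df.insert(0, "group_id", ["FAT-" + str(code + 1) for code in codes])

# Running counter over all exploded rows.
df["unique_id"] = ["FAT-" + str(i + 1) for i in range(len(df))]

print(df)
```

Here the 'Used' row lands in the same group as 'Black' and 'White' even though it came from a different input row, while unique_id still increases row by row.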