Repeat rows in Pandas dataframe but with different IDs
I have a pandas dataframe that looks like this:
id c1 c2 c3
100 2 7 4
100 3 4 1
100 4 0 10
105 2 3 4
105 3 6 8
105 4 9 2
115 2 1 0
115 3 7 14
115 4 0 20
Now I want to repeat the rows of this dataframe, but with new_id = id + 10. If this new_id already exists in the original dataframe, then new_id = new_id (the repeated one) + 10.
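The rule can be sketched as a small helper. This assumes "already exists" means "appears among the original IDs", so newly generated IDs are not checked against each other. `next_free_id` is a hypothetical name used only for illustration:

```python
def next_free_id(old_id, existing, step=10):
    """Return old_id + step, adding further steps while the result is an existing ID."""
    new_id = old_id + step
    while new_id in existing:
        new_id += step
    return new_id


original_ids = {100, 105, 115}
print({i: next_free_id(i, original_ids) for i in sorted(original_ids)})
# {100: 110, 105: 125, 115: 125}
```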
Sample output:
id c1 c2 c3
100 2 7 4
100 3 4 1
100 4 0 10
105 2 3 4
105 3 6 8
105 4 9 2
115 2 1 0
115 3 7 14
115 4 0 20
## Repeated data
110 2 7 4
110 3 4 1
110 4 0 10
## Since 115 already exists, it shall now be 125; if 125 exists, it shall be 135
125 2 3 4
125 3 6 8
125 4 9 2
...
If I understood your question correctly, take a look at this:
import pandas as pd

d = {'id': [100,100,100,105,105,105,115,115,115],
     'c1': [2,3,4,2,3,4,2,3,4],
     'c2': [7,4,0,3,6,9,1,7,0],
     'c3': [4,1,10,4,8,2,0,14,20]}
df = pd.DataFrame(data=d)

def IDcheck(uniqueID, ID):
    while True:
        # Increase the value of the ID by 10
        ID += 10
        # Use the new ID only if it is not among the original IDs;
        # otherwise loop again, which adds another 10
        if ID not in uniqueID:
            return ID

def updateRow(df):
    # Select the unique values from the 'id' column
    uniqueID = df['id'].unique().tolist()
    repeated = []
    for ID in uniqueID:
        # Select all rows with the same 'id' (.copy() avoids SettingWithCopyWarning)
        temp = df.loc[df['id'] == ID].copy()
        # Set the ID of the copied rows to the new ID value
        temp['id'] = IDcheck(uniqueID, ID)
        repeated.append(temp)
    # Add the repeated rows to the original dataframe
    # (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
    df = pd.concat([df] + repeated, ignore_index=True)
    # Unsorted; for sorted output use: return df.sort_values(by=['id'])
    return df

updateRow(df)
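The same rule can also be applied without building one temporary frame per ID: compute the old-ID to new-ID mapping once, then copy all rows in a single `assign`/`map` step and a single `pd.concat`. This is a sketch under the same assumption (only original IDs count as "existing"); `repeat_with_new_ids` is a hypothetical helper name:

```python
import pandas as pd

d = {'id': [100, 100, 100, 105, 105, 105, 115, 115, 115],
     'c1': [2, 3, 4, 2, 3, 4, 2, 3, 4],
     'c2': [7, 4, 0, 3, 6, 9, 1, 7, 0],
     'c3': [4, 1, 10, 4, 8, 2, 0, 14, 20]}
df = pd.DataFrame(data=d)


def repeat_with_new_ids(df: pd.DataFrame, step: int = 10) -> pd.DataFrame:
    existing = set(df['id'].tolist())
    # Build the old-ID -> new-ID mapping once per unique ID
    mapping = {}
    for old_id in existing:
        new_id = old_id + step
        while new_id in existing:  # skip IDs already used by the original rows
            new_id += step
        mapping[old_id] = new_id
    # Copy every row with its ID replaced, then stack the copies under the originals
    repeated = df.assign(id=df['id'].map(mapping))
    return pd.concat([df, repeated], ignore_index=True)


result = repeat_with_new_ids(df)
print(result['id'].tolist())
```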
You can first add 10 to the id column, and if the new ID already exists, add another 10:
(
df.assign(id=df.id.add(10).add(df.id.add(10).isin(df.id).mul(10)))
.pipe(lambda x: pd.concat([df, x]))
)
id c1 c2 c3
0 100 2 7 4
1 100 3 4 1
2 100 4 0 10
3 105 2 3 4
4 105 3 6 8
5 105 4 9 2
6 115 2 1 0
7 115 3 7 14
8 115 4 0 20
0 110 2 7 4
1 110 3 4 1
2 110 4 0 10
3 125 2 3 4
4 125 3 6 8
5 125 4 9 2
6 125 2 1 0
7 125 3 7 14
8 125 4 0 20
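The one-line version above checks for a collision only once, so an ID whose +10 and +20 values both already exist would still collide. A sketch that keeps adding 10 until no original ID is hit, assuming the same rule (only original IDs count); `shifted_ids` is a hypothetical helper. It also passes `ignore_index=True` so the result gets a fresh 0..17 index instead of repeating 0..8 as in the output above:

```python
import pandas as pd

d = {'id': [100, 100, 100, 105, 105, 105, 115, 115, 115],
     'c1': [2, 3, 4, 2, 3, 4, 2, 3, 4],
     'c2': [7, 4, 0, 3, 6, 9, 1, 7, 0],
     'c3': [4, 1, 10, 4, 8, 2, 0, 14, 20]}
df = pd.DataFrame(data=d)


def shifted_ids(ids: pd.Series, step: int = 10) -> pd.Series:
    """Add `step` to every ID, repeating only where the result is still an original ID."""
    new = ids + step
    clash = new.isin(ids)
    while clash.any():
        # Bump only the clashing entries; the others are already final
        new = new.where(~clash, new + step)
        clash = new.isin(ids)
    return new


# ignore_index=True renumbers the combined rows 0..n-1
out = pd.concat([df, df.assign(id=shifted_ids(df['id']))], ignore_index=True)
print(out['id'].tolist()[9:])
# [110, 110, 110, 125, 125, 125, 125, 125, 125]
```

Because each pass only bumps entries that still clash, and the set of original IDs is finite, the loop always terminates.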