
Repeat rows in Pandas dataframe but with different IDs

I have a pandas dataframe that looks like this:

id    c1    c2    c3
100    2    7     4
100    3    4     1 
100    4    0     10
105    2    3     4
105    3    6     8
105    4    9     2
115    2    1     0
115    3    7     14
115    4    0     20

Now I want to repeat the rows of this dataframe, but with new_id = id + 10. If this new_id already exists in the original dataframe, then new_id = new_id (the repeated one) + 10.

Expected output:

id    c1    c2    c3
100    2    7     4
100    3    4     1 
100    4    0     10
105    2    3     4
105    3    6     8
105    4    9     2    
115    2    1     0
115    3    7     14
115    4    0     20
## Repeated data
110    2    7     4
110    3    4     1 
110    4    0     10
## Since 115 already exists, it shall now be 125; if 125 exists, it shall be 135
125    2    3     4
125    3    6     8
125    4    9     2 
.
.
.   
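The ID rule described above can be written as a small helper. This is a minimal sketch, assuming "exists" means "exists among the original IDs" (as the question states) and a fixed step of 10; the name `next_id` is hypothetical.

```python
def next_id(old_id, existing_ids, step=10):
    """Return old_id + step, adding step again while the result is an existing ID.

    Hypothetical helper: `existing_ids` holds only the IDs of the original
    dataframe, so newly generated IDs are not checked against each other.
    """
    new_id = old_id + step
    while new_id in existing_ids:
        new_id += step
    return new_id


original_ids = {100, 105, 115}
print(next_id(100, original_ids))  # 110
print(next_id(105, original_ids))  # 125, because 115 already exists
```

Read literally, the rule also maps 115 to 125, so the copies of 105 and 115 end up sharing an ID; both answers below behave the same way.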

If I understood your question correctly, have a look at this.

import pandas as pd

d = {'id': [100, 100, 100, 105, 105, 105, 115, 115, 115],
     'c1': [2, 3, 4, 2, 3, 4, 2, 3, 4],
     'c2': [7, 4, 0, 3, 6, 9, 1, 7, 0],
     'c3': [4, 1, 10, 4, 8, 2, 0, 14, 20]}

df = pd.DataFrame(data=d)


def IDcheck(uniqueID, ID):
    # Increase the value of the ID by 10
    ID += 10
    # Keep adding 10 while the new ID is already one of the original IDs
    while ID in uniqueID:
        ID += 10
    return ID


def updateRow(df):
    # Select unique values from the 'id' column
    uniqueID = df['id'].unique().tolist()

    # Collect the copies and concatenate once at the end
    copies = [df]
    for ID in uniqueID:
        # Select all rows with the same 'id' (copy avoids SettingWithCopyWarning)
        temp = df.loc[df['id'] == ID].copy()

        # Update the IDs in temp to the new_id value
        temp['id'] = IDcheck(uniqueID, ID)
        copies.append(temp)

    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    result = pd.concat(copies, ignore_index=True)

    # Unsorted
    return result

    # Sorted
    # return result.sort_values(by=['id'])


updateRow(df)
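A loop-free variant of the same idea: compute the old-to-new ID mapping once per unique ID, apply it with `Series.map`, and concatenate a single time. This is a sketch, not the answerer's code; `repeat_with_new_ids` is a hypothetical name, and it assumes a fixed step of 10 checked only against the original IDs.

```python
import pandas as pd


def repeat_with_new_ids(df, step=10):
    """Append a copy of df whose 'id' values are shifted by step.

    Hypothetical helper: a shifted ID is shifted again (repeatedly)
    while it collides with one of the original IDs.
    """
    existing = set(df['id'])

    def shift(old_id):
        new_id = old_id + step
        while new_id in existing:
            new_id += step
        return new_id

    # One mapping entry per unique ID instead of one boolean filter per ID
    mapping = {old: shift(old) for old in df['id'].unique()}
    copy = df.assign(id=df['id'].map(mapping))
    return pd.concat([df, copy], ignore_index=True)


df = pd.DataFrame({'id': [100, 100, 100, 105, 105, 105, 115, 115, 115],
                   'c1': [2, 3, 4, 2, 3, 4, 2, 3, 4],
                   'c2': [7, 4, 0, 3, 6, 9, 1, 7, 0],
                   'c3': [4, 1, 10, 4, 8, 2, 0, 14, 20]})
out = repeat_with_new_ids(df)
print(out['id'].tolist())
```

Building the mapping first keeps the cost of the ID search proportional to the number of unique IDs, and a single `pd.concat` avoids re-copying the growing frame on every iteration.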

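You can first add 10 to the id column and then, wherever the new ID already exists, add another 10. Note that this adds the extra 10 only once; it does not re-check the result against the existing IDs.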

(
    df.assign(id=df.id.add(10).add(df.id.add(10).isin(df.id).mul(10)))
    .pipe(lambda x: pd.concat([df, x]))
)

    id  c1  c2  c3
0   100 2   7   4
1   100 3   4   1
2   100 4   0   10
3   105 2   3   4
4   105 3   6   8
5   105 4   9   2
6   115 2   1   0
7   115 3   7   14
8   115 4   0   20
0   110 2   7   4
1   110 3   4   1
2   110 4   0   10
3   125 2   3   4
4   125 3   6   8
5   125 4   9   2
6   125 2   1   0
7   125 3   7   14
8   125 4   0   20
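The one-liner above adds 10 only one extra time, so a second collision (for example, if 125 were also an original ID) would slip through. Below is a sketch of a variant that repeats the `isin` check until no shifted ID collides with an original ID. It assumes, as above, that only the original IDs count as existing; `shift_ids` is a hypothetical name.

```python
import pandas as pd


def shift_ids(ids, step=10):
    """Shift each ID by step, repeating while it collides with an original ID."""
    new_ids = ids + step
    clash = new_ids.isin(ids)
    # Terminates: shifted IDs only grow and the set of original IDs is finite
    while clash.any():
        # Keep non-clashing IDs, bump the clashing ones by another step
        new_ids = new_ids.where(~clash, new_ids + step)
        clash = new_ids.isin(ids)
    return new_ids


df = pd.DataFrame({'id': [100, 110, 120, 135], 'c1': [1, 2, 3, 4]})
out = pd.concat([df, df.assign(id=shift_ids(df['id']))], ignore_index=True)
print(out['id'].tolist())
```

Here 100 moves to 130 because both 110 and 120 are taken. Passing `ignore_index=True` to `pd.concat` also gives the result a fresh 0..n-1 index instead of the repeated 0-8 labels shown in the output above.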


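Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM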