简体   繁体   中英

Duplicated data in excel using Openpyxl

I have created a python script that will append data in excel. However, data that are being transferred in excel is having multiple duplication. Can someone help me fix my script?

tree = ET.parse('users.xml')
root = tree.getroot()
#create excel
wb = Workbook()
ws = wb.active
ws.title = ("Active Users")
df=pd.DataFrame(columns=["Login", "User Name", "Role", "Status"])
for user in root.findall('user'):
    login = user.find('login').text
    for m in tls.getUserByLogin(login):
        user_status = int(m.get("isActive"))
        
        if user_status == 1:
            lastname = m.get("lastName")
            firstname = m.get("firstName")
            userLogin = m.get("login")
            activeStatus = ("Active User")
            role = m.get("globalRole")
            tproject = m.get("tprojectRoles")    
            print("Login: " + userLogin + " " + lastname + " " + firstname + " Role: " + str(role['name']) + " " + str(activeStatus))
            df.loc[len(df.index)] =[userLogin, lastname, str(role['name']), str(activeStatus)]
            for row in dataframe_to_rows(df, index = False):
                ws.append(row)          
        else:
            inactive = (str(m.get("firstName")) + " " + str(m.get("lastName")) +": User is not Active")
            print(inactive)
    wb.save(filename = 'userData.xlsx')

The output in excel is this: Login = A1, User Name = B1, Role = C1, Status = D1

  1. Login User Name Role Status
  2. admin Administrator Admin Active
  3. Login User Name Role Status
  4. admin Administrator Admin Active
  5. user1 Pedro leader Active
  6. Login User Name Role Status
  7. admin Administrator Admin Active
  8. user1 Pedro leader Active
  9. user2 Juan leader Active

Also, for my else loop for inactive users, is it possible to append them in the same excel file to another sheet? Thank you all

Hi to @Redox and @taipei thank you for your quick responses and answers, I have resolve my duplication issues in a different format:)

def getUserDetail():    
tree = ET.parse('users.xml')
root = tree.getroot()
#create excel
workbook = Workbook()
ws = workbook.active
ws.title = ("Active Users")
ws.append(['Login', 'User Name', 'Role', 'Status'])
#logins = []
for user in root.findall('user'):
    login = user.find('login').text
#    logins.append(login)
# for index in range(10):
#     login = logins[index]
    for m in tls.getUserByLogin(login):
        user_status = int(m.get("isActive"))
        if user_status == 1:
            lastname = m.get("lastName")
            firstname = m.get("firstName")
            userLogin = m.get("login")
            activeStatus = ("Active User")
            role = m.get("globalRole")
            tproject = m.get("tprojectRoles")    
            print("Login: " + userLogin + " " + lastname + " " + firstname + " Role: " + str(role['name']) + " " + str(activeStatus))
            data = [[userLogin, lastname + firstname, str(role['name']), str(activeStatus)]]
            for row in data:
                ws.append(row)
        else:
            inactive = (str(m.get("firstName")) + " " + str(m.get("lastName")) +": User is not Active")
            print(inactive)
### MOVED code here - note it should be outside ALL for loops ####             
workbook.save(filename = 'userData.xlsx')

getUserDetail()

The ws.append() and ws.save should be outside of the ALL for loops, including the first one. Updated code here.


tree = ET.parse('users.xml')
root = tree.getroot()
#create excel
wb = Workbook()
ws = wb.active
ws.title = ("Active Users")
df=pd.DataFrame(columns=["Login", "User Name", "Role", "Status"])
for user in root.findall('user'):
    login = user.find('login').text
    for m in tls.getUserByLogin(login):
        user_status = int(m.get("isActive"))
        
        if user_status == 1:
            lastname = m.get("lastName")
            firstname = m.get("firstName")
            userLogin = m.get("login")
            activeStatus = ("Active User")
            role = m.get("globalRole")
            tproject = m.get("tprojectRoles")    
            print("Login: " + userLogin + " " + lastname + " " + firstname + " Role: " + str(role['name']) + " " + str(activeStatus))
            df.loc[len(df.index)] =[userLogin, lastname, str(role['name']), str(activeStatus)]
        else:
            inactive = (str(m.get("firstName")) + " " + str(m.get("lastName")) +": User is not Active")
            print(inactive)

### MOVED code here - note it should be outside ALL for loops ####
for row in dataframe_to_rows(df, index = False):
    ws.append(row)          

wb.save(filename = 'userData.xlsx')

Are you sure that users.xml only contains a unique user?

If you're not sure, I think it's better to check existing user logic.

to achieve that you can use a dictionary or array to temporary store your user in a loop and check if the current user was exists

. . .
user_tmp = []
for user in root.findall('user'):
    login = user.find('login').text
    # Check if login is in the list
    if login not in user_tmp:
        user_tmp.append(login)
    else:
        # if login is in the list, continue the loop
        continue
 . . .

since you are using the Pandas data frame, you can generate multiple sheets when saving the data frame with toExcel

# Example, you generate an active user in df_active and inactive user in # create a excel writer object
with pd.ExcelWriter("path to file\filename.xlsx") as writer:
    # use to_excel function and specify the sheet_name and index
    # to store the dataframe in specified sheet
    df_active.to_excel(writer, sheet_name="Active", index=False)
    df_inactive.to_excel(writer, sheet_name="Inactive", index=False)

I hope you can get hints to solve your issues from my suggestions.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM