
Compare values/strings in two columns in Python pandas

Python pandas: I want to compare the values/strings in two columns of an Excel sheet and return a string/value in a new column based on a given condition. I tried the code below, but the output is longer than the actual array.

Could someone help me sort it out?

Resource = []
for x in df['Category']:
    for y in df['Service_Line']:
        if x == 'low space' and y == 'Intel':
            Resource.append('Rhythm')
        elif x == 'log space' and y == 'Intel':
            Resource.append('Blue')
        elif x == 'CPU usage' and y == 'Intel':
            Resource.append('Jazz')
        else:
            Resource.append('Other')
print('Resource')
df['Resource'] = Resource
print(df)

Sample Data

import numpy as np
import pandas as pd

d = {'Category': {0: 'low space', 1: 'CPU usage', 2: 'log space', 3: 'low volume', 4: 'CPU usage', 5: 'low volume',
                  6: 'CPU usage', 7: 'log space', 8: 'log spac', 9: 'other', 10: 'other', 11: 'Low space'},
     'Service_Line': {0: 'Intel', 1: 'SQL', 2: 'Intel', 3: 'BUR', 4: 'AIX', 5: 'BUR',
                      6: 'Intel', 7: 'SQL', 8: 'AIX', 9: 'SAN', 10: 'SAN', 11: 'SQL'},
     'summary_data': {0: 'low space in server123', 1: 'Server213f3 CPU usage', 2: 'getting more data in log space', 3: 'low volume space in server', 4: 'high CPU usage by application', 5: 'low volume space in server',
                      6: 'high CPU usage by application', 7: 'getting more data in log space', 8: 'getting more data in log space', 9: 'space in server123', 10: 'space in server123', 11: np.nan}}

df = pd.DataFrame(d)

      Category Service_Line                    summary_data
0    low space        Intel          low space in server123
1    CPU usage          SQL           Server213f3 CPU usage
2    log space        Intel  getting more data in log space
3   low volume          BUR      low volume space in server
4    CPU usage          AIX   high CPU usage by application
5   low volume          BUR      low volume space in server
6    CPU usage        Intel   high CPU usage by application
7    log space          SQL  getting more data in log space
8     log spac          AIX  getting more data in log space
9        other          SAN              space in server123
10       other          SAN              space in server123
11   Low space          SQL                             NaN

Resource = []
for i, x in enumerate(df['Category']):
    # i is a position, so use .iloc to read the same row of Service_Line
    y = df['Service_Line'].iloc[i]
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print(Resource)
df['Resource'] = Resource
print(df)

This should work, if I understand your problem correctly.

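The problem with your code is that it generates N*N values in Resource: for each x, the inner loop visits all N values of Service_Line and appends one value every time. With the 12-row sample that gives 144 entries, so assigning the list to df['Resource'] fails because its length does not match the number of rows.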
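To see the size mismatch concretely, here is a minimal sketch on a hypothetical three-row frame (the values are made up for illustration). The nested loop appends once per (x, y) pair, giving 3 * 3 = 9 entries, while pairing values row by row gives 3; pandas then refuses to assign the 9-element list to a 3-row frame.

```python
import pandas as pd

# Hypothetical three-row frame, used only to count appended values
df = pd.DataFrame({'Category': ['low space', 'CPU usage', 'log space'],
                   'Service_Line': ['Intel', 'SQL', 'Intel']})

nested = []
for x in df['Category']:          # N iterations
    for y in df['Service_Line']:  # N iterations for every x
        nested.append(f'{x}/{y}') # one entry per (x, y) pair

# One entry per row: the i-th Category paired with the i-th Service_Line
paired = [f'{x}/{y}' for x, y in zip(df['Category'], df['Service_Line'])]

print(len(df), len(nested), len(paired))

error = None
try:
    df['Resource'] = nested
except ValueError as err:
    # pandas rejects a list whose length differs from the number of rows
    error = err
    print('ValueError:', err)
```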

You can also loop over df.index instead of using enumerate:

Resource = []
for i in df.index:
    # i is an index label, so use .loc to read both columns of that row
    x = df.loc[i, 'Category']
    y = df.loc[i, 'Service_Line']
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print(Resource)
df['Resource'] = Resource
print(df)
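Another way to pair each Category with the Service_Line from the same row is zip, which walks both columns in lockstep and therefore cannot produce the N*N problem. This is a sketch, not part of the original answer: the helper name pick_resource is hypothetical, and the four-row frame is made-up sample data.

```python
import pandas as pd

def pick_resource(category, service_line):
    """Hypothetical helper: return the resource name for one row."""
    if service_line == 'Intel':
        if category == 'low space':
            return 'Rhythm'
        if category == 'log space':
            return 'Blue'
        if category == 'CPU usage':
            return 'Jazz'
    return 'Other'

df = pd.DataFrame({'Category': ['low space', 'CPU usage', 'log space', 'CPU usage'],
                   'Service_Line': ['Intel', 'SQL', 'Intel', 'Intel']})

# zip yields exactly one (category, service_line) pair per row
df['Resource'] = [pick_resource(c, s)
                  for c, s in zip(df['Category'], df['Service_Line'])]
print(df)
```

Keeping the rules in one function means the loop body never changes when a new rule is added.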

Define all of your conditions in a list

conditions = [((df.Category == 'low space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'log space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'CPU usage') & (df.Service_Line == 'Intel'))]

then pass them to numpy.select, together with the matching choices and a default for rows where no condition holds:

import numpy as np
df['Resource'] = np.select(conditions,['Rhythm','Blue','Jazz'],default='Other')



   Service_Line                    summary_data    Category Resource
0         Intel          low space in server123   low space   Rhythm
1           SQL           Server213f3 CPU usage   CPU usage    Other
2         Intel  getting more data in log space   log space     Blue
3           BUR      low volume space in server  low volume    Other
4           AIX   high CPU usage by application   CPU usage    Other
5           BUR      low volume space in server  low volume    Other
6         Intel   high CPU usage by application   CPU usage     Jazz
7           SQL  getting more data in log space   log space    Other
8           AIX  getting more data in log space    log spac    Other
9           SAN              space in server123       other    Other
10          SAN              space in server123       other    Other
11          SQL                             NaN   Low space    Other
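To check the np.select answer end to end, here is a self-contained sketch that rebuilds the question's Category and Service_Line columns and should reproduce the Resource column in the table above. Factoring the Service_Line test into is_intel is only a readability choice made here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# The question's sample columns (summary_data omitted; the rules do not use it)
df = pd.DataFrame({
    'Category': ['low space', 'CPU usage', 'log space', 'low volume', 'CPU usage', 'low volume',
                 'CPU usage', 'log space', 'log spac', 'other', 'other', 'Low space'],
    'Service_Line': ['Intel', 'SQL', 'Intel', 'BUR', 'AIX', 'BUR',
                     'Intel', 'SQL', 'AIX', 'SAN', 'SAN', 'SQL'],
})

is_intel = df['Service_Line'] == 'Intel'
conditions = [
    (df['Category'] == 'low space') & is_intel,
    (df['Category'] == 'log space') & is_intel,
    (df['Category'] == 'CPU usage') & is_intel,
]
choices = ['Rhythm', 'Blue', 'Jazz']

# For each row, np.select takes the choice of the first True condition,
# or the default when none of them is True
df['Resource'] = np.select(conditions, choices, default='Other')
print(df)
```

Rows 8 ('log spac') and 11 ('Low space') come out as Other because == is an exact, case-sensitive comparison. If such near-matches should count, the Category column would need cleaning first (for example with df['Category'].str.lower()); that is an assumption about the data, not something the question asks for.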

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM