Python Pandas: I want to compare the values/strings in two columns of an Excel sheet and return a string/value in a new column based on a given condition. I tried the code below, but the output is longer than the actual array.
Could someone help me sort it out?
Resource = []
for x in df['Category']:
    for y in df['Service_Line']:
        if x == 'low space' and y == 'Intel':
            Resource.append('Rhythm')
        elif x == 'log space' and y == 'Intel':
            Resource.append('Blue')
        elif x == 'CPU usage' and y == 'Intel':
            Resource.append('Jazz')
        else:
            Resource.append('Other')
print('Resource')
df['Resource'] = Resource
print(df)
Sample Data
import numpy as np
import pandas as pd

d = {'Category': {0: 'low space', 1: 'CPU usage', 2: 'log space', 3: 'low volume', 4: 'CPU usage', 5: 'low volume', 6: 'CPU usage', 7: 'log space', 8: 'log spac', 9: 'other', 10: 'other', 11: 'Low space'},
     'Service_Line': {0: 'Intel', 1: 'SQL', 2: 'Intel', 3: 'BUR', 4: 'AIX', 5: 'BUR',
                      6: 'Intel', 7: 'SQL', 8: 'AIX', 9: 'SAN', 10: 'SAN', 11: 'SQL'},
     'summary_data': {0: 'low space in server123', 1: 'Server213f3 CPU usage', 2: 'getting more data in log space', 3: 'low volume space in server', 4: 'high CPU usage by application', 5: 'low volume space in server', 6: 'high CPU usage by application', 7: 'getting more data in log space', 8: 'getting more data in log space', 9: 'space in server123', 10: 'space in server123', 11: np.nan}}
df = pd.DataFrame(d)
    Category    Service_Line  summary_data
0   low space   Intel         low space in server123
1   CPU usage   SQL           Server213f3 CPU usage
2   log space   Intel         getting more data in log space
3   low volume  BUR           low volume space in server
4   CPU usage   AIX           high CPU usage by application
5   low volume  BUR           low volume space in server
6   CPU usage   Intel         high CPU usage by application
7   log space   SQL           getting more data in log space
8   log spac    AIX           getting more data in log space
9   other       SAN           space in server123
10  other       SAN           space in server123
11  Low space   SQL           NaN
Resource = []
for i, x in enumerate(df['Category']):
    # enumerate yields positions, so read the same row by position
    y = df['Service_Line'].iloc[i]
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print(Resource)
df['Resource'] = Resource
print(df)
This should work, if I understand your problem correctly.
The problem with your code is that it generates N*N values in Resource: for each x, the inner loop runs over all N values of y and appends one value every time, so you end up with N*N entries instead of N.
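The size mismatch is easy to check. This is only a minimal sketch on a three-row stand-in frame (the rows are placeholders, not the full sample data): nested loops visit every combination of the two columns, while zip pairs the values row by row.

```python
import pandas as pd

# Small illustrative frame; the values are placeholders.
df = pd.DataFrame({'Category': ['low space', 'CPU usage', 'log space'],
                   'Service_Line': ['Intel', 'SQL', 'Intel']})

# Nested loops visit every (x, y) combination, not just the pairs on the same row.
nested = [(x, y) for x in df['Category'] for y in df['Service_Line']]

# zip walks both columns in step, giving one pair per row.
paired = list(zip(df['Category'], df['Service_Line']))

print(len(df), len(nested), len(paired))  # 3 9 3
```

A list with N*N entries no longer lines up with the N rows of the frame, which is why the result looks longer than the column it is meant to fill.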
You can also iterate over df.index instead of using enumerate:
Resource = []
for i in df.index:
    # df.index yields labels, so look both values up by label
    x = df.loc[i, 'Category']
    y = df.loc[i, 'Service_Line']
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print(Resource)
df['Resource'] = Resource
print(df)
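The same single pass can also be written without any index lookups by walking the two columns together with zip. This is a sketch rather than part of the answer above: the `rules` dictionary and the `pick_resource` helper are names made up for this example, and the frame is a shortened stand-in for the sample data.

```python
import pandas as pd

# Shortened stand-in for the sample data.
df = pd.DataFrame({
    'Category': ['low space', 'CPU usage', 'log space', 'CPU usage', 'Low space'],
    'Service_Line': ['Intel', 'SQL', 'Intel', 'Intel', 'SQL'],
})

# Hypothetical lookup table: (Category, Service_Line) -> Resource.
rules = {
    ('low space', 'Intel'): 'Rhythm',
    ('log space', 'Intel'): 'Blue',
    ('CPU usage', 'Intel'): 'Jazz',
}

def pick_resource(category, service_line):
    """Return the resource for one row, or 'Other' when no rule matches."""
    return rules.get((category, service_line), 'Other')

# zip pairs the values row by row, so the result has exactly len(df) entries.
df['Resource'] = [pick_resource(x, y)
                  for x, y in zip(df['Category'], df['Service_Line'])]
print(df)
```

Keeping the rules in a dictionary means a new combination is one extra entry rather than another elif branch. The lookup is an exact, case-sensitive match, so 'Low space' still falls through to 'Other'.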
Define all of your conditions in a list:
conditions = [((df.Category == 'low space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'log space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'CPU usage') & (df.Service_Line == 'Intel'))]
Then use select from numpy:
import numpy as np
df['Resource'] = np.select(conditions,['Rhythm','Blue','Jazz'],default='Other')
    Service_Line  summary_data                    Category    Resource
0   Intel         low space in server123          low space   Rhythm
1   SQL           Server213f3 CPU usage           CPU usage   Other
2   Intel         getting more data in log space  log space   Blue
3   BUR           low volume space in server      low volume  Other
4   AIX           high CPU usage by application   CPU usage   Other
5   BUR           low volume space in server      low volume  Other
6   Intel         high CPU usage by application   CPU usage   Jazz
7   SQL           getting more data in log space  log space   Other
8   AIX           getting more data in log space  log spac    Other
9   SAN           space in server123              other       Other
10  SAN           space in server123              other       Other
11  SQL           NaN                             Low space   Other
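For anyone who wants to run the np.select approach on its own, here is a self-contained sketch. It assumes a shortened stand-in for the sample data, and the `is_intel` and `choices` variables are introduced only for this example.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the sample data.
df = pd.DataFrame({
    'Category': ['low space', 'CPU usage', 'log space', 'CPU usage', 'log spac'],
    'Service_Line': ['Intel', 'SQL', 'Intel', 'Intel', 'AIX'],
})

# Every rule requires Service_Line == 'Intel', so compute that mask once.
is_intel = df['Service_Line'] == 'Intel'
conditions = [
    (df['Category'] == 'low space') & is_intel,
    (df['Category'] == 'log space') & is_intel,
    (df['Category'] == 'CPU usage') & is_intel,
]
choices = ['Rhythm', 'Blue', 'Jazz']

# For each row, np.select takes the choice of the first condition that is True
# and falls back to `default` when none is.
df['Resource'] = np.select(conditions, choices, default='Other')
print(df)
```

Each condition is a boolean Series with one entry per row, so the result always has the same length as the frame, which avoids the N*N problem from the question.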