Compare values/strings in two columns in python pandas
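Python pandas: I want to compare the values/strings in two columns of an Excel sheet and, based on the given conditions, return a string/value in a new column. I tried the code below, but the output is longer than the actual array.
Can someone help me fix this?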
Resource = []
for x in df['Category']:
    for y in df['Service_Line']:
        if x == 'low space' and y == 'Intel':
            Resource.append('Rhythm')
        elif x == 'log space' and y == 'Intel':
            Resource.append('Blue')
        elif x == 'CPU usage' and y == 'Intel':
            Resource.append('Jazz')
        else:
            Resource.append('Other')
print('Resource')
df['Resource'] = Resource
print(df)
Sample data
import numpy as np
import pandas as pd

d = {'Category': {0: 'low space', 1: 'CPU usage', 2: 'log space', 3: 'low volume', 4: 'CPU usage', 5: 'low volume',
                  6: 'CPU usage', 7: 'log space', 8: 'log spac', 9: 'other', 10: 'other', 11: 'Low space'},
     'Service_Line': {0: 'Intel', 1: 'SQL', 2: 'Intel', 3: 'BUR', 4: 'AIX', 5: 'BUR',
                      6: 'Intel', 7: 'SQL', 8: 'AIX', 9: 'SAN', 10: 'SAN', 11: 'SQL'},
     'summary_data': {0: 'low space in server123', 1: 'Server213f3 CPU usage', 2: 'getting more data in log space',
                      3: 'low volume space in server', 4: 'high CPU usage by application', 5: 'low volume space in server',
                      6: 'high CPU usage by application', 7: 'getting more data in log space', 8: 'getting more data in log space',
                      9: 'space in server123', 10: 'space in server123', 11: np.nan}}
df = pd.DataFrame(d)
      Category  Service_Line                    summary_data
0    low space         Intel          low space in server123
1    CPU usage           SQL           Server213f3 CPU usage
2    log space         Intel  getting more data in log space
3   low volume           BUR      low volume space in server
4    CPU usage           AIX   high CPU usage by application
5   low volume           BUR      low volume space in server
6    CPU usage         Intel   high CPU usage by application
7    log space           SQL  getting more data in log space
8     log spac           AIX  getting more data in log space
9        other           SAN              space in server123
10       other           SAN              space in server123
11   Low space           SQL                             NaN
Resource = []
for i, x in enumerate(df['Category']):
    y = df['Service_Line'][i]
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print('Resource')
df['Resource'] = Resource
print(df)
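IIUC, this should work.
The problem with your code is that it generates N * N values in Resource: for each x, the inner loop runs over all N values of y and appends one value per (x, y) pair, so the list ends up much longer than the DataFrame.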
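To make the N * N growth concrete, here is a minimal sketch on a hypothetical three-row frame (the data is made up for illustration and is not the sample from the question). It only counts how many times the inner loop body runs.

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the counts.
df = pd.DataFrame({'Category': ['low space', 'log space', 'CPU usage'],
                   'Service_Line': ['Intel', 'SQL', 'Intel']})

nested = []
for x in df['Category']:
    for y in df['Service_Line']:  # runs N times for every x
        nested.append((x, y))

# The nested loop yields N * N entries, but a new column needs exactly N.
print(len(df), len(nested))
```

Assigning a list of that length to a new column fails with a length-mismatch error, which is why the values have to be paired row by row instead.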
You can also iterate over df.index instead of using enumerate:
Resource = []  # start from an empty list so earlier runs do not accumulate
for i in df.index:
    x = df['Category'][i]
    y = df['Service_Line'][i]
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print('Resource')
df['Resource'] = Resource
print(df)
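Both loops above look values up by index label, and the enumerate version in particular relies on the default 0..N-1 index. As a hedged alternative, the sketch below pairs the two columns by position with zip. The helper name pick_resource and the small frame with a non-default index are assumptions made for illustration; the rules are the same three as in the loops.

```python
import pandas as pd

# Hypothetical frame with a non-default index, to show that zip pairs rows by position.
df = pd.DataFrame(
    {'Category': ['low space', 'CPU usage', 'log space', 'CPU usage'],
     'Service_Line': ['Intel', 'SQL', 'Intel', 'Intel']},
    index=[10, 20, 30, 40],
)

def pick_resource(category, service_line):
    """Return the resource name for one row (same rules as the loops above)."""
    if service_line != 'Intel':
        return 'Other'
    if category == 'low space':
        return 'Rhythm'
    if category == 'log space':
        return 'Blue'
    if category == 'CPU usage':
        return 'Jazz'
    return 'Other'

# zip walks both columns in lockstep, so exactly one value is produced per row.
df['Resource'] = [pick_resource(x, y)
                  for x, y in zip(df['Category'], df['Service_Line'])]
print(df)
```

Keeping the rules in a small function also makes them easy to test on their own, separately from the DataFrame.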
Define all the conditions in a list
conditions = [((df.Category == 'low space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'log space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'CPU usage') & (df.Service_Line == 'Intel'))]
Then use select from numpy
import numpy as np
df['Resource'] = np.select(conditions,['Rhythm','Blue','Jazz'],default='Other')
    Service_Line                    summary_data    Category  Resource
0          Intel          low space in server123   low space    Rhythm
1            SQL           Server213f3 CPU usage   CPU usage     Other
2          Intel  getting more data in log space   log space      Blue
3            BUR      low volume space in server  low volume     Other
4            AIX   high CPU usage by application   CPU usage     Other
5            BUR      low volume space in server  low volume     Other
6          Intel   high CPU usage by application   CPU usage      Jazz
7            SQL  getting more data in log space   log space     Other
8            AIX  getting more data in log space    log spac     Other
9            SAN              space in server123       other     Other
10           SAN              space in server123       other     Other
11           SQL                             NaN   Low space     Other
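If the list of rules keeps growing, one possible alternative is a plain dictionary keyed by (Category, Service_Line) pairs. This is only a sketch under assumptions: the name rules and the five-row frame are hypothetical, and matching is exact and case-sensitive, so 'Low space' still falls through to 'Other' just as in the output above.

```python
import pandas as pd

# Hypothetical subset of rows, for illustration only.
df = pd.DataFrame({'Category': ['low space', 'CPU usage', 'log space', 'CPU usage', 'Low space'],
                   'Service_Line': ['Intel', 'SQL', 'Intel', 'Intel', 'SQL']})

# (Category, Service_Line) -> Resource; the same three rules as the conditions list.
rules = {
    ('low space', 'Intel'): 'Rhythm',
    ('log space', 'Intel'): 'Blue',
    ('CPU usage', 'Intel'): 'Jazz',
}

# Look each row's pair up in the dictionary; unmatched pairs get 'Other'.
pairs = zip(df['Category'], df['Service_Line'])
df['Resource'] = [rules.get(pair, 'Other') for pair in pairs]
print(df)
```

Adding a rule then means adding one dictionary entry rather than another condition and another choice in two parallel lists.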