Comparing values/strings in two columns in Python pandas

Question

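Python pandas: I want to compare the values/strings in two columns of an Excel sheet and, based on given conditions, return a string/value in a new column. I tried the code below, but the output is longer than the actual array.

Can anyone help me with this?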

Resource = []
for x in df['Category']:
    for y in df['Service_Line']:
        if x == 'low space' and y == 'Intel':
            Resource.append('Rhythm')
        elif x == 'log space' and y == 'Intel':
            Resource.append('Blue')
        elif x == 'CPU usage' and y == 'Intel':
            Resource.append('Jazz')
        else:
            Resource.append('Other')
print('Resource')
df['Resource'] = Resource
print(df)

Sample data

import numpy as np
import pandas as pd

d = {'Category': {0: 'low space', 1: 'CPU usage', 2: 'log space', 3: 'low volume', 4: 'CPU usage', 5: 'low volume', 6: 'CPU usage', 7: 'log space', 8: 'log spac', 9: 'other', 10: 'other', 11: 'Low space'},
     'Service_Line': {0: 'Intel', 1: 'SQL', 2: 'Intel', 3: 'BUR', 4: 'AIX', 5: 'BUR',
                      6: 'Intel', 7: 'SQL', 8: 'AIX', 9: 'SAN', 10: 'SAN', 11: 'SQL'},
     'summary_data': {0: 'low space in server123', 1: 'Server213f3 CPU usage', 2: 'getting more data in log space', 3: 'low volume space in server', 4: 'high CPU usage by application', 5: 'low volume space in server', 6: 'high CPU usage by application', 7: 'getting more data in log space', 8: 'getting more data in log space', 9: 'space in server123', 10: 'space in server123', 11: np.nan}}

df = pd.DataFrame(d)

      Category Service_Line                    summary_data
0    low space        Intel          low space in server123
1    CPU usage          SQL           Server213f3 CPU usage
2    log space        Intel  getting more data in log space
3   low volume          BUR      low volume space in server
4    CPU usage          AIX   high CPU usage by application
5   low volume          BUR      low volume space in server
6    CPU usage        Intel   high CPU usage by application
7    log space          SQL  getting more data in log space
8     log spac          AIX  getting more data in log space
9        other          SAN              space in server123
10       other          SAN              space in server123
11   Low space          SQL                             NaN

Answer 1

Resource = []
for i, x in enumerate(df['Category']):
    # Take the Service_Line value from the same row position as x.
    y = df['Service_Line'].iloc[i]
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print(Resource)
df['Resource'] = Resource
print(df)

If I understand correctly, this should work.

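The problem with your code is that it generates N * N values in Resource: for each x, the inner loop runs over all N values of y and appends one value to Resource every time.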
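The N * N growth can be checked without pandas. The sketch below uses two short plain lists as stand-ins for the two columns (the values are illustrative, not the asker's full data) and compares the nested loop with pairing the values row by row.

```python
# Stand-ins for df['Category'] and df['Service_Line'] (illustrative values).
categories = ['low space', 'CPU usage', 'log space']
service_lines = ['Intel', 'SQL', 'Intel']

# Nested loops visit every (x, y) combination: N * N iterations.
nested = []
for x in categories:
    for y in service_lines:
        nested.append((x, y))

# zip pairs the i-th category with the i-th service line: N iterations.
paired = []
for x, y in zip(categories, service_lines):
    paired.append((x, y))

print(len(nested))  # 9
print(len(paired))  # 3
```

With three rows the nested version produces nine entries, which is why assigning the list to a new column fails with a length mismatch.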

You can also use df.index instead of enumerate:

Resource = []
for i in df.index:
    # Both lookups use the same index label, so x and y come from one row.
    x = df.loc[i, 'Category']
    y = df.loc[i, 'Service_Line']
    if x == 'low space' and y == 'Intel':
        Resource.append('Rhythm')
    elif x == 'log space' and y == 'Intel':
        Resource.append('Blue')
    elif x == 'CPU usage' and y == 'Intel':
        Resource.append('Jazz')
    else:
        Resource.append('Other')
print(Resource)
df['Resource'] = Resource
print(df)
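Another way to keep the values paired by row, sketched here as an assumption rather than as part of the original answer, is to iterate over both columns with zip, which avoids index lookups entirely. The helper name pick_resource and the four-row frame are hypothetical and only serve the illustration.

```python
import pandas as pd

def pick_resource(category, service_line):
    """Return the resource label for one row (hypothetical helper)."""
    if service_line != 'Intel':
        return 'Other'
    if category == 'low space':
        return 'Rhythm'
    if category == 'log space':
        return 'Blue'
    if category == 'CPU usage':
        return 'Jazz'
    return 'Other'

# Small illustrative frame, not the full sample data.
demo = pd.DataFrame({
    'Category': ['low space', 'CPU usage', 'log space', 'CPU usage'],
    'Service_Line': ['Intel', 'SQL', 'Intel', 'Intel'],
})

# zip walks the two columns in lockstep, so there is one label per row.
demo['Resource'] = [
    pick_resource(x, y)
    for x, y in zip(demo['Category'], demo['Service_Line'])
]
print(demo)
```

Because the list has exactly one entry per row, it can be assigned to the new column regardless of what the index labels are.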

Answer 2

Define all the conditions in a list:

conditions = [((df.Category == 'low space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'log space') & (df.Service_Line == 'Intel')),
              ((df.Category == 'CPU usage') & (df.Service_Line == 'Intel'))]

Then use select from numpy:

import numpy as np
df['Resource'] = np.select(conditions,['Rhythm','Blue','Jazz'],default='Other')



 Service_Line                    summary_data    Category   Resource
0         Intel          low space in server123   low space   Rhythm
1           SQL           Server213f3 CPU usage   CPU usage    Other
2         Intel  getting more data in log space   log space     Blue
3           BUR      low volume space in server  low volume    Other
4           AIX   high CPU usage by application   CPU usage    Other
5           BUR      low volume space in server  low volume    Other
6         Intel   high CPU usage by application   CPU usage     Jazz
7           SQL  getting more data in log space   log space    Other
8           AIX  getting more data in log space    log spac    Other
9           SAN              space in server123       other    Other
10          SAN              space in server123       other    Other
11          SQL                             NaN   Low space    Other
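Since all three conditions require Service_Line to be 'Intel', the same result can probably also be reached with a dictionary lookup. The following is a minimal sketch under that assumption, not part of the original answer; the names demo and intel_resources and the five-row frame are hypothetical.

```python
import pandas as pd

# Small illustrative frame, not the full sample data.
demo = pd.DataFrame({
    'Category': ['low space', 'CPU usage', 'log space', 'CPU usage', 'Low space'],
    'Service_Line': ['Intel', 'SQL', 'Intel', 'Intel', 'SQL'],
})

# Category -> Resource, valid only for rows whose Service_Line is 'Intel'.
intel_resources = {'low space': 'Rhythm', 'log space': 'Blue', 'CPU usage': 'Jazz'}

# map gives NaN for categories missing from the dictionary.
mapped = demo['Category'].map(intel_resources)

# Keep the mapped value only on Intel rows, then fill every gap with 'Other'.
demo['Resource'] = mapped.where(demo['Service_Line'] == 'Intel').fillna('Other')
print(demo)
```

This variant keeps the category-to-resource pairs in one place, while np.select stays the more general choice when the conditions do not share a common column test.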

Answer 1
0 2018-03-02 19:13:33

Answer 2
0 2018-03-02 19:57:11
