
How to create new columns based on multiple conditions in other columns using a for loop?

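I'm trying to write a for loop that creates new columns of boolean values indicating whether both of the referenced columns contain True. I want the loop to iterate over the existing columns and compare them pairwise, but I'm not sure how to get it working. So far I've been trying to use lists that reference the different columns. The code is as follows: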

import pandas as pd
import numpy as np

elig = pd.read_excel('spreadsheet.xlsx')

elig['ELA'] = elig['SELECTED_EXAMS'].str.match('.*English Language Arts.*')
elig['LivEnv'] = elig['SELECTED_EXAMS'].str.match('.*Living Environment.*')
elig['USHist'] = elig['SELECTED_EXAMS'].str.match('.*US History.*')
elig['Geometry'] = elig['SELECTED_EXAMS'].str.match('.*Geometry.*')
elig['AlgebraI'] = elig['SELECTED_EXAMS'].str.match('.*Algebra I.*')
elig['GlobalHistory'] = elig['SELECTED_EXAMS'].str.match('.*Global History.*')
elig['Physics'] = elig['SELECTED_EXAMS'].str.match('.*Physics.*')
elig['AlgebraII'] = elig['SELECTED_EXAMS'].str.match('.*Algebra II.*')
elig['EarthScience'] = elig['SELECTED_EXAMS'].str.match('.*Earth Science.*')
elig['Chemistry'] = elig['SELECTED_EXAMS'].str.match('.*Chemistry.*')
elig['LOTE Spanish'] = elig['SELECTED_EXAMS'].str.match('.*LOTE – Spanish.*')

# CHANGE TO LOOP--enter columns for instances in which scorers overlap competencies (e.g. can score two different exams). This is helpful in the event that two exams are scored on the same day, and we need to resolve numbers of scorers.

exam_list = ['ELA','LiveEnv','USHist','Geometry','AlgebraI','GlobalHistory','Physics','AlgebraII','EarthScience','Chemistry','LOTE Spanish']
nestedExam_list = ['ELA','LiveEnv','USHist','Geometry','AlgebraI','GlobalHistory','Physics','AlgebraII','EarthScience','Chemistry','LOTE Spanish']

for exam in exam_list:
    for nestedExam in nestedExam_list:
        elig[exam+nestedExam+' Overlap'] = np.where((elig[exam]==True)&(elig[nestedExam]==True,),True,False)

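I think the problem lies with np.where(): I want exam and nestedExam to refer to the columns in question, but they only refer to the list items. The error message is as follows: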


ValueError                                Traceback (most recent call last)
<ipython-input-33-9347975b8865> in <module>
      3 for exam in exam_list:
      4     for nestedExam in nestedExam_list:
----> 5         elig[exam+nestedExam+' Overlap'] = np.where((elig[exam]==True)&(elig[nestedExam]==True,),True,False)
      6 
      7 """

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other)
   1359 
   1360             res_values = na_op(self.values, other)
-> 1361             unfilled = self._constructor(res_values, index=self.index)
   1362             return filler(unfilled).__finalize__(self)
   1363 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    260                             'Length of passed values is {val}, '
    261                             'index implies {ind}'
--> 262                             .format(val=len(data), ind=len(index)))
    263                 except TypeError:
    264                     pass

ValueError: Length of passed values is 1, index implies 26834

Can anyone help me with this?

First, to build the pairs more efficiently and without computing any pair twice, I suggest using the built-in itertools library:

import itertools

exam_list = ['A', 'B', 'C', 'D']
for exam1, exam2 in itertools.combinations(exam_list, 2):
    print(exam1 + '_' + exam2)
A_B
A_C
A_D
B_C
B_D
C_D

If you actually need every possible ordering of each pair, you can use permutations instead of combinations.
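To make the difference concrete, here is a short sketch on a made-up three-item list (the names are placeholders): combinations yields each unordered pair once, permutations yields both orders, and itertools.product, which is what two nested loops over the same list amount to, also includes self-pairs such as ('A', 'A'). In the question's loop those self-pairs produce columns like 'ELAELA Overlap' that simply duplicate an existing column.

```python
import itertools

exam_list = ['A', 'B', 'C']

# combinations: each unordered pair once, no self-pairs.
pairs = list(itertools.combinations(exam_list, 2))
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# permutations: every ordered pair, so ('A', 'B') and ('B', 'A') both appear.
ordered = list(itertools.permutations(exam_list, 2))
print(ordered)
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

# product with repeat=2 matches two nested loops over the same list,
# including self-pairs such as ('A', 'A').
everything = list(itertools.product(exam_list, repeat=2))
print(len(everything))  # 9
```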

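To solve the actual problem, you need much less code than this. If elig[exam1] and elig[exam2] are both boolean columns, the column that is True wherever both are True is simply (elig[exam1] & elig[exam2]); neither np.where nor the == True comparisons are needed. This is called a "bitwise" or "logical AND" operation.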
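A small sketch with made-up data, comparing the & expression with the np.where call from the question. It also shows that the trailing comma in (elig[nestedExam]==True,) builds a one-element tuple rather than a boolean Series, which is the likely source of the "Length of passed values is 1" error in the traceback. The two-column DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

# Two small boolean columns standing in for two exam columns (made-up data).
elig = pd.DataFrame({'ELA': [True, True, False], 'Geometry': [True, False, False]})

# A trailing comma inside parentheses creates a one-element tuple, not a Series.
# This is what the question's (elig[nestedExam]==True,) expression produced.
operand = (elig['Geometry'] == True,)
print(type(operand).__name__)  # tuple

# With the comma removed, the np.where version works, but it is redundant ...
via_where = np.where((elig['ELA'] == True) & (elig['Geometry'] == True), True, False)

# ... because the element-wise AND of two boolean columns already is the result.
via_and = elig['ELA'] & elig['Geometry']

print(via_and.tolist())                         # [True, False, False]
print((via_and.to_numpy() == via_where).all())  # True
```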

For example:

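import pandas as pd

df = pd.DataFrame({'A': ['car', 'cat', 'hat']})
df['start=c'] = df['A'].str.startswith('c')  # True where the word starts with "c"
df['end=t'] = df['A'].str.endswith('t')      # True where the word ends with "t"
df['both'] = df['start=c'] & df['end=t']      # element-wise AND of the two columns
print(df)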
     A  start=c  end=t   both
0  car     True  False  False
1  cat     True   True   True
2  hat    False   True  False
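Putting the two pieces together for the question's setup might look like the sketch below. It assumes made-up SELECTED_EXAMS rows and uses only three of the exams; the exams dictionary is a hypothetical helper. It swaps in str.contains(title, regex=False) as a literal substring test in place of the question's str.match('.*title.*') patterns, which should behave the same on single-line text. Either test for 'Algebra I' is also True for 'Algebra II'. Building every column name from one dictionary also avoids the mismatch in the question, where the column is created as 'LivEnv' but the lists refer to 'LiveEnv', which would raise a KeyError once the ValueError is fixed.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for the spreadsheet: a few rows of SELECTED_EXAMS text.
elig = pd.DataFrame({'SELECTED_EXAMS': [
    'English Language Arts, Geometry',
    'Geometry, Chemistry',
    'English Language Arts, Chemistry, Geometry',
]})

# Hypothetical mapping: new column name -> exam title to look for.
# Using the same keys for creation and pairing keeps the names consistent.
exams = {
    'ELA': 'English Language Arts',
    'Geometry': 'Geometry',
    'Chemistry': 'Chemistry',
}
for col, title in exams.items():
    # regex=False treats the title as a literal substring.
    elig[col] = elig['SELECTED_EXAMS'].str.contains(title, regex=False)

# One overlap column per unordered pair; & is True only where both are True.
for exam1, exam2 in itertools.combinations(exams, 2):
    elig[exam1 + '_' + exam2 + ' Overlap'] = elig[exam1] & elig[exam2]

print(elig.filter(like='Overlap'))
```

With the full exam list, the same two loops produce one overlap column per pair of exams and no self-pairs.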


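Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.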

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM