How to create new columns based on multiple conditions in other columns using a for loop?
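I am trying to write a for loop that creates new columns holding a boolean value that indicates whether both of the referenced columns contain True. I want the loop to iterate over the existing columns and compare them, but I am not sure how to get it to work. So far I have been trying to use lists that refer to the different columns. The code is as follows: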
import pandas as pd
import numpy as np
elig = pd.read_excel('spreadsheet.xlsx')
elig['ELA'] = elig['SELECTED_EXAMS'].str.match('.*English Language Arts.*')
elig['LivEnv'] = elig['SELECTED_EXAMS'].str.match('.*Living Environment.*')
elig['USHist'] = elig['SELECTED_EXAMS'].str.match('.*US History.*')
elig['Geometry'] = elig['SELECTED_EXAMS'].str.match('.*Geometry.*')
elig['AlgebraI'] = elig['SELECTED_EXAMS'].str.match('.*Algebra I.*')
elig['GlobalHistory'] = elig['SELECTED_EXAMS'].str.match('.*Global History.*')
elig['Physics'] = elig['SELECTED_EXAMS'].str.match('.*Physics.*')
elig['AlgebraII'] = elig['SELECTED_EXAMS'].str.match('.*Algebra II.*')
elig['EarthScience'] = elig['SELECTED_EXAMS'].str.match('.*Earth Science.*')
elig['Chemistry'] = elig['SELECTED_EXAMS'].str.match('.*Chemistry.*')
elig['LOTE Spanish'] = elig['SELECTED_EXAMS'].str.match('.*LOTE – Spanish.*')
# CHANGE TO LOOP--enter columns for instances in which scorers overlap competencies (e.g. can score two different exams). This is helpful in the event that two exams are scored on the same day, and we need to resolve numbers of scorers.
exam_list = ['ELA','LiveEnv','USHist','Geometry','AlgebraI','GlobalHistory','Physics','AlgebraII','EarthScience','Chemistry','LOTE Spanish']
nestedExam_list = ['ELA','LiveEnv','USHist','Geometry','AlgebraI','GlobalHistory','Physics','AlgebraII','EarthScience','Chemistry','LOTE Spanish']
for exam in exam_list:
    for nestedExam in nestedExam_list:
        elig[exam+nestedExam+' Overlap'] = np.where((elig[exam]==True)&(elig[nestedExam]==True,),True,False)
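I think the problem is in np.where(): I want exam and nestedExam to refer to the columns in question, but they only refer to the list items. The error message is as follows: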
ValueError Traceback (most recent call last)
<ipython-input-33-9347975b8865> in <module>
3 for exam in exam_list:
4 for nestedExam in nestedExam_list:
----> 5 elig[exam+nestedExam+' Overlap'] = np.where((elig[exam]==True)&(elig[nestedExam]==True,),True,False)
6
7 """
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other)
1359
1360 res_values = na_op(self.values, other)
-> 1361 unfilled = self._constructor(res_values, index=self.index)
1362 return filler(unfilled).__finalize__(self)
1363
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
260 'Length of passed values is {val}, '
261 'index implies {ind}'
--> 262 .format(val=len(data), ind=len(index)))
263 except TypeError:
264 pass
ValueError: Length of passed values is 1, index implies 26834
Can anyone help me with this?
First, to build the pairs more efficiently and without double-counting, I suggest using the built-in itertools library.
import itertools
exam_list = ['A', 'B', 'C', 'D']
for exam1, exam2 in itertools.combinations(exam_list, 2):
    print(exam1 + '_' + exam2)
A_B
A_C
A_D
B_C
B_D
C_D
If you actually need every possible ordering of each pair, you can swap combinations for permutations.
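To make the difference concrete, here is a small sketch using the same placeholder list as above. It only counts the pairs each function yields and checks whether the reversed pair ('B', 'A') appears; the variable names combos and perms are introduced here for illustration.

```python
import itertools

exam_list = ['A', 'B', 'C', 'D']

# combinations: each unordered pair appears once, in list order
combos = list(itertools.combinations(exam_list, 2))

# permutations: each pair appears in both orders
perms = list(itertools.permutations(exam_list, 2))

print(len(combos))            # 6
print(len(perms))             # 12
print(('B', 'A') in combos)   # False
print(('B', 'A') in perms)    # True
```

Neither function pairs an item with itself, so no 'A_A' style column would be created either way.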
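To solve the actual problem, you need far less code than you are using. If you have two columns elig[exam1] and elig[exam2] that are both boolean arrays, then the array that is True where both are True is (elig[exam1] & elig[exam2]). This is called a "bitwise" or "logical and" operation.
For example: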
df = pd.DataFrame({'A': ['car', 'cat', 'hat']})
df['start=c'] = df['A'].str.startswith('c')
df['end=t'] = df['A'].str.endswith('t')
df['both'] = df['start=c'] & df['end=t']
     A  start=c  end=t   both
0  car     True  False  False
1  cat     True   True   True
2  hat    False   True  False
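Putting the two pieces together for the original question, the loop might look like the sketch below. The small DataFrame and its values are made up for illustration and stand in for the spreadsheet; only the column names come from the question. Two details in the question's code are worth checking as well: the ValueError most likely comes from the stray comma in (elig[nestedExam]==True,), which turns that operand into a one-element tuple, and exam_list spells 'LiveEnv' while the column was created as 'LivEnv'.

```python
import itertools
import pandas as pd

# Hypothetical stand-in for the spreadsheet data in the question.
elig = pd.DataFrame({
    'ELA': [True, True, False],
    'LivEnv': [True, False, False],
    'Physics': [True, True, True],
})

exam_list = ['ELA', 'LivEnv', 'Physics']

# One overlap column per unordered pair of exams;
# & is True only in rows where both columns are True.
for exam1, exam2 in itertools.combinations(exam_list, 2):
    elig[exam1 + exam2 + ' Overlap'] = elig[exam1] & elig[exam2]

print(elig.filter(like='Overlap'))
```

If SELECTED_EXAMS contains missing values, str.match returns NaN for those rows; passing na=False to str.match should keep the source columns strictly boolean so that & behaves as expected.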