How to iterate a DataFrame of strings and apply an if condition over the result
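I need to use the is_doc1 key of bool1_res and bool2_res and check it against the 'detected' key of bool3_res:
1: If bool3_res['detected'] == bool1_res['is_doc1'] == True, then my resp (built from bool1 and bool3) must be returned.
2: If bool3_res['detected'] == bool2_res['is_doc1'] == True, then my resp (built from bool2 and bool3) must be returned.
3: Otherwise, return "Not valid".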
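Put differently, the three rules are a priority check on the parsed dicts. A minimal sketch, assuming the three bool#_res values have already been turned into dicts (the helper name which_rule and its return labels are hypothetical). It also shows that a == b == True is a chained comparison, evaluated as (a == b) and (b == True):

```python
def which_rule(bool1_res: dict, bool2_res: dict, bool3_res: dict) -> str:
    """Return which of the three rules applies to one row (hypothetical helper)."""
    # a == b == True is evaluated as (a == b) and (b == True)
    if bool3_res['detected'] == bool1_res['is_doc1'] == True:
        return 'rule 1'
    if bool3_res['detected'] == bool2_res['is_doc1'] == True:
        return 'rule 2'
    return 'Not valid'


# Row 1001 from the sample data: bool1 is not doc1, but bool2 is
print(which_rule({'is_doc1': False, 'is_doc2': True},
                 {'is_doc1': True, 'is_doc2': True},
                 {'detected': True, 'count': 1}))  # rule 2
```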
DataFrame:
user_uid,bool1,bool2,bool3,bool1_res,bool2_res,bool3_res
1001,27452.webp,981.webp,d92e.webp,"{'is_doc1': False, 'is_doc2': True}","{'is_doc1': True, 'is_doc2': True}","{'detected': True, 'count': 1}"
1002,27452.webp,981.webp,d92e.webp,"{'is_doc1': True, 'is_doc2': True}","{'is_doc1': False, 'is_doc2': True}","{'detected': True, 'count': 1}"
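A small sketch for reproducing this, assuming the data is stored as CSV (io.StringIO stands in for the file here). read_csv keeps the bool#_res columns as plain strings, which is why they have to be parsed before their keys can be used:

```python
import io

import pandas as pd

# The sample data from the question, verbatim
csv_text = """user_uid,bool1,bool2,bool3,bool1_res,bool2_res,bool3_res
1001,27452.webp,981.webp,d92e.webp,"{'is_doc1': False, 'is_doc2': True}","{'is_doc1': True, 'is_doc2': True}","{'detected': True, 'count': 1}"
1002,27452.webp,981.webp,d92e.webp,"{'is_doc1': True, 'is_doc2': True}","{'is_doc1': False, 'is_doc2': True}","{'detected': True, 'count': 1}"
"""

df = pd.read_csv(io.StringIO(csv_text))

# The bool#_res cells come back as str, not dict
print(type(df.loc[0, 'bool1_res']))
print(df.loc[0, 'bool1_res'])
```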
My code:
def new_func(x):
    d1 = df['bool1_res'].to_dict()
    d1 = eval(d1[0])
    d2 = df['bool2_res'].to_dict()
    d2 = eval(d2[0])
    d3 = df['bool3_res'].to_dict()
    d3 = eval(d3[0])
    if d1['is_doc1'] == d3['detected'] == True:
        resp = {
            "task_id": "uid",
            "group_id": "uid",
            "data": {
                "document1": df['bool1'],
                "document2": df['bool3']
            }
        }
    elif d2['is_doc1'] == d3['detected'] == True:
        resp = {
            "task_id": "user_uid",
            "group_id": "uid",
            "data": {
                "document1": df['bool2'],
                "document2": df['bool3']
            }
        }
    elif d3['detected'] == False:
        resp = 'Not valid'
    else:
        resp = 'Not valid'
    return resp

df['new'] = df.apply(new_func, axis = 1)
#df['new'] = df[['bool1', 'bool2', 'bool3', 'bool1_res', 'bool2_res', 'bool3_res']].applymap(new_func)
My expected output:
df['new']
{'u_id': 'uid', 'group': 'uid', 'data': {'document1': ['981.webp'], 'document2': ['d92e.webp']}}
{'u_id': 'uid', 'group': 'uid', 'data': {'document1': ['27452.webp'], 'document2': ['d92e.webp']}}
What I actually get in df['new']:
0 {'task_id': 'user_uid', 'group_id': 'uid', 'data': {'document1': ['981.webp', '981.webp'], 'document2': ['d92e.webp', 'd92e.webp']}}
1 {'task_id': 'user_uid', 'group_id': 'uid', 'data': {'document1': ['981.webp', '981.webp'], 'document2': ['d92e.webp', 'd92e.webp']}}
Name: new, dtype: object
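The repeated values come from the function body reading the global df instead of its argument x: df['bool2'] is the whole column, and d1[0] is always row 0. A minimal illustration on a hypothetical two-row frame (the functions buggy and fixed, and the value 'abc.webp', are made up for this comparison):

```python
import pandas as pd

# Hypothetical frame; the two values differ so the bug is visible
df = pd.DataFrame({'bool2': ['981.webp', 'abc.webp']})


def buggy(x):
    # Reads the global df, so every row sees the whole column
    return ', '.join(df['bool2'])


def fixed(x):
    # x is the current row (a Series), so this is a single value
    return x['bool2']


print(df.apply(buggy, axis=1).tolist())  # ['981.webp, abc.webp', '981.webp, abc.webp']
print(df.apply(fixed, axis=1).tolist())  # ['981.webp', 'abc.webp']
```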
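You should avoid eval and use ast.literal_eval instead, use x (the current row) instead of df so that each row is processed on its own, and, to get one-element lists, wrap x['bool1'], x['bool2'] and x['bool3'] in []: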
import ast

def new_func(x):
    d1 = ast.literal_eval(x['bool1_res'])
    d2 = ast.literal_eval(x['bool2_res'])
    d3 = ast.literal_eval(x['bool3_res'])
    if d1['is_doc1'] == d3['detected'] == True:
        resp = {
            "task_id": "uid",
            "group_id": "uid",
            "data": {
                "document1": [x['bool1']],
                "document2": [x['bool3']]
            }
        }
    elif d2['is_doc1'] == d3['detected'] == True:
        resp = {
            "task_id": "user_uid",
            "group_id": "uid",
            "data": {
                "document1": [x['bool2']],
                "document2": [x['bool3']]
            }
        }
    elif d3['detected'] == False:
        resp = 'Not valid'
    else:
        resp = 'Not valid'
    return resp

df['new'] = df.apply(new_func, axis = 1)
print (df['new'].iat[0])
{'task_id': 'user_uid', 'group_id': 'uid', 'data': {'document1': ['981.webp'], 'document2': ['d92e.webp']}}
print (df['new'].iat[1])
{'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': ['27452.webp'], 'document2': ['d92e.webp']}}
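To illustrate the eval point, a short sketch: ast.literal_eval only accepts Python literal structures (such as strings, numbers, tuples, lists, dicts, sets, booleans and None), so a dict string parses normally, while an expression that eval would execute raises ValueError instead. The second input string is a made-up example:

```python
import ast

cell = "{'detected': True, 'count': 1}"

# A string holding a plain dict literal parses into a real dict
parsed = ast.literal_eval(cell)
print(parsed['detected'])  # True

# An expression that eval() would run is rejected instead of executed
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```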
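I'm assuming this is what your data looks like once the lines are expanded (also, if you could add some whitespace, it would be much easier to read... ^_^):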
df = pd.DataFrame(
    [
        [1001, "27452.webp", "981.webp", "d92e.webp",
         "{'is_doc1': False, 'is_doc2': True}",
         "{'is_doc1': True, 'is_doc2': True}",
         "{'detected': True, 'count': 1}"
        ],
        [1002, "27452.webp", "981.webp", "d92e.webp",
         "{'is_doc1': True, 'is_doc2': True}",
         "{'is_doc1': False, 'is_doc2': True}",
         "{'detected': True, 'count': 1}"
        ],
        [1003, "27452.webp", "981.webp", "d92e.webp",
         "{'is_doc1': True, 'is_doc2': True}",
         "{'is_doc1': False, 'is_doc2': True}",
         "{'detected': False, 'count': 1}"
        ],
    ],
    columns=['user_uid', 'bool1', 'bool2', 'bool3', 'bool1_res', 'bool2_res',
             'bool3_res'
    ]
)
The execution is split into two parts: (1) parsing the strings and (2) processing them to build the "new" column values.
# required packages
import ast
import pandas as pd
# for type hints
from typing import Any
This function is applied to every element of the DataFrame via pd.DataFrame.applymap and uses ast.literal_eval, as @jezrael rightly suggested.
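def str2dict(x: Any):
    """(Step 1) Parses argument using ast.literal_eval"""
    # only strings can be parsed; other values (e.g. the int user_uid) pass through
    if not isinstance(x, str):
        return x
    try:
        return ast.literal_eval(x.strip())
    # if x is not a parsable literal ('27452.webp' raises SyntaxError,
    # 'd92e.webp' raises ValueError), return x as-is
    except (ValueError, SyntaxError):
        return x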
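A quick check of how a parser like str2dict treats each kind of cell. This sketch re-declares it in a version that skips non-strings and catches both ValueError and SyntaxError, since (with CPython's parser) '27452.webp' is not even valid syntax, while 'd92e.webp' parses but is not a literal:

```python
import ast
from typing import Any


def str2dict(x: Any) -> Any:
    """Parse a cell with ast.literal_eval, returning it unchanged if that fails."""
    # Non-strings (e.g. the integer user_uid) are returned untouched
    if not isinstance(x, str):
        return x
    try:
        return ast.literal_eval(x.strip())
    except (ValueError, SyntaxError):
        # Not a Python literal, e.g. a file name such as '27452.webp'
        return x


print(str2dict("{'is_doc1': True, 'is_doc2': True}"))  # {'is_doc1': True, 'is_doc2': True}
print(str2dict('27452.webp'))  # 27452.webp
print(str2dict('d92e.webp'))   # d92e.webp
print(str2dict(1001))          # 1001
```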
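This function is applied to each row of the DataFrame (via pd.DataFrame.agg). Following the logic of the function you posted, it:
- checks whether bool3_res['detected'] is False (both of your first two conditions require detected == True) and, if so, raises a ValueError;
- checks whether is_doc1 is True for bool1 and, if not, for bool2;
- raises a ValueError if neither is_doc1 is True.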
def make_newcol_entry(x: pd.Series):
    """(Step 2) constructs the "new" column value for a single row"""
    try:
        if x.bool3_res['detected'] is False:
            raise ValueError
        # check is_doc1 properties
        elif x.bool1_res['is_doc1'] is True:
            document1 = x.bool1
        elif x.bool2_res['is_doc1'] is True:
            document1 = x.bool2
        else:
            raise ValueError
    except ValueError:
        entry = "not valid"
    # if there is an `is_doc1` that is True, construct your entry.
    else:
        entry = {
            "task_id": "uid",
            "group_id": "uid",
            "data": {"document1": document1, "document2": x.bool3}
        }
    return entry
# applymap parses every cell (pandas >= 2.1 deprecates it in favour of DataFrame.map)
df = df.assign(new=lambda x: x.applymap(str2dict)
                              .agg(make_newcol_entry, axis=1))
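For comparison, the same rules can be written with early returns instead of raising and catching ValueError. A sketch, assuming the bool#_res cells are already parsed dicts; make_entry is a hypothetical name, and it accepts a row Series like the one apply/agg passes in (or a plain dict):

```python
import pandas as pd


def make_entry(row):
    """Hypothetical early-return variant of make_newcol_entry (expects parsed dicts)."""
    # Both valid cases require detected == True
    if row['bool3_res']['detected'] is not True:
        return 'not valid'
    # Prefer bool1, then fall back to bool2, as in the original logic
    for col in ('bool1', 'bool2'):
        if row[f'{col}_res']['is_doc1'] is True:
            return {
                'task_id': 'uid',
                'group_id': 'uid',
                'data': {'document1': row[col], 'document2': row['bool3']},
            }
    return 'not valid'


# Row 1001 from the sample data, already parsed
row = pd.Series({
    'bool1': '27452.webp', 'bool2': '981.webp', 'bool3': 'd92e.webp',
    'bool1_res': {'is_doc1': False, 'is_doc2': True},
    'bool2_res': {'is_doc1': True, 'is_doc2': True},
    'bool3_res': {'detected': True, 'count': 1},
})
print(make_entry(row))
```

The is True / is not True checks mirror the identity checks in make_newcol_entry. For genuine boolean flags the result is the same, but a non-boolean detected value (e.g. None) is treated as not valid here.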
Note that this parses every element of the DataFrame. To parse only the bool#_res columns, you can do it in two steps:
# select and parse only res cols ('bool#_res'), then apply
df.update(df.filter(regex=r'_res$', axis=1).applymap(str2dict))
df = df.assign(new=lambda x: x.agg(make_newcol_entry, axis=1))
$ df
user_uid bool1 bool2 bool3 bool1_res bool2_res bool3_res new
0 1001 27452.webp 981.webp d92e.webp {'is_doc1': False, 'is_doc2': True} {'is_doc1': True, 'is_doc2': True} {'detected': True, 'count': 1} {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '981.webp', 'document2': 'd92e.webp'}}
1 1002 27452.webp 981.webp d92e.webp {'is_doc1': True, 'is_doc2': True} {'is_doc1': False, 'is_doc2': True} {'detected': True, 'count': 1} {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '27452.webp', 'document2': 'd92e.webp'}}
2 1003 27452.webp 981.webp d92e.webp {'is_doc1': True, 'is_doc2': True} {'is_doc1': False, 'is_doc2': True} {'detected': False, 'count': 1} not valid
$ df['new']
0 {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '981.webp', 'document2': 'd92e.webp'}}
1 {'task_id': 'uid', 'group_id': 'uid', 'data': {'document1': '27452.webp', 'document2': 'd92e.webp'}}
2 not valid
Name: new, dtype: object
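An alternative to df.update for the two-step version: parse only the bool#_res columns with Series.map(ast.literal_eval) inside assign, which leaves the original string columns of df untouched. A self-contained sketch on the three sample rows above; make_entry is a hypothetical early-return version of make_newcol_entry, and apply(..., axis=1) is used for the row-wise call:

```python
import ast

import pandas as pd

df = pd.DataFrame(
    [
        [1001, '27452.webp', '981.webp', 'd92e.webp',
         "{'is_doc1': False, 'is_doc2': True}",
         "{'is_doc1': True, 'is_doc2': True}",
         "{'detected': True, 'count': 1}"],
        [1002, '27452.webp', '981.webp', 'd92e.webp',
         "{'is_doc1': True, 'is_doc2': True}",
         "{'is_doc1': False, 'is_doc2': True}",
         "{'detected': True, 'count': 1}"],
        [1003, '27452.webp', '981.webp', 'd92e.webp',
         "{'is_doc1': True, 'is_doc2': True}",
         "{'is_doc1': False, 'is_doc2': True}",
         "{'detected': False, 'count': 1}"],
    ],
    columns=['user_uid', 'bool1', 'bool2', 'bool3',
             'bool1_res', 'bool2_res', 'bool3_res'],
)

# Step 1: parse only the *_res columns; every cell there is a dict literal
res_cols = [c for c in df.columns if c.endswith('_res')]
parsed = df.assign(**{c: df[c].map(ast.literal_eval) for c in res_cols})


def make_entry(row: pd.Series):
    """Hypothetical row function following the same rules as make_newcol_entry."""
    if row['bool3_res']['detected'] is True:
        for col in ('bool1', 'bool2'):
            if row[f'{col}_res']['is_doc1'] is True:
                return {'task_id': 'uid', 'group_id': 'uid',
                        'data': {'document1': row[col], 'document2': row['bool3']}}
    return 'not valid'


# Step 2: build the "new" column row by row from the parsed copy
df['new'] = parsed.apply(make_entry, axis=1)
print(df['new'].tolist())
```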
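Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.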