I have a dataframe that contains a string column holding several different 4-character codes, which can be separated by `|` or `&`, but not always. I am trying to map a dictionary onto each discrete 4-character code but am running into issues. pandas version: 23.4
The basic code I am trying to use:

```python
df = df.replace(dict, regex=True)
```

or, to target a specific column:

```python
df['Col'] = df['Col'].replace(dict, regex=True)
```
Both raise the following error:

```
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
The values of the dictionary are of type `list`. Could that be a problem when performing `.replace`?
Update with sample df and dict:

```
ID    Code
ABCD  00FQ
JKFA  8LK9|4F5H
QWST  2RLA|R1T5&8LK9
```

```python
dict = {'00FQ': ['A', 'B'], '8LK9': ['X'], '4F5H': ['U', 'Z'], '2RLA': ['H', 'K'], 'R1T5': ['B', 'G']}
```
The dict will contain more keys than there are codes in the dataframe.
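For anyone reproducing the problem, the sample data above can be rebuilt as below. This is a minimal sketch; the dictionary is renamed `logic_dict` here (an illustrative name, not the asker's) so that it does not shadow Python's built-in `dict`.

```python
import pandas as pd

# Sample frame from the question: each Code holds one or more
# 4-character codes joined by '|' or '&'.
df = pd.DataFrame({
    'ID': ['ABCD', 'JKFA', 'QWST'],
    'Code': ['00FQ', '8LK9|4F5H', '2RLA|R1T5&8LK9'],
})

# Lookup table from the question; the values are lists.
# Renamed from `dict` to avoid shadowing the built-in type.
logic_dict = {'00FQ': ['A', 'B'], '8LK9': ['X'], '4F5H': ['U', 'Z'],
              '2RLA': ['H', 'K'], 'R1T5': ['B', 'G']}

print(df)
```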
Update with expected output:

```
ID    Code             Logic
ABCD  00FQ             ['A','B']
JKFA  8LK9|4F5H        ['X'] | ['U','Z']
QWST  2RLA|R1T5&8LK9   ['H','K'] | ['B','G'] & ['X']
```
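If the goal is a string column that looks exactly like the expected output, with the `|` and `&` operators kept in place, one approach is to split on the operators while capturing them, then look up each code. This is a minimal pure-Python sketch, not a pandas `replace` fix; `code_to_logic` is a hypothetical helper name and `logic_dict` stands in for the question's `dict`.

```python
import re

logic_dict = {'00FQ': ['A', 'B'], '8LK9': ['X'], '4F5H': ['U', 'Z'],
              '2RLA': ['H', 'K'], 'R1T5': ['B', 'G']}

def code_to_logic(code):
    """Replace each code with its list, keeping '|' and '&' between them."""
    # The capturing group makes re.split keep the operators as tokens:
    # '2RLA|R1T5&8LK9' -> ['2RLA', '|', 'R1T5', '&', '8LK9']
    tokens = re.split(r'([|&])', code)
    parts = []
    for tok in tokens:
        if tok in ('|', '&'):
            parts.append(tok)
        else:
            # Format the list as ['H','K'] (no space after the comma)
            # to match the expected output in the question.
            parts.append('[' + ','.join(repr(v) for v in logic_dict[tok]) + ']')
    return ' '.join(parts)

print(code_to_logic('2RLA|R1T5&8LK9'))  # ['H','K'] | ['B','G'] & ['X']
```

Applied to the frame, this would be `df['Logic'] = df['Code'].apply(code_to_logic)`. A code missing from the dictionary raises `KeyError` here.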
The overall goal is to perform this replacement on two dataframes and then compare the IDs on both sides for equivalence.
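For that final comparison step, one hedged option is to merge the two frames on `ID` and compare the mapped values row by row. The frame names `df_a` / `df_b`, the `Logic` column, and its string form (in the style of the expected output above) are assumptions for illustration; an inner merge keeps only IDs present in both frames.

```python
import pandas as pd

# Hypothetical frames that already carry a mapped 'Logic' column.
df_a = pd.DataFrame({'ID': ['ABCD', 'JKFA', 'QWST'],
                     'Logic': ["['A','B']", "['X'] | ['U','Z']", "['H','K']"]})
df_b = pd.DataFrame({'ID': ['ABCD', 'JKFA', 'ZZZZ'],
                     'Logic': ["['A','B']", "['X'] | ['Z']", "['B','G']"]})

# Inner merge on ID; suffixes distinguish the two sides' Logic columns.
merged = df_a.merge(df_b, on='ID', how='inner', suffixes=('_a', '_b'))

# Element-wise string comparison of the mapped values.
merged['Equal'] = merged['Logic_a'] == merged['Logic_b']
print(merged[['ID', 'Equal']])
```

Using `how='outer'` with `indicator=True` instead would also surface IDs that exist on only one side.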
The regexes defined by your dict keys might match more than one row of the dataframe, and Python cannot tell which replacement value from the dict to use.

Moreover, when a NumPy array is checked for its boolean value, this error is raised deliberately so that users don't have to guess: would you consider an array to be True if any of its elements is True, or only if all of them are? The error forces the programmer to state that explicitly.
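The error message itself is easy to reproduce with NumPy alone: asking for the truth value of a multi-element array raises it, and `.any()` / `.all()` are the explicit alternatives the message suggests. This is a small illustration of the NumPy behaviour, not pandas' internal code path.

```python
import numpy as np

arr = np.array([True, False])

try:
    if arr:  # implicit bool() on a 2-element array is ambiguous
        pass
except ValueError as exc:
    print(exc)

print(arr.any())  # True: at least one element is True
print(arr.all())  # False: not every element is True
```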
Here's a function which will allow you to parse relevant values from your strings:
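```python
def string_to_list(string):
    """
    Parses a parent string for 4-character child strings.
    Returns a list of the child strings.
    """
    # instantiate values
    child = ''
    children = []
    # too short to hold even one code: return an empty list rather than
    # None, so iterating over the result in the mapping step still works
    if len(string) < 4:
        return children
    for n in string:
        # skip the '|' and '&' separators
        if n in ['|', '&']:
            continue
        child += n
        # every 4 collected characters form one complete code
        if len(child) == 4:
            children.append(child)
            child = ''
    # finished
    return children
```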
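Because the codes are delimited by `|` and `&`, splitting on those characters yields the same list in one call and would also cope with codes that are not exactly four characters long. A sketch using the standard `re` module; `split_codes` is a hypothetical name, and unlike the function above it returns a short fragment such as `'AB'` as a one-element list rather than discarding it.

```python
import re

def split_codes(string):
    """Split a code string on '|' or '&' and drop any empty pieces."""
    return [part for part in re.split(r'[|&]', string) if part]

print(split_codes('2RLA|R1T5&8LK9'))  # ['2RLA', 'R1T5', '8LK9']
```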
Apply it to extract a list of values as follows:

```python
df['Code_List'] = df['Code'].apply(string_to_list)
```
Map the codes to the relevant logic values:
```python
# Instantiate the dictionary of logic rules
logic_dict = {'00FQ': ['A', 'B'], '8LK9': ['X'], '4F5H': ['U', 'Z'], '2RLA': ['H', 'K'], 'R1T5': ['B', 'G']}

# Map the logic rules
df['Logic_List'] = df['Code_List'].apply(lambda arr: [logic_dict[x] for x in arr])
```
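The lookup `logic_dict[x]` raises `KeyError` if a code in the frame has no entry in the dictionary. The question says the dictionary has more keys than the frame uses, so this may never happen, but if it can, `dict.get` with a default is a gentler lookup. A minimal sketch; the `map_codes` name and the `None` default are assumptions.

```python
logic_dict = {'00FQ': ['A', 'B'], '8LK9': ['X'], '4F5H': ['U', 'Z'],
              '2RLA': ['H', 'K'], 'R1T5': ['B', 'G']}

def map_codes(codes, lookup, default=None):
    """Map each code to its logic list, using `default` for unknown codes."""
    return [lookup.get(code, default) for code in codes]

print(map_codes(['2RLA', 'R1T5', '8LK9'], logic_dict))  # [['H', 'K'], ['B', 'G'], ['X']]
print(map_codes(['8LK9', 'NOPE'], logic_dict))          # [['X'], None]
```

On the frame this would read `df['Logic_List'] = df['Code_List'].apply(lambda arr: map_codes(arr, logic_dict))`.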
Final output:

```
   ID    Code             Code_List           Logic_List
0  ABCD  00FQ             [00FQ]              [[A, B]]
1  JKFA  8LK9|4F5H        [8LK9, 4F5H]        [[X], [U, Z]]
2  QWST  2RLA|R1T5&8LK9   [2RLA, R1T5, 8LK9]  [[H, K], [B, G], [X]]
```