I'm looking for a pandas-friendly way of conducting row-wise operations between two nested string-represented lists in two dataframes. Here is my incomplete attempt:
import pandas as pd
import ast
df1 = pd.DataFrame({'id': [0, 1, 2],
'nested_ls': ["[20, 15, 5]", "[8, 7, 0]", "[124, 23, 43]"]})
df2 = pd.DataFrame({'id': [0, 1, 2],
'nested_ls': ["[10, 3, 2]", "[14, 7, 0]", "[100, 3, 20]"]})
df3 = pd.DataFrame()
# This is roughly what needs to be accomplished, but it evaluates
# the whole series instead of the row-wise nested lists
df3['nested_ls_diff'] = ast.literal_eval(df1['nested_ls']) - ast.literal_eval(df2['nested_ls'])
# Throws - ValueError: malformed node or string: 0
The desired output would be a dataframe that looks like this:
df3 = pd.DataFrame({'id': [0, 1, 2],
'nested_ls_diff': ["[10, 12, 3]", "[-6, 0, 0]", "[24, 20, 23]"]})
Your code, ast.literal_eval(df1['nested_ls']), passes the whole series to ast.literal_eval, which expects a single string (or AST node). That's not what you want. Instead, you want:
# this gives you a series of lists
df1['nested_ls'].apply(ast.literal_eval)
or better:
# this gives you a numpy array
pd.eval(df1['nested_ls'])
So this would work for you (though not ideal):
df3 = pd.DataFrame()
df3['nested_ls_diff'] = list(pd.eval(df1['nested_ls']) - pd.eval(df2['nested_ls']))
Note that each cell in df3['nested_ls_diff'] is a sequence of numbers, not a string as in your desired output.
Update: for the general case, we can just use a list comprehension:
df3['nested_ls_diff'] = [[a-b for a,b in zip(*xy)]
for xy in zip(pd.eval(df1['nested_ls']),pd.eval(df2['nested_ls']))
]
Because the data has object dtype, this should perform comparably to the other approach.
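If you need the result as strings with the id column, exactly as in the desired output, here is a minimal end-to-end sketch. It uses the apply(ast.literal_eval) variant rather than pd.eval, and it assumes the rows of df1 and df2 are already aligned by position and that both lists in a row have the same length. The names left, right and diffs are only illustrative.

```python
import ast
import pandas as pd

df1 = pd.DataFrame({'id': [0, 1, 2],
                    'nested_ls': ["[20, 15, 5]", "[8, 7, 0]", "[124, 23, 43]"]})
df2 = pd.DataFrame({'id': [0, 1, 2],
                    'nested_ls': ["[10, 3, 2]", "[14, 7, 0]", "[100, 3, 20]"]})

# Parse each cell on its own, giving a series of Python lists
left = df1['nested_ls'].apply(ast.literal_eval)
right = df2['nested_ls'].apply(ast.literal_eval)

# Element-wise difference per row (assumes rows are aligned by position)
diffs = [[a - b for a, b in zip(x, y)] for x, y in zip(left, right)]

# Convert each list back to its string form and keep the id column
df3 = pd.DataFrame({'id': df1['id'],
                    'nested_ls_diff': [str(d) for d in diffs]})
print(df3)
```

If the two frames might not be in the same row order, merging them on id first would be the safer route before computing the differences.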