Pandas dataframe column containing string and list
I have a data frame that contains a column with both strings and lists.
import pandas as pd
data = {'lanes': ['1',['2','4'],'2','3',['1','2','3']]}
df = pd.DataFrame(data,columns=['lanes'])
df
I need to convert the strings to ints and replace each list with the mean of its elements. So, the output should look like this:
data2 = {'lanes': [1,3,2,3,2]}
df2 = pd.DataFrame(data2,columns=['lanes'])
df2
Can anyone give me some direction on how to do this, if you have done something like this before?
Use Series.explode, convert the values to integers, and then take the mean for each duplicated index label:
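# Series.mean(level=0) was removed in pandas 2.0, so group on the index level instead;
# the final astype(int) truncates the float means to the integers the question asks for
df['lanes'] = df['lanes'].explode().astype(int).groupby(level=0).mean().astype(int)
print(df)
   lanes
0      1
1      3
2      2
3      3
4      2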
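To see why averaging per index label works, it can help to look at the intermediate result of Series.explode. The following is a minimal sketch on the question's data; the names `exploded` and `means` are only illustrative, and the per-label mean is written as `groupby(level=0).mean()`.

```python
import pandas as pd

data = {'lanes': ['1', ['2', '4'], '2', '3', ['1', '2', '3']]}
df = pd.DataFrame(data, columns=['lanes'])

# explode() puts each list element on its own row and repeats the
# original index label, so label 1 appears twice and label 4 three times.
exploded = df['lanes'].explode().astype(int)
print(exploded.index.tolist())   # [0, 1, 1, 2, 3, 4, 4, 4]
print(exploded.tolist())         # [1, 2, 4, 2, 3, 1, 2, 3]

# Grouping on index level 0 collapses the repeated labels back to one row each.
means = exploded.groupby(level=0).mean()
print(means.tolist())            # [1.0, 3.0, 2.0, 3.0, 2.0]
```

Note that the means come back as floats, so a cast is needed if an integer column is wanted.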
If the data are not lists but string representations of lists, use:
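import ast
data = {'lanes': ['1',"['2','4']",'2','3',"['1','2','3']"]}
df = pd.DataFrame(data,columns=['lanes'])
# Parse each cell into a Python object first, then explode and average per index label
# (Series.mean(level=0) was removed in pandas 2.0, hence groupby; astype(int) truncates the means)
df['lanes'] = df['lanes'].apply(ast.literal_eval).explode().astype(int).groupby(level=0).mean().astype(int)
print(df)
   lanes
0      1
1      3
2      2
3      3
4      2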
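As a rough illustration of what ast.literal_eval does to each kind of cell here, the sketch below parses the same strings outside of pandas. The names `cells` and `parsed` are only illustrative.

```python
import ast

cells = ['1', "['2','4']", '2', '3', "['1','2','3']"]

# A bare digit string becomes an int, while a list repr becomes a real list
# whose elements are still strings.
parsed = [ast.literal_eval(cell) for cell in cells]
print(parsed)  # [1, ['2', '4'], 2, 3, ['1', '2', '3']]
```

Because the list elements are still strings after parsing, the integer conversion after explode is still needed. Cells that are not valid Python literals (for example a plain word) would make ast.literal_eval raise an error, so this approach assumes clean input.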
You can also try the snippet below. It uses a list comprehension to get the result.
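import pandas as pd
data = {'lanes': ['1',['2','4'],'2','3',['1','2','3']]}

def mean(lst):
    return sum(lst) / len(lst)

# Wrap bare strings in a list so a multi-digit value such as '12' is not split into characters
rows = [item if isinstance(item, list) else [item] for item in data['lanes']]
data2 = dict()
# int() truncates the mean, e.g. 1.5 becomes 1
data2['lanes'] = [int(mean([int(x) for x in row])) for row in rows]
df2 = pd.DataFrame(data2,columns=['lanes'])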
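The same idea can also be packaged as a small function that handles one cell at a time. This is only a sketch: `lane_mean` is a hypothetical helper name, and it assumes every cell is either a numeric string or a list of numeric strings.

```python
def lane_mean(value):
    """Return the truncated integer mean of one cell (hypothetical helper)."""
    # Treat a bare string as a one-element list so both cases share one path.
    items = value if isinstance(value, list) else [value]
    numbers = [int(x) for x in items]
    return int(sum(numbers) / len(numbers))

lanes = ['1', ['2', '4'], '2', '3', ['1', '2', '3']]
result = [lane_mean(v) for v in lanes]
print(result)  # [1, 3, 2, 3, 2]
```

On a data frame, a function like this could be passed to `df['lanes'].apply(lane_mean)` to produce the converted column directly.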