How do I find the maximum value in an array within a dataframe column?
I have a dataframe (df) that looks like this:
a b
loc.1 [1, 2, 3, 4, 7, 5, 6]
loc.2 [3, 4, 3, 7, 7, 8, 6]
loc.3 [1, 4, 3, 1, 7, 8, 6]
...
I want to find the maximum of the array in column b and append it to the original data frame. My thought was something like this:
for line in df:
    split = map(float,b.split(','))
    count_max = max(split)
    print count
Ideal output should be:
a b max_val
loc.1 [1, 2, 3, 4, 7, 5, 6] 7
loc.2 [3, 4, 3, 7, 7, 8, 6] 8
loc.3 [1, 4, 3, 1, 7, 8, 6] 8
...
But this does not work, because b is not defined, so I cannot call b.split...
If you are working with lists that contain no NaN values, the best option is to use max in a list comprehension or with map:
a['max'] = [max(x) for x in a['b']]
a['max'] = list(map(max, a['b']))
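On the NaN caveat: the built-in max relies on comparisons, and every comparison with NaN is False, so its result depends on where the NaN sits in the list. A minimal sketch, assuming the lists may contain float('nan') (the data below is made up for illustration), uses np.nanmax instead:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the lists contain a NaN.
a = pd.DataFrame({'b': [[1, 2, 3],
                        [float('nan'), 4, 2],
                        [5, float('nan'), 1]]},
                 index=['loc.1', 'loc.2', 'loc.3'])

# Built-in max: the result depends on the position of the NaN,
# so it cannot be trusted for lists that may contain NaN.
a['max_builtin'] = [max(x) for x in a['b']]

# np.nanmax ignores NaN values when taking the maximum.
a['max_nan_safe'] = [np.nanmax(x) for x in a['b']]

print(a)
```

With a leading NaN the built-in max returns NaN, while with a NaN in the middle it silently returns a number, which is why np.nanmax is the safer choice for such data.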
Pure pandas solution:
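# index=a.index keeps the result aligned with a when a does not use the default index
a['max'] = pd.DataFrame(a['b'].values.tolist(), index=a.index).max(axis=1)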
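One property of the pandas version worth sketching: when the lists have different lengths, pd.DataFrame pads the shorter rows with NaN, and max(axis=1) skips NaN by default, so the maxima come back as floats. The ragged data below is made up for illustration:

```python
import pandas as pd

# Hypothetical data: lists of different lengths.
a = pd.DataFrame({'b': [[1, 2, 3, 4], [3, 8], [7]]},
                 index=['loc.1', 'loc.2', 'loc.3'])

# Expand each list into its own row; shorter rows are padded with NaN.
# Passing index=a.index keeps the rows aligned with a on assignment.
expanded = pd.DataFrame(a['b'].values.tolist(), index=a.index)

# max(axis=1) skips NaN by default (skipna=True).
a['max'] = expanded.max(axis=1)
print(a)
```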
Sample:
import numpy as np
import pandas as pd

array = {'loc.1': np.array([1, 2, 3, 4, 7, 5, 6]),
         'loc.2': np.array([3, 4, 3, 7, 7, 8, 6]),
         'loc.3': np.array([1, 4, 3, 1, 7, 8, 6])}
# Build (label, values) rows, then index the frame by label.
L = [(k, v) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a', 'b']).set_index('a')
a['max'] = [max(x) for x in a['b']]
print(a)
b max
a
loc.1 [1, 2, 3, 4, 7, 5, 6] 7
loc.2 [3, 4, 3, 7, 7, 8, 6] 8
loc.3 [1, 4, 3, 1, 7, 8, 6] 8
EDIT:
You can also compute max inside the list comprehension that builds the rows:
L = [(k, v, max(v)) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b', 'max']).set_index('a')
print (a)
b max
a
loc.1 [1, 2, 3, 4, 7, 5, 6] 7
loc.2 [3, 4, 3, 7, 7, 8, 6] 8
loc.3 [1, 4, 3, 1, 7, 8, 6] 8
You can use numpy arrays for a vectorised calculation:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1, 7, 8, 6]]})
df['maxval'] = np.array(df['b'].values.tolist()).max(axis=1)
print(df)
# a b maxval
# 0 loc.1 [1, 2, 3, 4, 7, 5, 6] 7
# 1 loc.2 [3, 4, 3, 7, 7, 8, 6] 8
# 2 loc.3 [1, 4, 3, 1, 7, 8, 6] 8
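The vectorised version relies on every list in b having the same length, so that np.array can build a regular 2-D array. A minimal sketch of a guard follows; row_max is a hypothetical helper name (not part of pandas or NumPy), and it assumes no row is empty. It falls back to a list comprehension when the rows are ragged:

```python
import numpy as np
import pandas as pd

def row_max(series):
    """Hypothetical helper: per-row maximum of a column of non-empty lists."""
    rows = series.tolist()
    lengths = {len(r) for r in rows}
    if len(lengths) == 1:
        # All rows have equal length: one vectorised call on a 2-D array.
        return np.array(rows).max(axis=1)
    # Ragged rows: fall back to a Python-level loop.
    return np.array([max(r) for r in rows])

df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1, 7, 8, 6]]})
df['maxval'] = row_max(df['b'])

# Made-up ragged data to exercise the fallback path.
ragged = pd.DataFrame({'b': [[1, 9], [3], [2, 5, 4]]})
ragged['maxval'] = row_max(ragged['b'])

print(df)
print(ragged)
```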
Try this:
df["max_val"] = df["b"].apply(lambda x:max(x))