How do I find the maximum value in an array within a dataframe column?

I have a dataframe (df) that looks like this:

a                      b
loc.1  [1, 2, 3, 4, 7, 5, 6]
loc.2  [3, 4, 3, 7, 7, 8, 6]
loc.3  [1, 4, 3, 1, 7, 8, 6]
...

I want to find the maximum of the array in column b and append it to the original dataframe as a new column. My thought was something like this:

for line in df: 
    split = map(float,b.split(','))
    count_max = max(split)
print count

The ideal output would be:

a                      b           max_val
loc.1  [1, 2, 3, 4, 7, 5, 6]   7
loc.2  [3, 4, 3, 7, 7, 8, 6]   8
loc.3  [1, 4, 3, 1, 7, 8, 6]   8
...

But this does not work, because b is not defined, so I cannot call b.split.

If the lists contain no NaNs, the best option is to use max in a list comprehension or with map:

a['max'] = [max(x) for x in a['b']]

a['max'] = list(map(max, a['b']))
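The caveat about NaNs matters because the built-in max gives order-dependent results when a list contains NaN. A minimal sketch of one way to handle that case, assuming the lists hold floats with np.nan as the missing marker (the sample data here is made up for illustration), is to use np.nanmax, which ignores NaN entries:

```python
import numpy as np
import pandas as pd

# Hypothetical data: lists that contain NaN values
df = pd.DataFrame({'a': ['loc.1', 'loc.2'],
                   'b': [[1, 2, np.nan, 4], [3, np.nan, 8, 6]]})

# np.nanmax skips NaN entries when computing the maximum of each list
df['max_val'] = [np.nanmax(x) for x in df['b']]

result = df['max_val'].tolist()
print(result)
```

Note that np.nanmax emits a RuntimeWarning and returns NaN for a list that consists only of NaNs.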

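Pure pandas solution (pass index=a.index so that the result aligns with the index of a on assignment; without it, a non-default index such as the one in the sample produces NaN):

a['max'] = pd.DataFrame(a['b'].values.tolist(), index=a.index).max(axis=1)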
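One property of the pandas approach is that it also copes with lists of unequal length, because the intermediate DataFrame pads shorter rows with NaN and max skips NaN by default. A small sketch, using made-up ragged data and a hypothetical intermediate name wide:

```python
import pandas as pd

# Hypothetical data: lists of different lengths, with a non-default index
a = pd.DataFrame({'b': [[1, 2, 3], [3, 4], [7]]},
                 index=['loc.1', 'loc.2', 'loc.3'])

# Expand each list into its own row; shorter rows are padded with NaN.
# Reusing a.index keeps the result aligned with the original frame.
wide = pd.DataFrame(a['b'].tolist(), index=a.index)

# max(axis=1) ignores the NaN padding (skipna=True is the default)
a['max'] = wide.max(axis=1)

result = a['max'].tolist()
print(a)
```

Because of the NaN padding, the resulting column is float rather than int.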

Sample:

import numpy as np
import pandas as pd

array = {'loc.1': np.array([  1,2,3,4,7,5,6]),
         'loc.2': np.array([  3,4,3,7,7,8,6]),
         'loc.3': np.array([  1,4,3,1,7,8,6])}

L = [(k, v) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b']).set_index('a')

a['max'] = [max(x) for x in a['b']]
print (a)
                           b  max
a                                
loc.1  [1, 2, 3, 4, 7, 5, 6]    7
loc.2  [3, 4, 3, 7, 7, 8, 6]    8
loc.3  [1, 4, 3, 1, 7, 8, 6]    8

EDIT:

You can also compute the max inside the list comprehension that builds the rows:

L = [(k, v, max(v)) for k, v in array.items()]
a = pd.DataFrame(L, columns=['a','b', 'max']).set_index('a')

print (a)
                           b  max
a                                
loc.1  [1, 2, 3, 4, 7, 5, 6]    7
loc.2  [3, 4, 3, 7, 7, 8, 6]    8
loc.3  [1, 4, 3, 1, 7, 8, 6]    8

You can use NumPy arrays for a vectorised calculation (this assumes every list has the same length, so that the lists form a 2D array):

df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1, 7, 8, 6]]})

df['maxval'] = np.array(df['b'].values.tolist()).max(axis=1)

print(df)

#        a                      b  maxval
# 0  loc.1  [1, 2, 3, 4, 7, 5, 6]       7
# 1  loc.2  [3, 4, 3, 7, 7, 8, 6]       8
# 2  loc.3  [1, 4, 3, 1, 7, 8, 6]       8
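If it is not known in advance whether all lists have the same length, one option is to check first and fall back to a plain list comprehension. This is only a sketch under that assumption; the shortened third list is made up to trigger the fallback:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last list is shorter than the others
df = pd.DataFrame({'a': ['loc.1', 'loc.2', 'loc.3'],
                   'b': [[1, 2, 3, 4, 7, 5, 6],
                         [3, 4, 3, 7, 7, 8, 6],
                         [1, 4, 3, 1]]})

# Number of elements in each list
lengths = df['b'].map(len)

if lengths.nunique() == 1:
    # All lists have equal length: stack into a 2D array and reduce row-wise
    df['maxval'] = np.array(df['b'].tolist()).max(axis=1)
else:
    # Ragged lists: compute the maximum of each list individually
    df['maxval'] = [max(x) for x in df['b']]

result = df['maxval'].tolist()
print(df)
```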

Try this:

df["max_val"] = df["b"].apply(lambda x:max(x))
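The answers above assume column b already holds real lists or arrays. The question's attempt with split suggests the values might instead be strings such as '[1, 2, 3]' (for example after reading a CSV); that is an assumption, not something the question confirms. In that case, a possible sketch is to parse the strings with ast.literal_eval before taking the maximum:

```python
import ast
import pandas as pd

# Hypothetical data: each cell in b is a string representation of a list
df = pd.DataFrame({'a': ['loc.1', 'loc.2'],
                   'b': ['[1, 2, 3, 4, 7, 5, 6]', '[3, 4, 3, 7, 7, 8, 6]']})

# Safely parse each string into a Python list
df['b'] = df['b'].apply(ast.literal_eval)

# With real lists in place, max can be applied directly
df['max_val'] = df['b'].apply(max)

result = df['max_val'].tolist()
print(df)
```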


