
Python Pandas - compare dataframe tuple values

I have two columns in a DataFrame that contain tuples:

df:
                     a                        b
0  ('chicken wing', 1)          ('saucy', 0.35)
1     ('burger', 0.85)          ('mason', 0.97)
2    ('burping', 0.37)  ('lost in space', 0.47)
3     ('marvelous', 1)      ('tremendous', .85)

I need to put whichever tuple has the larger number into a new column. It doesn't matter whether the old columns stay in df.

Desired result:

df:
                 max_value
0      ('chicken wing', 1)
1          ('mason', 0.97)
2  ('lost in space', 0.47)
3         ('marvelous', 1)

You can do it like this:

In [1]: df['a'].where(df.apply(lambda row: row['a'][1] > row['b'][1], axis=1), df['b'])

Out[1]:

0        (chicken wing, 1)
1            (mason, 0.97)
2    (lost in space, 0.47)
3           (marvelous, 1)
Name: a, dtype: object

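So here we use a lambda to compare the tuples in each row, which produces a boolean mask. We then pass that mask to where, which returns the value from column 'a' where the mask is True and the value from column 'b' otherwise.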
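To isolate the where step, here is a minimal sketch on toy Series with a hand-written mask (the values are made up for illustration): Series.where(cond, other) keeps each element where cond is True and takes the element of other at the same index label where cond is False.

```python
import pandas as pd

# Toy data, purely illustrative
left = pd.Series(["a0", "a1", "a2"])
right = pd.Series(["b0", "b1", "b2"])
mask = pd.Series([True, False, True])

# Keep `left` where the mask is True, otherwise take `right` at the same index
combined = left.where(mask, right)
print(combined.tolist())  # ['a0', 'b1', 'a2']
```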

Output of the apply call:

In[3]:
df.apply(lambda row: row['a'][1] > row['b'][1], axis=1)

Out[3]: 
0     True
1    False
2    False
3     True
dtype: bool

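A more efficient approach is to extract the percentages into separate columns, so that the comparison itself can use vectorized operations: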

In[4]:
df['a_%'] = df['a'].apply(lambda x: x[1])
df['b_%'] = df['b'].apply(lambda x: x[1])
df

Out[4]: 
                   a                      b   a_%   b_%
0  (chicken wing, 1)          (saucy, 0.35)  1.00  0.35
1     (burger, 0.85)          (mason, 0.97)  0.85  0.97
2    (burping, 0.37)  (lost in space, 0.47)  0.37  0.47
3     (marvelous, 1)     (tremendous, 0.85)  1.00  0.85

In[5]:
df['max_value'] = df['a'].where(df['a_%'] > df['b_%'], df['b'])
df

Out[5]: 
                   a                      b   a_%   b_%              max_value
0  (chicken wing, 1)          (saucy, 0.35)  1.00  0.35      (chicken wing, 1)
1     (burger, 0.85)          (mason, 0.97)  0.85  0.97          (mason, 0.97)
2    (burping, 0.37)  (lost in space, 0.47)  0.37  0.47  (lost in space, 0.47)
3     (marvelous, 1)     (tremendous, 0.85)  1.00  0.85         (marvelous, 1)
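In the version above, extracting the numbers still runs a Python lambda once per element. A sketch of an alternative, assuming every cell holds a (label, number) pair: expand each tuple column into two columns with pd.DataFrame(series.tolist()), then compare the numeric columns and pick the whole tuple with where. The helper names a_parts, b_parts, label and score are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [("chicken wing", 1), ("burger", 0.85), ("burping", 0.37), ("marvelous", 1)],
    "b": [("saucy", 0.35), ("mason", 0.97), ("lost in space", 0.47), ("tremendous", 0.85)],
})

# Split each column of (label, score) tuples into two columns, keeping df's index
a_parts = pd.DataFrame(df["a"].tolist(), index=df.index, columns=["label", "score"])
b_parts = pd.DataFrame(df["b"].tolist(), index=df.index, columns=["label", "score"])

# Vectorized comparison of the numeric columns, then pick the whole tuple
df["max_value"] = df["a"].where(a_parts["score"] > b_parts["score"], df["b"])
print(df["max_value"].tolist())
```

This avoids adding the helper columns a_% and b_% to df itself; whether it is measurably faster depends on the data size.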

You can also define a custom function that handles any number of columns and uses max:

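In[11]:
def func(x):
    # second element of every tuple in the row
    vals = [y[1] for y in x]
    # .iloc selects by position: the first column holding the max value
    return x.iloc[vals.index(max(vals))]
df[['a', 'b']].apply(func, axis=1)  # only the tuple columns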

Out[11]: 
0        (chicken wing, 1)
1            (mason, 0.97)
2    (lost in space, 0.47)
3           (marvelous, 1)
dtype: object
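For the same any-number-of-columns case, a sketch that avoids calling a Python function per row: build a numeric frame of scores (assuming every compared column holds (label, number) tuples), let DataFrame.idxmax(axis=1) name the winning column in each row, and look the tuple up. idxmax returns the first column on ties, the same tie-breaking as vals.index(max(vals)). The names cols, scores and winner are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [("chicken wing", 1), ("burger", 0.85), ("burping", 0.37), ("marvelous", 1)],
    "b": [("saucy", 0.35), ("mason", 0.97), ("lost in space", 0.47), ("tremendous", 0.85)],
})

cols = ["a", "b"]  # tuple columns to compare; extend the list as needed
# One numeric column of scores per tuple column
scores = pd.DataFrame({c: [t[1] for t in df[c]] for c in cols}, index=df.index)
# Name of the column with the largest score in each row (first one on ties)
winner = scores.idxmax(axis=1)
# Fetch the tuple from the winning column for each row
df["max_value"] = [df.at[i, c] for i, c in winner.items()]
print(df["max_value"].tolist())
```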

Try this:

def compare_tuples(row):
    if row['a'][1] >= row['b'][1]:
        return row['a']
    else:
        return row['b']
df['larger'] = df.apply(compare_tuples, axis=1)
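A self-contained sketch of compare_tuples with the question's data rebuilt inline. Because the function uses >=, a tie goes to column 'a'; the one-row tie frame below is made up purely to show that.

```python
import pandas as pd

def compare_tuples(row):
    # Return whichever tuple has the larger second element; ties go to 'a'
    if row["a"][1] >= row["b"][1]:
        return row["a"]
    return row["b"]

df = pd.DataFrame({
    "a": [("chicken wing", 1), ("burger", 0.85), ("burping", 0.37), ("marvelous", 1)],
    "b": [("saucy", 0.35), ("mason", 0.97), ("lost in space", 0.47), ("tremendous", 0.85)],
})
df["larger"] = df.apply(compare_tuples, axis=1)
print(df["larger"].tolist())

# Illustrative tie: equal scores, so >= keeps the tuple from 'a'
tie = pd.DataFrame({"a": [("x", 0.5)], "b": [("y", 0.5)]})
print(tie.apply(compare_tuples, axis=1).tolist())  # [('x', 0.5)]
```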

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"a" : [('chicken wing', 1), ('burger', 0.85), ('burping', 0.37), ('marvelous', 1)], "b": [('saucy', 0.35), ('mason', 0.97), ('lost in space', 0.47), ('tremendous', .85)]})

In [3]: df['max_value'] = [a_value if (a_value[1] > b_value[1]) else b_value for a_value, b_value in zip(df.a, df.b)]

In [4]: df
Out[4]: 
                   a                      b              max_value
0  (chicken wing, 1)          (saucy, 0.35)      (chicken wing, 1)
1     (burger, 0.85)          (mason, 0.97)          (mason, 0.97)
2    (burping, 0.37)  (lost in space, 0.47)  (lost in space, 0.47)
3     (marvelous, 1)     (tremendous, 0.85)         (marvelous, 1)
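The same comprehension can also be written with Python's built-in max and a key function that looks only at the second element. One difference: max returns the first of several equal items, so a tie goes to column 'a', whereas the > in the comprehension above sends ties to 'b'. Adding more columns to the zip(...) call would compare them all the same way.

```python
import pandas as pd
from operator import itemgetter

df = pd.DataFrame({
    "a": [("chicken wing", 1), ("burger", 0.85), ("burping", 0.37), ("marvelous", 1)],
    "b": [("saucy", 0.35), ("mason", 0.97), ("lost in space", 0.47), ("tremendous", 0.85)],
})

# key=itemgetter(1) compares the tuples by their second element only
df["max_value"] = [max(pair, key=itemgetter(1)) for pair in zip(df.a, df.b)]
print(df["max_value"].tolist())
```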


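Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.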
