Python Pandas - compare dataframe tuple values
I have two columns in a DataFrame object that contain tuples.
df  a                    b
    ('chicken wing', 1)  ('saucy', 0.35)
    ('burger', 0.85)     ('mason', 0.97)
    ('burping', 0.37)    ('lost in space', 0.47)
    ('marvelous', 1)     ('tremendous', .85)
I need to return the tuple that contains the larger number into a new column. It does not matter whether the old columns stay in df.
df  max_value
    ('chicken wing', 1)
    ('mason', 0.97)
    ('lost in space', 0.47)
    ('marvelous', 1)
You can do it like this:
In [1]: df['a'].where( df.apply(lambda row: row['a'][1] > row['b'][1], axis=1), df['b'])
Out [1]:
0 (chicken wing, 1)
1 (mason, 0.97)
2 (lost in space, 0.47)
3 (marvelous, 1)
Name: a, dtype: object
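Here we use apply with a lambda that compares the tuple values in each row to generate a boolean mask, then pass that mask to where, which returns the value from column 'a' where the mask is True and the value from column 'b' otherwise.
The output from apply: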
In[3]:
df.apply(lambda row: row['a'][1] > row['b'][1], axis=1)
Out[3]:
0 True
1 False
2 False
3 True
dtype: bool
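To see what where does with such a mask in isolation, here is a minimal sketch on plain numbers. The names left, right and mask are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy data, only to illustrate Series.where
left = pd.Series([10, 20, 30])
right = pd.Series([1, 2, 3])
mask = pd.Series([True, False, True])

# Keep the value from `left` where the mask is True,
# otherwise take the value at the same position from `right`
picked = left.where(mask, right)
print(picked.tolist())  # [10, 2, 30]
```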
A more efficient approach is to extract the percentages into separate columns so that the comparison itself can be vectorized:
In[4]:
df['a_%'] = df['a'].apply(lambda x: x[1])
df['b_%'] = df['b'].apply(lambda x: x[1])
df
Out[4]:
                   a                      b   a_%   b_%
0  (chicken wing, 1)          (saucy, 0.35)  1.00  0.35
1     (burger, 0.85)          (mason, 0.97)  0.85  0.97
2    (burping, 0.37)  (lost in space, 0.47)  0.37  0.47
3     (marvelous, 1)     (tremendous, 0.85)  1.00  0.85
In[5]:
df['max_value'] = df['a'].where(df['a_%'] > df['b_%'], df['b'])
df
Out[5]:
                   a                      b   a_%   b_%              max_value
0  (chicken wing, 1)          (saucy, 0.35)  1.00  0.35      (chicken wing, 1)
1     (burger, 0.85)          (mason, 0.97)  0.85  0.97          (mason, 0.97)
2    (burping, 0.37)  (lost in space, 0.47)  0.37  0.47  (lost in space, 0.47)
3     (marvelous, 1)     (tremendous, 0.85)  1.00  0.85         (marvelous, 1)
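If you would rather not keep the helper columns in df, one possible variation is to pull out the second tuple element with the .str accessor, which can also index into tuples stored in an object column. This is only a sketch, assuming every cell in a and b is a 2-tuple with a numeric second element; the names a_score and b_score are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [('chicken wing', 1), ('burger', 0.85), ('burping', 0.37), ('marvelous', 1)],
    "b": [('saucy', 0.35), ('mason', 0.97), ('lost in space', 0.47), ('tremendous', .85)],
})

# .str.get(1) extracts element 1 of each tuple; astype(float) gives a numeric dtype
a_score = df['a'].str.get(1).astype(float)
b_score = df['b'].str.get(1).astype(float)

# Keep the tuple from 'a' where its score is larger, otherwise take the one from 'b'
df['max_value'] = df['a'].where(a_score > b_score, df['b'])
print(df['max_value'].tolist())
```

The temporary Series never become columns, so df ends up with only a, b and max_value.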
You can also define a custom function that handles a dynamic number of columns and uses max:
In[11]:
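def func(x):
    # collect the numeric second element of every tuple in the row
    vals = [y[1] for y in x]
    # look up, by position, the tuple that holds the largest value
    return x.iloc[vals.index(max(vals))]
# pass only the tuple columns; helper columns such as a_% hold floats
df[['a', 'b']].apply(func, axis=1)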
Out[11]:
0 (chicken wing, 1)
1 (mason, 0.97)
2 (lost in space, 0.47)
3 (marvelous, 1)
dtype: object
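The same idea can be sketched more compactly by giving max a key function, so the tuple with the largest second element is returned directly. The list name cols is an assumption made for illustration; extend it with however many tuple columns you have.

```python
import pandas as pd
from operator import itemgetter

df = pd.DataFrame({
    "a": [('chicken wing', 1), ('burger', 0.85), ('burping', 0.37), ('marvelous', 1)],
    "b": [('saucy', 0.35), ('mason', 0.97), ('lost in space', 0.47), ('tremendous', .85)],
})

# Hypothetical list of the tuple columns to compare
cols = ['a', 'b']

# zip walks the chosen columns row by row; max(..., key=itemgetter(1))
# returns the tuple whose second element is largest
result = [max(row, key=itemgetter(1)) for row in zip(*(df[c] for c in cols))]
print(result)
```

On ties, max returns the first maximal item it encounters, so the earliest column listed in cols wins.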
Try this:
def compare_tuples(row):
    if row['a'][1] >= row['b'][1]:
        return row['a']
    else:
        return row['b']
df['larger'] = df.apply(compare_tuples, axis=1)
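Note that this function compares with >=, whereas the where-based mask uses a strict >. The sample data has no equal scores, so both give the same result here, but they differ on ties. A small sketch with made-up tuples and hypothetical helper names:

```python
def pick_strict(a, b):
    # strict >: when the scores are equal, the result falls through to b
    return a if a[1] > b[1] else b

def pick_inclusive(a, b):
    # >=: when the scores are equal, a is kept
    return a if a[1] >= b[1] else b

# Made-up pair with equal scores, not taken from the question's data
a, b = ('marvelous', 1), ('tremendous', 1)
print(pick_strict(a, b))     # ('tremendous', 1)
print(pick_inclusive(a, b))  # ('marvelous', 1)
```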
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"a" : [('chicken wing', 1), ('burger', 0.85), ('burping', 0.37), ('marvelous', 1)], "b": [('saucy', 0.35), ('mason', 0.97), ('lost in space', 0.47), ('tremendous', .85)]})
In [3]: df['max_value'] = [a_value if (a_value[1] > b_value[1]) else b_value for a_value, b_value in zip(df.a, df.b)]
In [4]: df
Out[4]:
                   a                      b              max_value
0  (chicken wing, 1)          (saucy, 0.35)      (chicken wing, 1)
1     (burger, 0.85)          (mason, 0.97)          (mason, 0.97)
2    (burping, 0.37)  (lost in space, 0.47)  (lost in space, 0.47)
3     (marvelous, 1)     (tremendous, 0.85)         (marvelous, 1)