Python Pandas check cell value in column based on value in another cell
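For each row in a dataframe, I need to check whether the value in a particular column is greater than 0. Which column to check is given by the value of another cell in the same row.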
tshirt  pants  sweater  socks  Product_1  Product_2  Product_3  Expected
     0      1        0      1    sweater     tshirt      pants      True
     1      1        0      1    sweater     tshirt      socks      True
     0      1        0      0      socks    sweater      socks     False
     1      1        0      1    sweater     tshirt    sweater      True
     0      0        0      0      socks    sweater     tshirt     False
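So, for example, if the value in the "Product_1" column is "tshirt", I need to check whether the value in the tshirt column is greater than 0.
If at least one of the three "Product" columns names a column whose value is greater than 0, a new column should be True, otherwise False (see the Expected column).
Code to generate sample data: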
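import pandas as pd
import numpy as np

recomendations = ['tshirt', 'pants', 'sweater', 'socks']
size = 100
data = pd.DataFrame()

# Generate data: one 0/1 column per item, plus three recommendation columns
for idx, i in enumerate(recomendations):
    data[i] = np.random.choice([0, 1], size=size)
    if idx < 3:
        # Name the columns Product_1..Product_3 so they match the selection below
        data[f'Product_{idx + 1}'] = np.random.choice(recomendations, size=size)

# Sort: item columns first, then the Product columns
data = data[recomendations + ['Product_1', 'Product_2', 'Product_3']]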
So far I have computed the percentage of True values by looping over the frame, which is very slow:
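# The original def line was not included; the function name here is a placeholder.
def no_purchase_ratio(frame, column):
    track = []
    no_purchase = 0
    cols = list(frame.columns)
    str_cols = ['Product_1', 'Product_2', 'Product_3']
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for idx, val in frame[column].items():
        # 1 if the item named in this row's Product cell has a value > 0
        if frame.iloc[idx, cols.index(val)] > 0:
            track.append(1)
        else:
            track.append(0)
        # Count rows where none of the item columns has a value
        if frame.loc[idx, [i for i in frame.columns if i not in str_cols]].sum() < 1:
            no_purchase += 1
    result = no_purchase / (len(track) - np.sum(track))
    return result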
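For comparison, the same ratio can be computed without a Python-level loop. The following is only a sketch: the function name and the `item_cols` parameter are hypothetical, and it assumes every value in `column` is one of the names in `item_cols`. It uses the five sample rows from the table above.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question.
data = pd.DataFrame({
    "tshirt":  [0, 1, 0, 1, 0],
    "pants":   [1, 1, 1, 1, 0],
    "sweater": [0, 0, 0, 0, 0],
    "socks":   [1, 1, 0, 1, 0],
    "Product_1": ["sweater", "sweater", "socks", "sweater", "socks"],
    "Product_2": ["tshirt", "tshirt", "sweater", "tshirt", "sweater"],
    "Product_3": ["pants", "socks", "socks", "sweater", "tshirt"],
})


def no_purchase_ratio_vectorized(frame, column, item_cols):
    """Hypothetical helper: share of 'missed' rows that contain no purchase at all."""
    values = frame[item_cols].to_numpy()
    # Translate each item name in `column` into its position within item_cols.
    position = {name: i for i, name in enumerate(item_cols)}
    col_pos = frame[column].map(position).to_numpy()
    # For every row, pick the cell of the item named in `column`.
    hit = values[np.arange(len(frame)), col_pos] > 0
    nothing_bought = values.sum(axis=1) < 1
    misses = (~hit).sum()
    if misses == 0:
        return float("nan")  # no missed rows, so the ratio is undefined
    return (nothing_bought & ~hit).sum() / misses


ratio = no_purchase_ratio_vectorized(
    data, "Product_1", ["tshirt", "pants", "sweater", "socks"]
)
print(ratio)
```

Passing the item columns explicitly avoids relying on which other columns happen to be in the frame.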
Use get_dummies on the Product columns, then max to get one indicator column per item (only 1 and 0 values):
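# DataFrame.max(level=...) was removed in pandas 2.0, so group the duplicate column names on the transposed frame instead
df = (pd.get_dummies(data.filter(like='Product'), prefix_sep='', prefix='', dtype=int)
        .T.groupby(level=0, sort=False).max().T)
print (df)
   socks  sweater  tshirt  pants
0      0        1       1      1
1      1        1       1      0
2      1        1       0      0
3      0        1       1      0
4      1        1       1      0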
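Then compare the indicator frame against 1 and chain it, using & for bitwise AND, with a test for values greater than 0 in the matching item columns: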
mask = df.eq(1) & data.loc[:, data.columns.isin(df.columns)].gt(0)
print (mask)
   pants  socks  sweater  tshirt
0   True  False    False   False
1  False   True    False    True
2  False  False    False   False
3  False  False    False    True
4  False  False    False   False
Finally, use DataFrame.any to test whether each row has at least one True:
data['Expected1'] = mask.any(axis=1)
print (data)
   tshirt  pants  sweater  socks Product_1 Product_2 Product_3  Expected  \
0       0      1        0      1   sweater    tshirt     pants      True
1       1      1        0      1   sweater    tshirt     socks      True
2       0      1        0      0     socks   sweater     socks     False
3       1      1        0      1   sweater    tshirt   sweater      True
4       0      0        0      0     socks   sweater    tshirt     False

   Expected1
0       True
1       True
2      False
3       True
4      False
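The three steps above can be wrapped into one function. This is a minimal sketch rather than the answer's exact code: the function name is hypothetical, and it groups the duplicate dummy columns with `groupby` on the transposed frame, which is an assumption made so that it also runs on pandas 2.x.

```python
import pandas as pd

# The five sample rows from the question.
data = pd.DataFrame({
    "tshirt":  [0, 1, 0, 1, 0],
    "pants":   [1, 1, 1, 1, 0],
    "sweater": [0, 0, 0, 0, 0],
    "socks":   [1, 1, 0, 1, 0],
    "Product_1": ["sweater", "sweater", "socks", "sweater", "socks"],
    "Product_2": ["tshirt", "tshirt", "sweater", "tshirt", "sweater"],
    "Product_3": ["pants", "socks", "socks", "sweater", "tshirt"],
})


def any_recommended_purchased(frame, product_cols):
    """Hypothetical helper: True where any recommended item has a value > 0."""
    # Step 1: one indicator column per item name; names repeat across Product columns.
    dummies = pd.get_dummies(frame[product_cols], prefix="", prefix_sep="", dtype=int)
    # Collapse the repeated names: 1 if the item is recommended in any Product column.
    recommended = dummies.T.groupby(level=0).max().T
    # Step 2: the item must be recommended AND have a value > 0 in its own column.
    purchased = frame[recommended.columns].gt(0)
    mask = recommended.eq(1) & purchased
    # Step 3: at least one such item per row.
    return mask.any(axis=1)


data["Expected1"] = any_recommended_purchased(
    data, ["Product_1", "Product_2", "Product_3"]
)
print(data["Expected1"].tolist())
```

On the sample rows this reproduces the Expected column from the question.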
Another approach is to loop over the rows:
expected = []
for index, row in data.iterrows():
    product1 = row["Product_1"]
    product2 = row["Product_2"]
    product3 = row["Product_3"]
    if row[product1] > 0 or row[product2] > 0 or row[product3] > 0:
        expected.append(True)
    else:
        expected.append(False)
data['Expected'] = expected
print(data)
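Since iterrows is slow on larger frames, the same row-wise lookup can also be done with NumPy indexing. The sketch below is not part of the original answers. It assumes every value in the Product columns is the name of one of the item columns, and the names `item_cols`, `product_cols` and `Expected_fast` are illustrative.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question.
data = pd.DataFrame({
    "tshirt":  [0, 1, 0, 1, 0],
    "pants":   [1, 1, 1, 1, 0],
    "sweater": [0, 0, 0, 0, 0],
    "socks":   [1, 1, 0, 1, 0],
    "Product_1": ["sweater", "sweater", "socks", "sweater", "socks"],
    "Product_2": ["tshirt", "tshirt", "sweater", "tshirt", "sweater"],
    "Product_3": ["pants", "socks", "socks", "sweater", "tshirt"],
})

item_cols = ["tshirt", "pants", "sweater", "socks"]
product_cols = ["Product_1", "Product_2", "Product_3"]

# Translate every item name in the Product columns into a column position.
position = {name: i for i, name in enumerate(item_cols)}
picked_pos = data[product_cols].apply(lambda col: col.map(position)).to_numpy()

# For each row, pull the three item values the Product columns point to.
values = data[item_cols].to_numpy()
picked = np.take_along_axis(values, picked_pos, axis=1)

# True if at least one of the three referenced items has a value > 0.
data["Expected_fast"] = (picked > 0).any(axis=1)
print(data["Expected_fast"].tolist())
```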