Pandas sampling

Question

vsample_data = credit_card.sample(n=520, replace='False')

print(vsample_data)

I am trying to sample 520 rows from a credit card fraud data set, but I cannot get a sample in which the two classes, Class 0 (non-fraud) and Class 1 (fraud), are equally likely to appear.

Answer 1

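import pandas as pd
d = {'actions': [1, 2, 1, 6, 4], 'fraud': [True, False, True, True, False]}
df = pd.DataFrame(data=d)
frauds, normal = df[df.fraud], df[~df.fraud]  # split the rows by class
# Draw the same number of rows from each class; replace must be the boolean False, not the string 'False'
print(pd.concat([frauds.sample(n=1, replace=False), normal.sample(n=1, replace=False)]))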
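
For the 520-row sample in the question, the same per-class idea can probably be written more compactly with `groupby(...).sample(...)`, which exists in pandas 1.1 and later. The following is only a minimal sketch on synthetic data: the column name `Class`, the stand-in frame `credit_card`, and the helper `balanced_sample` are assumptions for illustration, not the asker's actual data or code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the credit card data (assumed label column: "Class").
rng = np.random.default_rng(0)
credit_card = pd.DataFrame({
    "Amount": rng.uniform(0, 500, size=5000),
    "Class": np.repeat([0, 1], [4500, 500]),  # 4500 non-fraud rows, 500 fraud rows
})

def balanced_sample(df, label, n_total, seed=None):
    """Hypothetical helper: draw n_total rows, split evenly across the classes in `label`."""
    per_class = n_total // df[label].nunique()
    # Without replacement, every class must contain at least per_class rows.
    return df.groupby(label).sample(n=per_class, replace=False, random_state=seed)

sample_data = balanced_sample(credit_card, "Class", 520, seed=42)
print(sample_data["Class"].value_counts())
```

With two classes this draws 260 rows from each. If a class has fewer rows than its share, `replace=True` would be needed, at the cost of duplicated rows.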

Answer 2

Create a fraud DataFrame

I will use a 10% probability of fraud cases:

import random
import numpy as np
import pandas as pd
data = pd.DataFrame({'val': [random.randint(0, 1000) for _ in range(1000)],
                     'fraud': list(np.random.binomial(1, 0.1, 1000))})
data.head(10)

[out]

fraud   val
0   0   359
1   0   731
2   0   146
3   0   975
4   0   295
5   0   467
6   0   366
7   1   69
8   0   18
9   0   297

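Fraud cases should be oversampled relative to non-fraud cases. With about 10% fraud, a weight ratio of roughly 9 to 1 balances the two classes; the code below gives fraud rows a weight of 10 and all other rows a weight of 1.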

data['weights'] = data.fraud * 9
data['weights'] += 1
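
As a rough check on what these weights imply, the chance that a single weighted draw lands on a fraud row can be worked out from the fraud rate and the two weights. This is a sketch of that arithmetic only; the function name `fraud_draw_probability` is hypothetical, and the figure applies to the first draw (later draws without replacement shift slightly as rows are used up).

```python
def fraud_draw_probability(fraud_rate, fraud_weight=10.0, normal_weight=1.0):
    """Probability that one weighted draw picks a fraud row."""
    fraud_mass = fraud_rate * fraud_weight          # total weight held by fraud rows
    normal_mass = (1 - fraud_rate) * normal_weight  # total weight held by non-fraud rows
    return fraud_mass / (fraud_mass + normal_mass)

# Weights 10 and 1, as assigned above, with a 10% fraud rate.
print(round(fraud_draw_probability(0.1), 3))                    # 0.526
# A weight of 9 would balance the classes exactly at a 10% fraud rate.
print(round(fraud_draw_probability(0.1, fraud_weight=9.0), 3))  # 0.5
```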

Weighted samples

spl = data.sample(100,weights=data.weights)
sum(spl.fraud)

[out]

Fraud cases are about 50% of the total samples.
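
When the fraud rate is not known in advance, one possible variation is to weight each row by the inverse of its class size, so that both classes carry the same total weight. The sketch below assumes the same `val` and `fraud` columns as above; the use of `replace=True` and the fixed seeds are my own choices, made so that every draw has the same class probabilities and the run is repeatable.

```python
import numpy as np
import pandas as pd

# Rebuild a comparable frame with roughly 10% fraud rows.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "val": rng.integers(0, 1001, size=1000),
    "fraud": rng.binomial(1, 0.1, size=1000),
})

# Weight each row by 1 / (number of rows in its class),
# so each class sums to a total weight of 1.
class_counts = data["fraud"].map(data["fraud"].value_counts())
data["weights"] = 1.0 / class_counts

# pandas normalizes the weights internally so they sum to 1.
spl = data.sample(n=100, weights=data["weights"], replace=True, random_state=0)
print(spl["fraud"].mean())  # share of fraud rows in the sample
```

Because rows are drawn with replacement here, the same fraud row can appear more than once in `spl`, which may or may not be acceptable depending on how the sample is used.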

Answer 1
0 2018-04-24 13:27:47

Answer 2
0 2018-04-24 13:55:20
