Reshaping data frame and counting values based on criteria
I have the data set below. I am trying to determine the type of each customer by assigning a tag. Excel crashes when I attempt this because there is too much data, so I am trying to do it in Python.
item customer qty
------------------
ProdA CustA 1
ProdA CustB 1
ProdA CustC 1
ProdA CustD 1
ProdB CustA 1
ProdB CustB 1
In Excel, I would:
1. Create new columns "ProdA", "ProdB", "Type"
2. Remove duplicates for column "customer"
3. COUNTIF Customer = ProdA, COUNTIF customer = ProdB
4. IF(AND(ProdA = 1, ProdB = 1), "Both", "One")
customer ProdA ProdB Type
--------------------------
CustA 1 1 Both
CustB 1 1 Both
CustC 1 0 One
CustD 1 0 One
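For reference, the sample data can be loaded into a DataFrame as in the minimal sketch below. The variable name df is an assumption chosen to match what the answers use; real data would more likely come from a file.

```python
import pandas as pd

# Sample data from the question; `df` is the name the answers assume.
df = pd.DataFrame({
    "item": ["ProdA", "ProdA", "ProdA", "ProdA", "ProdB", "ProdB"],
    "customer": ["CustA", "CustB", "CustC", "CustD", "CustA", "CustB"],
    "qty": [1, 1, 1, 1, 1, 1],
})

print(df)
```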
We can achieve this using pd.crosstab, then summing ProdA and ProdB and passing the result to Series.map to map 2 -> Both and 1 -> One:
dfn = pd.crosstab(df['customer'], df['item']).reset_index()
dfn['Type'] = dfn[['ProdA', 'ProdB']].sum(axis=1).map({2:'Both', 1:'One'})
Or we can use np.where in the last line to conditionally assign Both or One:
dfn['Type'] = np.where(dfn['ProdA'].eq(1) & dfn['ProdB'].eq(1), 'Both', 'One')
item customer ProdA ProdB Type
0 CustA 1 1 Both
1 CustB 1 1 Both
2 CustC 1 0 One
3 CustD 1 0 One
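Both variants rely on every customer/item pair appearing at most once, because pd.crosstab counts rows. If a customer can have several rows for the same product, one possible adjustment is to count products with a non-zero count instead of summing the counts. The sketch below uses hypothetical data in which CustC appears twice for ProdA; that case is not in the question's sample.

```python
import pandas as pd

# Hypothetical data: CustC has two ProdA rows (not present in the question's sample).
df = pd.DataFrame({
    "item": ["ProdA", "ProdA", "ProdA", "ProdA", "ProdB", "ProdB"],
    "customer": ["CustA", "CustB", "CustC", "CustC", "CustA", "CustB"],
    "qty": [1, 1, 1, 1, 1, 1],
})

dfn = pd.crosstab(df["customer"], df["item"]).reset_index()

# Count how many products each customer bought at least once,
# rather than summing the raw row counts.
n_products = dfn[["ProdA", "ProdB"]].gt(0).sum(axis=1)
dfn["Type"] = n_products.map({2: "Both", 1: "One"})

print(dfn)
```

With a plain sum, CustC would get 2 and be tagged Both despite never buying ProdB.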
We can also lean on pd.crosstab more heavily with the margins=True argument, which adds a row and a column of totals:
dfn = pd.crosstab(df['customer'], df['item'],
                  margins=True,
                  margins_name='Type').iloc[:-1].reset_index()
dfn['Type'] = dfn['Type'].map({2:'Both', 1:'One'})
item customer ProdA ProdB Type
0 CustA 1 1 Both
1 CustB 1 1 Both
2 CustC 1 0 One
3 CustD 1 0 One
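To see why the .iloc[:-1] is needed, it can help to look at the crosstab before trimming. This is a small sketch on the question's sample data; the variable name raw is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["ProdA", "ProdA", "ProdA", "ProdA", "ProdB", "ProdB"],
    "customer": ["CustA", "CustB", "CustC", "CustD", "CustA", "CustB"],
    "qty": [1, 1, 1, 1, 1, 1],
})

# margins=True appends a totals column AND a totals row, both named 'Type' here.
raw = pd.crosstab(df["customer"], df["item"], margins=True, margins_name="Type")
print(raw)

# The last row holds per-product totals, not a customer, so it is dropped.
per_customer = raw.iloc[:-1]
print(per_customer)
```

The 'Type' column of per_customer holds the per-customer totals that are then mapped to Both or One.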
Try using set_index, unstack and np.select:
df_out = df.set_index(['customer', 'item'])['qty'].unstack(fill_value=0)
SumProd = df_out['ProdA'] + df_out['ProdB']
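# An explicit string default avoids the implicit integer default 0, which
# newer NumPy versions refuse to combine with string choices.
df_out['Type'] = np.select([SumProd==2, SumProd==1, SumProd==0], ['Both', 'One', 'None'], default='None')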
print(df_out)
Output:
item ProdA ProdB Type
customer
CustA 1 1 Both
CustB 1 1 Both
CustC 1 0 One
CustD 1 0 One
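One caveat: unstack needs each (customer, item) pair to be unique and raises a ValueError otherwise. If the full data set has repeated pairs, a possible alternative is pivot_table with an aggregation. The sketch below uses hypothetical data with a duplicated pair, and it classifies by the number of products with a positive total, which is an assumption about the intended rule.

```python
import pandas as pd

# Hypothetical data: CustA has two ProdA rows, so (customer, item) is not unique.
df = pd.DataFrame({
    "item": ["ProdA", "ProdA", "ProdB", "ProdA"],
    "customer": ["CustA", "CustA", "CustA", "CustB"],
    "qty": [1, 2, 1, 1],
})

# pivot_table aggregates duplicate pairs instead of raising like unstack does.
df_out = df.pivot_table(index="customer", columns="item", values="qty",
                        aggfunc="sum", fill_value=0)

# Classify by how many products have a positive total quantity.
n_products = df_out.gt(0).sum(axis=1)
df_out["Type"] = n_products.map({2: "Both", 1: "One", 0: "None"})

print(df_out)
```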
In addition to the other suggestions, you could skip pandas entirely:
################################################################################
## Data ingestion
################################################################################
import csv
import io
# Formatted to make the example more straightforward.
input_data = io.StringIO('''item,customer,qty
ProdA,CustA,1
ProdA,CustB,1
ProdA,CustC,1
ProdA,CustD,1
ProdB,CustA,1
ProdB,CustB,1
''')
records = []
reader = csv.DictReader(input_data)
for row in reader:
    records.append(row)
################################################################################
## Data transformation.
## Makes a Dict-of-Dicts. Each inner Dict contains all data for a single
## customer.
################################################################################
products = {'ProdA', 'ProdB'}
customer_data = {}
for r in records:
    customer_id = r['customer']
    if customer_id not in customer_data:
        customer_data[customer_id] = {}
    customer_data[customer_id][r['item']] = int(r['qty'])
# Determines the customer type.
for c in customer_data:
    c_data = customer_data[c]
    missing_product = products.difference(c_data.keys())
    if missing_product:
        # Fill in a zero for every product the customer never bought.
        for missing_p in missing_product:
            c_data[missing_p] = 0
        c_data['type'] = 'One'
    else:
        c_data['type'] = 'Both'
################################################################################
## Data display
################################################################################
# A fixed column order keeps the header aligned with every row.
columns = sorted(products) + ['type']
print('\t'.join(['ID'] + columns))
for c in customer_data:
    print('\t'.join([c] + [str(customer_data[c][col]) for col in columns]))
Which, for me, prints this (row and column order depends on dictionary ordering, so it can differ between Python versions):
ID ProdA type ProdB
CustC 1 One 0
CustB 1 Both 1
CustA 1 Both 1
CustD 1 One 0
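A more compact variant of the same idea is sketched below, using collections.defaultdict so that missing products start at zero. It assumes every item in the input is one of the two listed products, and it sums quantities when a customer/product pair repeats, which is a choice the question does not specify.

```python
import csv
import io
from collections import defaultdict

raw = io.StringIO("""item,customer,qty
ProdA,CustA,1
ProdA,CustB,1
ProdA,CustC,1
ProdA,CustD,1
ProdB,CustA,1
ProdB,CustB,1
""")

products = ["ProdA", "ProdB"]

# Each new customer starts with a zero quantity for every known product.
totals = defaultdict(lambda: dict.fromkeys(products, 0))
for row in csv.DictReader(raw):
    totals[row["customer"]][row["item"]] += int(row["qty"])

rows = []
for customer, qty in totals.items():
    # Number of products this customer bought at least once.
    bought = sum(1 for p in products if qty[p] > 0)
    tag = "Both" if bought == len(products) else "One"
    rows.append([customer] + [qty[p] for p in products] + [tag])

print("\t".join(["ID"] + products + ["type"]))
for r in rows:
    print("\t".join(str(x) for x in r))
```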