
Data cleaning for-loop in Python for POS data

I have POS data from a message shop. The data is shown in the attached picture.

import pandas as pd

## read data from csv
data = pd.read_csv('test1.csv')
# make a list for each column
sales_id = list(data['sales_id'])
shop_number = list(data['shop_number'])
sales = list(data['sales'])
cashier_no = list(data['cashier_no'])
messager_no = list(data['messager_no'])
type_of_sale = list(data['type_of_sale'])
costomer_ID = list(data['costomer_ID'])
date = list(data['date'])
time = list(data['time'])

I want to make a new list showing which purchase data should be deleted, like this:

data_to_clean= [0,1,0,1,0,0,1,0,1]

To do this, I want to write a for loop:

for i in range(len(type_of_sale)):
    data_to_clean=[]
    if type_of_sale[i] == "purchase":
        data_to_clean = data_to_clean.append(0)
    elif type_of_sale[i] == "return":
        data_to_clean = data_to_clean.append(1)
        ## I want to write code that also deletes the matching purchase data,
        # on the condition that it has the same shop_number, messager_no, costomer_ID and -price

    return list(data_to_clean)

There are two main problems with this code. First, it doesn't run. Second, I don't know how to check shop_number, messager_no and costomer_ID to decide whether to put 1 or 0 in my data_to_clean list. Sometimes I have to check the rows above, as for sales_id 1628060, and sometimes the rows below, as for sales_id 1599414. Note that the cashier may differ, but the costomer_ID should always be the same.

The question is how to write the code so that I can create a list or DataFrame of 0s and 1s showing which data should be deleted.

When you want to compare data with a string in Python, you should put the string in quotes:

data_to_clean = []  # create the list once, before the loop
for i in range(len(type_of_sale)):
    if type_of_sale[i] == "purchase":  # here
        data_to_clean.append(0)  # append() returns None, so do not reassign
    elif type_of_sale[i] == "return":  # and here
        data_to_clean.append(1)
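
The same 0/1 flags can probably be built without an explicit loop. Here is a minimal sketch, assuming a small made-up DataFrame (the rows are hypothetical, only the column name type_of_sale comes from the question):

```python
import pandas as pd

# Hypothetical sample rows; only the column name comes from the question.
data = pd.DataFrame({"type_of_sale": ["purchase", "return", "purchase", "return"]})

# Comparing the column with a string gives a boolean Series;
# astype(int) turns True/False into 1/0.
data_to_clean = (data["type_of_sale"] == "return").astype(int).tolist()
print(data_to_clean)  # [0, 1, 0, 1]
```

This only flags the return rows themselves. It does not yet look for the purchase that each return cancels.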

Check the pandas documentation. Getting the items that are return orders can be as simple as:

returns = data.loc[data['type_of_sale'] == 'return']

If you want the sales of cashier 90:

data.loc[(data['type_of_sale'] == 'purchase') & (data['cashier_no'] == 90)]
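
For the second part of the question (also flagging the purchase that a return cancels), one possible approach is to pair each return with a purchase that has the same shop_number, messager_no and costomer_ID and the opposite amount. The sketch below is only an illustration under several assumptions: the sample rows are made up, a return is assumed to carry the negated sales value of its purchase, and when several purchases qualify the n-th return is simply paired with the n-th purchase in row order, whether that purchase sits above or below it. cashier_no is ignored on purpose, since the cashier may differ.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
data = pd.DataFrame({
    "sales_id":     [1, 2, 3, 4, 5],
    "shop_number":  [10, 10, 10, 11, 11],
    "messager_no":  [7, 7, 7, 8, 8],
    "costomer_ID":  [100, 100, 100, 200, 200],
    "cashier_no":   [90, 91, 90, 92, 92],
    "type_of_sale": ["purchase", "return", "purchase", "purchase", "purchase"],
    "sales":        [50, -50, 50, 30, 30],
})

keys = ["shop_number", "messager_no", "costomer_ID"]

tmp = data.copy()
# Assumption: a return has the same amount as its purchase, with opposite sign.
tmp["amount"] = tmp["sales"].abs()
# Number the rows inside each (keys, amount, type) group so that the
# n-th return is paired with the n-th purchase of the same amount.
tmp["n"] = tmp.groupby(keys + ["amount", "type_of_sale"]).cumcount()

match_cols = keys + ["amount", "n"]
returns = tmp.loc[tmp["type_of_sale"] == "return", match_cols]
purchases = tmp.loc[tmp["type_of_sale"] == "purchase", match_cols + ["sales_id"]]

# Inner merge keeps only the purchases that have a matching return.
matched = purchases.merge(returns, on=match_cols, how="inner")

is_return = data["type_of_sale"] == "return"
is_matched_purchase = data["sales_id"].isin(matched["sales_id"])
data["data_to_clean"] = (is_return | is_matched_purchase).astype(int)

print(data["data_to_clean"].tolist())  # [1, 1, 0, 0, 0]
```

Here the return with sales_id 2 cancels the purchase with sales_id 1, so both get a 1. The second identical purchase (sales_id 3) is kept because there is only one return. If the real data needs a different pairing rule, for example the closest purchase by date, the numbering step would have to change.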
