Looping through data in a CSV file in order to output '1' and '0' to a text file (Python)

I have recently started learning Python and have run into a problem while trying to format some data for a project I am working on. I have managed to read a CSV file as input, and I am now trying to go through that data and write '1's and '0's to a text file based on it.

I have the following code so far:

data = {} 
productIds = [] 

for row in reader:
    productIds.append(row['productCode']) 
    if row['basketID'] not in data:
        data[row['basketID']] = [row['productCode']]
    else:
        data[row['basketID']].append(row['productCode'])

productIds = sorted(set(productIds))

for item in productIds:
    txtFile.write("%s " % item)
txtFile.write('\n')

for key in data: # Will loop through each basket
    for value in data[key]: #Loop through each product in basket
        for i in productIds: # Go through list of available products
            if value == i: 
                txtFile.write('1 ')
            else:
                txtFile.write('0 ')
    txtFile.write('\n')

The result:

23 24 25 #Products 
1  0  0  0 1 0 0 0 1 #Basket 1
1  0  0              #Basket 2
1  0  0              #Basket 3
0  0  1              #Basket 4
0  1  0  0 0 1       #Basket 5

Expected result:

23 24 25 #Products
1  1  1  #Basket 1  
1  0  0  #Basket 2  
1  0  0  #Basket 3  
0  0  1  #Basket 4
0  1  1  #Basket 5

CSV file:

basketID productCode 
1        23  
1        24  
1        25  
2        23  
3        23  
4        25  
5        24  
5        25  

I believe it goes wrong when looping through the product list for each product in the basket, but I am not sure how else to achieve this.

I think you should try this. First, read the CSV into a pandas DataFrame:

>>> import pandas as pd
>>> df = pd.read_csv("lia.csv")
>>> df
   basketID  productCode
0         1           23
1         1           24
2         1           25
3         2           23
4         3           23
5         4           25
6         5           24
7         5           25

Then group by product code and basket ID:

>>> g1 = df.groupby(["productCode", "basketID"]).count()
>>> g1
Empty DataFrame
Columns: []
Index: [(23, 1), (23, 2), (23, 3), (24, 1), (24, 5), (25, 1), (25, 4), (25, 5)]

The problem lies in the last for loop. You iterate over each basket and, within it, over each product in that basket. For each product you then compare it against every productId and write a '1' or '0' per comparison. Since there are 3 productIds, you get 3 entries for every item in the basket instead of 3 entries per basket.

Example: for basket 1, the first item is 23, and the inner loop (for i in productIds) writes 3 entries to the output file for it:

1. 23 == 23 => 1
2. 23 == 24 => 0
3. 23 == 25 => 0
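The over-counting can be reproduced in memory. The following is a minimal sketch, not the asker's actual script: the names product_ids and basket are illustrative, and the data is basket 1 from the question.

```python
# Illustrative data for basket 1; these names are not from the question.
product_ids = ["23", "24", "25"]
basket = ["23", "24", "25"]

# Original approach: one flag per (product in basket, productId) pair.
buggy_row = []
for value in basket:
    for pid in product_ids:
        buggy_row.append("1" if value == pid else "0")

# Membership test: exactly one flag per productId.
fixed_row = ["1" if pid in basket else "0" for pid in product_ids]

print(" ".join(buggy_row))  # 1 0 0 0 1 0 0 0 1
print(" ".join(fixed_row))  # 1 1 1
```

The first line printed matches the faulty "Basket 1" row in the question, and the second matches the expected row.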

Additionally, there is one more problem. Your dict is not sorted by key, so the baskets are not guaranteed to be written in increasing order from basket 1 to basket 5. (Since Python 3.7 a dict preserves insertion order, but that is the order of the CSV rows, not sorted order.)
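One detail worth checking when sorting the baskets: if the rows come from csv.DictReader (an assumption, since the question does not show how reader is created), the basket IDs are strings, and strings sort lexicographically rather than numerically. A small sketch with made-up IDs:

```python
# Made-up basket IDs, as the strings a CSV reader would typically produce.
basket_ids = ["1", "2", "10", "3"]

print(sorted(basket_ids))           # ['1', '10', '2', '3']
print(sorted(basket_ids, key=int))  # ['1', '2', '3', '10']
```

With single-digit IDs such as those in the question the two orders agree, so this only matters once IDs reach two or more digits. Passing key=int assumes every ID is a valid integer.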

Replace the last for loop with the following, which sorts the dict and then iterates correctly:

import collections

# Sort the baskets by key (if the keys are strings, they sort lexicographically)
data = collections.OrderedDict(sorted(data.items()))
for key in data:  # Loop through each basket
    for productId in productIds:  # Loop through each productId
        if productId in data[key]:  # Is this productId in the basket?
            txtFile.write('1 ')
        else:
            txtFile.write('0 ')
    txtFile.write('\n')

Output:

23 24 25 
1 1 1 
1 0 0 
1 0 0 
0 0 1 
0 1 1 
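For reference, here is a self-contained sketch of the whole pipeline that runs without any files. It rests on several assumptions not stated in the question: the CSV is comma-separated, the rows are read with csv.DictReader, and the IDs are integers (so key=int is safe). The helper names build_matrix and write_matrix are hypothetical, and io.StringIO stands in for both the input file and txtFile.

```python
import csv
import io

# Assumed comma-separated version of the CSV shown in the question.
CSV_TEXT = """basketID,productCode
1,23
1,24
1,25
2,23
3,23
4,25
5,24
5,25
"""


def build_matrix(csv_file):
    """Return (sorted product IDs, {basketID: set of product codes})."""
    baskets = {}
    for row in csv.DictReader(csv_file):
        baskets.setdefault(row["basketID"], set()).add(row["productCode"])
    # Collect every product seen in any basket; sort numerically.
    product_ids = sorted(set().union(*baskets.values()), key=int)
    return product_ids, baskets


def write_matrix(product_ids, baskets, out):
    """Write a header row, then one row of 1/0 flags per basket."""
    out.write(" ".join(product_ids) + "\n")
    for basket_id in sorted(baskets, key=int):
        flags = ("1" if pid in baskets[basket_id] else "0" for pid in product_ids)
        out.write(" ".join(flags) + "\n")


out = io.StringIO()  # stand-in for the question's txtFile
product_ids, baskets = build_matrix(io.StringIO(CSV_TEXT))
write_matrix(product_ids, baskets, out)
print(out.getvalue(), end="")
```

Storing each basket as a set makes the membership test cheap and ignores duplicate rows. Unlike the snippets above, this version writes no trailing space at the end of each line.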

Try this:

data = {}
productIds = []

for row in reader:
    productIds.append(row['productCode'])
    if row['basketID'] not in data:
        # {x} builds a one-element set; set(x) would split the string into characters
        data[row['basketID']] = {row['productCode']}
    else:
        data[row['basketID']].add(row['productCode'])

productIds = sorted(set(productIds))

for item in productIds:
    txtFile.write("%s " % item)
txtFile.write('\n')

for key in data:  # Loop through each basket
    for i in productIds:  # Go through the list of available products
        if i in data[key]:  # Set membership: one flag per product
            txtFile.write('1 ')
        else:
            txtFile.write('0 ')
    txtFile.write('\n')
