How to manipulate a CSV in Python with repeated values in a column and create a new one with unique values

The issue I am facing is that I have a CSV file in which the same value appears on more than one row of a column (here unique_code). I want to create a new CSV in which each value of that column appears only once, with the values from the other column (here alternative_code) joined with a space when they differ.

Here is my CSV:

Unique_code description alternative_code

33;product1;58

43;product2;95

33;product1;62

68;product3;11

43;product2;99

My desired CSV result:

33;product1;58 62

43;product2;95 99

68;product3;11

Any ideas on how I can produce the new CSV?

You can try something like:

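input_filename = "input.csv"    # placeholder file names, adjust to your own
output_filename = "output.csv"

vals = {}   # unique_code -> list of alternative codes
names = {}  # unique_code -> description
with open(input_filename, 'r') as file:
    for line in file:
        l = line.strip()
        if not l:  # skip blank lines
            continue
        l = l.split(";")
        if l[0] in vals:  # dict.has_key() no longer exists in Python 3
            vals[l[0]].append(l[2])
        else:
            vals[l[0]] = [l[2]]
            names[l[0]] = l[1]

with open(output_filename, 'w') as file:
    for key in vals:
        # one line per code: unique_code;description;alt1 alt2 ...
        file.write(key + ";" + names[key] + ";" + " ".join(vals[key]) + "\n")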
Another option, using the csv module:

import csv

with open("my_file.csv", 'r', newline='') as fd:
    # read the CSV as a list of lists, skipping blank lines
    data = [i for i in csv.reader(fd, delimiter=';') if i]

result = []
for value in data:
    # only handle each product the first time it appears
    if value[1] not in [r[1] for r in result]:
        # add the product with every alternative code found for it, space-separated
        result.append([value[0],
                       value[1],
                       ' '.join([line[2] for line in data if line[1] == value[1]])
                       ])
print(result)
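The snippet above only prints the merged rows. As a minimal sketch (not part of the original answer), the grouping can also be done in a single pass with a dictionary and the rows written back out with csv.writer. The function name merge_alternative_codes is hypothetical, and in-memory io.StringIO objects stand in for the real input and output files.

```python
import csv
import io

def merge_alternative_codes(rows):
    """Group rows by their first column, collecting third-column values in order."""
    merged = {}  # unique_code -> [description, [alternative codes]]
    for row in rows:
        if not row:  # skip blank lines
            continue
        code, description, alt = row[0], row[1], row[2]
        if code not in merged:
            merged[code] = [description, []]
        merged[code][1].append(alt)
    return [[code, desc, " ".join(alts)] for code, (desc, alts) in merged.items()]

# Demo on the sample data from the question (StringIO replaces real files here).
sample = "33;product1;58\n43;product2;95\n33;product1;62\n68;product3;11\n43;product2;99\n"
merged_rows = merge_alternative_codes(csv.reader(io.StringIO(sample), delimiter=";"))

out = io.StringIO()
csv.writer(out, delimiter=";", lineterminator="\n").writerows(merged_rows)
print(out.getvalue(), end="")
```

With real files, the two StringIO objects would be replaced by open(..., newline='') handles. The dictionary keeps first-seen order on Python 3.7 and later, so the output rows follow the order in which each code first appears.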

In the end I arrived at this solution:

# -*- coding: utf-8 -*-
import csv

input_file_1 = "eidi.csv"
output_file = "output.csv"

parsed_dictionary = {}

def concatenate_alter_codes(alter_code_list):
    # join all alter codes with a single space
    return " ".join(alter_code_list)

# Read the input CSV file and build a dictionary with a list of all alter codes
with open(input_file_1, 'r', newline='') as f:
    # use ; as the delimiter
    input_csv = csv.reader(f, delimiter=';')
    for row in input_csv:
        if row[0] in parsed_dictionary:
            # the key already exists: collect the additional alter code
            parsed_dictionary[row[0]][0].append(row[2])
        else:
            # first occurrence: keep the alter code list and the remaining columns
            parsed_dictionary[row[0]] = [[row[2]], row[1], row[3], row[4], row[5], row[6]]

# Create the new CSV file with concatenated alter codes
with open(output_file, 'w') as f:
    for key in parsed_dictionary:
        alter_codes, *other_columns = parsed_dictionary[key]
        f.write(";".join([key, concatenate_alter_codes(alter_codes)] + other_columns) + "\n")
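The question asks for alternative codes to be joined only "if they are different", while the solution above appends every code it meets, so a row repeated in the input would repeat its code in the output. A small sketch of one way to drop repeats while keeping first-seen order follows. The helper name unique_in_order is hypothetical and not part of the original solution.

```python
def unique_in_order(codes):
    """Drop repeated codes while keeping the order in which they first appear."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(codes))

deduplicated = unique_in_order(["58", "62", "58"])
print(" ".join(deduplicated))  # prints: 58 62
```

It could be applied to the list of alter codes just before they are joined into the output line.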

littletable is a thin CSV wrapper I wrote a number of years ago. Tables in littletable are lists of objects, with some helper methods for filtering, joining, and pivoting, plus easy import/export of CSV, JSON, and fixed-format data. Like pandas, it helps with data import/export, but it doesn't have all the other numeric analytical features that pandas has. It also keeps all the data in memory as a list of Python objects, so it wouldn't handle millions of rows as well as pandas would. But if your needs are modest, littletable might have a shorter learning curve.

Loading your initial raw data into a littletable Table starts with:

import littletable as lt
data = open('raw_data.csv')
tt = lt.Table().csv_import(data, fieldnames="id name altid".split(), delimiter=';')

(If there were a header row in your input file, csv_import would use that and would not require you to specify fieldnames.)

Printing out the rows looks just like iterating over a list:

for row in tt:
    print(row)

prints:

{'name': 'product1', 'altid': '58', 'id': '33'}
{'name': 'product2', 'altid': '95', 'id': '43'}
{'name': 'product1', 'altid': '62', 'id': '33'}
{'name': 'product3', 'altid': '11', 'id': '68'}
{'name': 'product2', 'altid': '99', 'id': '43'}

Because we'll be grouping and joining on the id attribute, we add an index:

tt.create_index("id")

(Unique indexes can also be created, but in this case your raw input contains several rows with the same id.)

Tables can be grouped by one or more attributes, and then each group of records can be passed to a function to give an aggregate value for that group. In your case, you want all the collected altids for each product id.

def aggregate_altids(rows):
    return ' '.join(set(row.altid for row in rows if row.altid != row.id))
grouped_altids = tt.groupby("id", altids=aggregate_altids)

for row in grouped_altids:
    print(row)

Gives (the order of the codes within each group can vary, because they are collected in a set):

{'altids': '62 58', 'id': '33'}
{'altids': '99 95', 'id': '43'}
{'altids': '11', 'id': '68'}
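Because aggregate_altids joins a set, the order of the codes in each group is not guaranteed. If a stable order matters, a variant of the aggregate function could sort the codes first. The sketch below is an assumption layered on the answer, not part of it: the name aggregate_altids_sorted is hypothetical, it assumes every altid is a numeric string, and SimpleNamespace objects stand in for littletable rows so the example runs on its own.

```python
from types import SimpleNamespace

def aggregate_altids_sorted(rows):
    """Like aggregate_altids, but returns the codes in ascending numeric order."""
    altids = {row.altid for row in rows if row.altid != row.id}
    # key=int assumes the codes are numeric strings such as '58'
    return " ".join(sorted(altids, key=int))

# Stand-in rows for one group (id 33) from the sample data.
rows = [SimpleNamespace(id="33", name="product1", altid="62"),
        SimpleNamespace(id="33", name="product1", altid="58")]
print(aggregate_altids_sorted(rows))  # prints: 58 62
```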

Now we'll join this table with the original tt table on id, and collapse out duplicates:

tt2 = (grouped_altids.join_on('id') + tt)().unique("id")

And print out the results:

for row in tt2:
    print("{id};{name};{altids}".format_map(vars(row)))

Giving:

33;product1;58 62
43;product2;95 99
68;product3;11

The complete code, without the debugging output, looks like:

# import
import littletable as lt
with open('raw_data.csv') as data:
    tt = lt.Table().csv_import(data, fieldnames="id name altid".split(), delimiter=';')
tt.create_index("id")

# group
def aggregate_altids(rows):
    return ' '.join(set(row.altid for row in rows if row.altid != row.id))
grouped_altids = tt.groupby("id", alt_ids=aggregate_altids)

# join, dedupe, and sort
tt2 = (grouped_altids.join_on('id') + tt)().unique("id").sort("id")

# output
for row in tt2:
    print("{id};{name};{alt_ids}".format_map(vars(row)))
