
Python, manipulate arrays from csvs and insert into database

I want to read in multiple csv files as arrays and duplicate the rows of those arrays based on the numeric value of the first entry in each row: if the value is 1, the row is not duplicated, but if the value is 3, the row appears 3 times. After manipulating the arrays I want to insert them into a db table.

Sample csv files:

mult, n1, n2, n3, n4
1, 23.2, 55, 0, 1.1
3, 6.6, 0.2, 5, 9
1, 2.2, 5, 8, 9
2, 3.3, 10, 2, 2

mult, n1, n2, n3, n4
2, 23.2, 55, 0, 1.1
3, 6.6, 0.2, 5, 9
1, 2.2, 5, 8, 9
1, 3.3, 10, 2, 2

Desired outcome:

[[1, 23.2, 55, 0, 1.1],
[3, 6.6, 0.2, 5, 9],
[3, 6.6, 0.2, 5, 9],
[3, 6.6, 0.2, 5, 9],
[1, 2.2, 5, 8, 9],
[2, 3.3, 10, 2, 2],
[2, 3.3, 10, 2, 2]]

[[2, 23.2, 55, 0, 1.1],
[2, 23.2, 55, 0, 1.1],
[3, 6.6, 0.2, 5, 9],
[3, 6.6, 0.2, 5, 9],
[3, 6.6, 0.2, 5, 9],
[1, 2.2, 5, 8, 9],
[1, 3.3, 10, 2, 2]]

Initially I read the csvs in as a list and then used a for loop to insert each row into the db as many times as the number in row[0].

Basic code snippet of what currently works:

import csv
import glob
import os

import psycopg2

path = "/home/user/Desktop/files/*.csv"

for fname in glob.glob(path):
    self.readFile(fname)

# body of readFile, which receives the file name as `filename`
with open(filename, 'r') as f:
    arr = list(csv.reader(f))
    iter_arr = iter(arr)
    next(iter_arr)  # skip the header row

    for row in iter_arr:
        mult = int(float(row[0]))
        # insert the same row `mult` times
        for i in range(mult):
            try:
                self.cur.execute("INSERT INTO csv_table VALUES (%s, %s, %s, %s, %s)", row)
            except Exception as exc:
                locked = True
                print("%s" % exc)

The above code works in that it loads the correct number of rows into the database table, but I thought it would be more useful to duplicate the rows before loading them into the database, so that I can manipulate the data in the arrays further if I need to, such as changing or adding values.

I asked a question earlier about using numpy, which allowed me to manipulate some randomly generated arrays correctly, but instead of duplicating rows as separate elements it duplicates them within each other. I can't figure out how to resize the result to get it to work. Resizing does not seem to work, and I get a ('%s', TypeError('not all arguments converted during string formatting',))

a = ([list(map(float, row)) for row in csv.reader(f)])
aa = np.asarray(a)
result = ([np.tile(aa[i], aa[i, 1].astype(int)) for i in range(aa.shape[0])])
result = np.asarray(result)

Outcome:

[[1, 23.2, 55, 0, 1.1],
[3, 6.6, 0.2, 5, 9,
3, 6.6, 0.2, 5, 9,
3, 6.6, 0.2, 5, 9],
[1, 2.2, 5, 8, 9],
[2, 3.3, 10, 2, 2,
2, 3.3, 10, 2, 2]]

[[2, 23.2, 55, 0, 1.1,
2, 23.2, 55, 0, 1.1],
[3, 6.6, 0.2, 5, 9,
3, 6.6, 0.2, 5, 9,
3, 6.6, 0.2, 5, 9],
[1, 2.2, 5, 8, 9],
[1, 3.3, 10, 2, 2]]

Will this work for you? I converted your string from above into a list, then looped through each line and appended it to the final array.

a = """mult, n1, n2, n3, n4
1, 23.2, 55, 0, 1.1
3, 6.6, 0.2, 5, 9
1, 2.2, 5, 8, 9
2, 3.3, 10, 2, 2
2, 23.2, 55, 0, 1.1
3, 6.6, 0.2, 5, 9
1, 2.2, 5, 8, 9
1, 3.3, 10, 2, 2"""

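a = a.split('\n')

final = []
for line in a[1:]:  # skip the header line
    # split the line into its values so the multiplier may have several digits
    row = [value.strip() for value in line.split(',')]
    for i in range(int(row[0])):
        final.append(row)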

You can create a generator function that accepts an iterable of rows (such as a csv.reader), inspects each row to determine how many times it should be repeated, and yields each row the required number of times.

import csv


def generate_rows(rows):
    for row in rows:
        num_repeats = int(row[0])
        for _ in range(num_repeats):
            yield row


if __name__ == '__main__':
    with open('test.csv', newline='') as f:
        reader = csv.reader(f)
        next(reader)    # skip headers
        for row in generate_rows(reader):
            print(row)

Given your first example csv, the program produces this output:

['1', '23.2', '55', '0', '1.1']
['3', '6.6', '0.2', '5', '9']
['3', '6.6', '0.2', '5', '9']
['3', '6.6', '0.2', '5', '9']
['1', '2.2', '5', '8', '9']
['2', '3.3', '10', '2', '2']
['2', '3.3', '10', '2', '2']
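
The rows above are lists of strings, because csv.reader does not convert values. If numeric rows are wanted for further manipulation, one possible variant converts each field before repeating the row. This is only a sketch: the name generate_numeric_rows, the in-memory sample, and the choice to yield a separate copy for every repeat (so that editing one duplicate leaves the others alone) are assumptions, not part of the answer above.

```python
import csv
import io


def generate_numeric_rows(rows):
    """Yield each row as a list of floats, repeated int(row[0]) times."""
    for row in rows:
        values = [float(field) for field in row]
        for _ in range(int(values[0])):
            # Yield a copy so later edits to one row do not affect its duplicates.
            yield list(values)


# In-memory stand-in for the first sample file.
sample = io.StringIO(
    "mult, n1, n2, n3, n4\n"
    "1, 23.2, 55, 0, 1.1\n"
    "3, 6.6, 0.2, 5, 9\n"
    "1, 2.2, 5, 8, 9\n"
    "2, 3.3, 10, 2, 2\n"
)
reader = csv.reader(sample)
next(reader)  # skip headers
numeric_rows = list(generate_numeric_rows(reader))
print(len(numeric_rows))  # 7
```

Note that the original generate_rows yields the same list object for every repeat, so modifying one duplicate in place would change all of them; copying avoids that if in-place edits are planned.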

If you want to collect the output in a list, just call list on the generator function:

rows = list(generate_rows(iterable))
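
Once the rows are collected, they could be inserted in one call with the cursor's executemany. The sketch below is an assumption-laden illustration rather than the asker's setup: it uses the standard-library sqlite3 module with an in-memory database purely so that it runs without a server, the column names are made up, and the parsed list is a hypothetical stand-in for csv.reader output. With psycopg2 the placeholders would be %s, as in the question, instead of ?.

```python
import sqlite3


def generate_rows(rows):
    """Yield each row int(row[0]) times (same logic as the generator above)."""
    for row in rows:
        for _ in range(int(row[0])):
            yield row


# Hypothetical already-parsed rows standing in for csv.reader output.
parsed = [
    ['1', '23.2', '55', '0', '1.1'],
    ['3', '6.6', '0.2', '5', '9'],
]
rows = list(generate_rows(parsed))

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE csv_table (mult, n1, n2, n3, n4)")

# sqlite3 uses "?" placeholders; psycopg2 uses "%s".
conn.executemany("INSERT INTO csv_table VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM csv_table").fetchone()[0]
print(count)  # 4
```

Each row has exactly five values to match the five placeholders; a row that has been tiled into fifteen values would not fit the statement.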
