
Combine CSV file data into one CSV file

I have CSV files spread across multiple directories, and each CSV file has only one column of data. What I want to do is read all these files and bring each file's column into one CSV file. The final CSV file will have one column per source file, with the filename as the header and the data from that file as the column data.

This is my directory structure inside ~/csv_files/ (output of ls):

ab   arc  bat-smg   bn       cdo  crh      diq  es   fo   gd   haw  ia   iu   ki   ksh  lez  lv   mo   na      no   os   pih  rmy   sah  simple  ss   tet  tr   ur   war  zea
ace  arz  bcl       bo       ce   cs       dsb  et   fr   gl   he   id   ja   kk   ku   lg   map-bms  mr   nah     nov  pa   pl   rn    sc   sk      st   tg   ts   uz   wo   zh
af   as

Each directory has two CSV files. I thought of using the os.walk() function, but I think my understanding of os.walk() is incorrect, and that's why what I currently have doesn't produce anything.

import sys, os
import csv

root_path = os.path.expanduser(
    '~/data/missing_files')

def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            for name in files:
                if name.endswith(".csv"):
                    csv_path = os.path.expanduser(root_path + name)
                    if os.path.exists(csv_path):
                        try:
                            with open(csv_path, 'rb') as f:
                                t = f.read().splitlines()
                                print t
                        except IOError, e:
                            print e

def main():
    combine_csv_files(root_path)

if __name__=="__main__":
    main()

My questions are:

  1. What am I doing wrong here?
  2. Can I read one CSV column from a file and add that data as a column to another file? CSV files are row-oriented, and here there is no dependency between rows.

In the end I am trying to get a CSV file like this (here are the potential headers):

ab_csv_data_file1, ab_csv_data_file2, arc_csv_data_file1, arc_csv_data_file2

You are using os.walk() incorrectly.

def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".csv"):
                csv_path = os.path.join(root, name)
                try:
                    with open(csv_path, 'rb') as f:
                        t = f.read().splitlines()
                        print t
                except IOError, e:
                    print e

The os.walk() function yields a 3-tuple (dirpath, dirnames, filenames). "dirpath" is the path of the directory currently being walked, "dirnames" is a list of the directories in "dirpath", and "filenames" is a list of the files in "dirpath". Here "dirpath" is first "path" itself and then, in turn, each subfolder of "path", so one loop over os.walk() already reaches every file and the full path of a file is os.path.join(dirpath, name).
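To make the 3-tuple concrete, here is a minimal sketch written for Python 3 (unlike the Python 2 snippets in this thread). It builds a throwaway directory tree that imitates the layout in the question and collects every CSV path with a single os.walk() loop. The helper name find_csv_files, the file names file1.csv and file2.csv, and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile


def find_csv_files(path):
    """Return the full path of every .csv file under path, sorted."""
    found = []
    # os.walk visits path itself first, then every subdirectory below it
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            if name.endswith(".csv"):
                # dirpath is the directory that actually contains name
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Build a small throwaway tree: two folders with two CSV files each
# (hypothetical names, only loosely modelled on the question's layout)
demo_root = tempfile.mkdtemp()
for folder in ("ab", "arc"):
    os.mkdir(os.path.join(demo_root, folder))
    for n in (1, 2):
        with open(os.path.join(demo_root, folder, "file%d.csv" % n), "w") as f:
            f.write("value\n")

csv_paths = find_csv_files(demo_root)
for p in csv_paths:
    print(os.path.relpath(p, demo_root))
```

Note that no loop over dirnames is needed: the subdirectories are visited by os.walk() on its own.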

I don't know whether I understand what you mean. Let's say you have multiple folders, such as "ab", "arc" and so on, and each folder contains two CSV files.

If I am right, then your loop is not doing the right thing.

def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            # walk each subdirectory of path in turn
            for dirpath, sub_dirs, sub_files in os.walk(os.path.join(path, dir)):
                for name in sub_files:
                    if name.endswith(".csv"):
                        # os.path.join inserts the separator that dirpath + name would omit
                        csv_path = os.path.join(dirpath, name)
                        if os.path.exists(csv_path):
                            try:
                                with open(csv_path, 'rb') as f:
                                    t = f.read().splitlines()
                                    print t
                            except IOError, e:
                                print e

The above code should work, if I am right.
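The snippets in this thread only print each file's lines; they do not write the combined file. As for the second question, a column from one file can be placed next to a column from another by reading every column into a list and then writing the lists out row by row. Below is a minimal sketch of that step for Python 3, not a definitive implementation. It assumes that each source file has a single column with no header row, that headers are built as folder name plus file name (similar to the headers in the question), and that shorter columns are padded with empty cells. The names read_column and combine_csv_files, the demo data, and the output file name combined.csv are hypothetical.

```python
import csv
import os
import tempfile
from itertools import zip_longest


def read_column(csv_path):
    """Read the first field of every non-empty row as a list of strings."""
    with open(csv_path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]


def combine_csv_files(path, out_path):
    """Write one column per CSV file found under path; return the headers."""
    columns = {}  # header -> list of values, in insertion order
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # sorting in place makes the visiting order deterministic
        for name in sorted(filenames):
            if name.endswith(".csv"):
                # Assumed header scheme: <folder>_<file name without extension>
                header = "%s_%s" % (os.path.basename(dirpath),
                                    os.path.splitext(name)[0])
                columns[header] = read_column(os.path.join(dirpath, name))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(list(columns))
        # zip_longest turns the columns into rows, padding short ones with ""
        writer.writerows(zip_longest(*columns.values(), fillvalue=""))
    return list(columns)


# Demo on a throwaway tree with made-up data of unequal lengths
root = tempfile.mkdtemp()
demo = {("ab", "file1"): ["1", "2", "3"],
        ("ab", "file2"): ["4"],
        ("arc", "file1"): ["5", "6"]}
for (folder, stem), values in demo.items():
    os.makedirs(os.path.join(root, folder), exist_ok=True)
    with open(os.path.join(root, folder, stem + ".csv"), "w", newline="") as f:
        f.write("\n".join(values) + "\n")

# Keep the output outside the walked tree so a rerun does not read it back in
out_path = os.path.join(tempfile.mkdtemp(), "combined.csv")
headers = combine_csv_files(root, out_path)
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Holding every column in memory is the simplest approach and is fine for small files; for very large inputs a streaming approach would be needed instead.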


 