Combine CSV file data into one CSV file
I have CSV files spread across multiple directories, and each CSV file has only one column of data. What I want to do is read all these files and bring each file's column into one CSV file. The final CSV file will have one column per source file, with the filename as the header and the data from the original file as the column data.
This is my directory structure (output of ls inside ~/csv_files/):
ab arc bat-smg bn cdo crh diq es fo gd haw ia iu ki ksh lez lv mo na no os pih rmy sah simple ss tet tr ur war zea
ace arz bcl bo ce cs dsb et fr gl he id ja kk ku lg map-bms mr nah nov pa pl rn sc sk st tg ts uz wo zh
af as
Each directory has two CSV files. I thought of using the os.walk() function, but I think my understanding of os.walk() is incorrect, which is why what I currently have doesn't produce any output.
import sys, os
import csv

root_path = os.path.expanduser(
    '~/data/missing_files')

def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            for name in files:
                if name.endswith(".csv"):
                    csv_path = os.path.expanduser(root_path + name)
                    if os.path.exists(csv_path):
                        try:
                            with open(csv_path, 'rb') as f:
                                t = f.read().splitlines()
                                print t
                        except IOError, e:
                            print e

def main():
    combine_csv_files(root_path)

if __name__=="__main__":
    main()
My questions are:
In the end I am trying to get a CSV file like this (here are the potential headers):
ab_csv_data_file1, ab_csv_data_file2, arc_csv_data_file1, arc_csv_data_file2
You are using os.walk() incorrectly.
def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".csv"):
                # root is the directory that contains name
                csv_path = os.path.join(root, name)
                try:
                    with open(csv_path, 'rb') as f:
                        t = f.read().splitlines()
                        print t
                except IOError, e:
                    print e
The os.walk() function yields a 3-tuple (dirpath, dirnames, filenames) for each directory it visits. "dirpath" is the path of the directory currently being walked, "dirnames" is a list of the sub-directories in "dirpath", and "filenames" is a list of the files in "dirpath". "dirpath" can be "path" itself or any sub-folder of "path".
In your code, the loop over files sits inside "for dir in dirs", so it never runs in the leaf directories (where dirs is empty), and root_path + name builds a path that omits the sub-directory.
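To make the shape of those tuples concrete, here is a minimal sketch, assuming Python 3 (the snippets in this thread use Python 2 syntax). The helper names list_walk and build_demo_tree, and the file names file1.csv and file2.csv, are made up for illustration and are not part of the original question.

```python
import os
import tempfile


def list_walk(path):
    """Return one (dirpath, dirnames, filenames) tuple per visited directory."""
    result = []
    for dirpath, dirnames, filenames in os.walk(path):
        dirnames.sort()  # sorting in place also fixes the order of the walk
        # Store dirpath relative to the starting point to keep the output short.
        result.append((os.path.relpath(dirpath, path), list(dirnames), sorted(filenames)))
    return result


def build_demo_tree(base):
    """Create a hypothetical layout: two folders with two CSV files each."""
    for folder in ("ab", "arc"):
        os.makedirs(os.path.join(base, folder))
        for n in (1, 2):
            with open(os.path.join(base, folder, "file%d.csv" % n), "w") as f:
                f.write("x\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        build_demo_tree(base)
        for entry in list_walk(base):
            print(entry)
```

The top-level entry lists the folders and no files, while each folder's own entry lists its two CSV files and no sub-directories. That is why the file loop has to run once per yielded tuple rather than inside a loop over dirs.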
I'm not sure I understand you correctly. Let's say you have multiple folders, such as "ab", "arc" and so on. Each folder contains two CSV files.
If I am right, then your code is not doing the right thing.
def combine_csv_files(path):
    for root, dirs, files in os.walk(path):
        for dir in dirs:
            # Walk each sub-directory found under the current directory.
            for dirpath, sub_dirs, sub_files in os.walk(os.path.join(path, dir)):
                for name in sub_files:
                    if name.endswith(".csv"):
                        # Join with a separator; dirpath + name would glue the names together.
                        csv_path = os.path.join(dirpath, name)
                        if os.path.exists(csv_path):
                            try:
                                with open(csv_path, 'rb') as f:
                                    t = f.read().splitlines()
                                    print t
                            except IOError, e:
                                print e
The above code should work, if I am right.
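The snippets in this thread only print each file's lines; they do not yet write the combined file the question asks for. One possible way to do that step is sketched below, assuming Python 3.7+ (dicts keep insertion order). The header format (folder name plus file name without extension), the function names, and the choice to pad shorter columns with empty strings are all assumptions, not something the question specifies.

```python
import csv
import os
from itertools import zip_longest


def collect_columns(path):
    """Map a header such as 'ab_file1' to the values of that one-column CSV."""
    columns = {}
    for root, dirs, files in os.walk(path):
        dirs.sort()  # visit sub-directories in a stable order
        for name in sorted(files):
            if not name.endswith(".csv"):
                continue
            # Assumed header format: <folder name>_<file name without extension>.
            header = "%s_%s" % (os.path.basename(root), os.path.splitext(name)[0])
            with open(os.path.join(root, name), newline="") as f:
                # Keep the first field of every non-empty row.
                columns[header] = [row[0] for row in csv.reader(f) if row]
    return columns


def combine_csv_files(path, out_path):
    """Write one CSV whose columns are the single columns of the files under path."""
    columns = collect_columns(path)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns.keys())
        # zip_longest turns the columns into rows, padding short columns with "".
        writer.writerows(zip_longest(*columns.values(), fillvalue=""))
```

It could be called as combine_csv_files(os.path.expanduser("~/csv_files"), "combined.csv"). The output file should live outside the directory being walked, otherwise a second run would pick it up as input.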