Extracting a column from a collection of csv files and constructing a new table with said data

I'm a newbie when it comes to Python, with a bit more experience in MATLAB. I'm currently trying to write a script that loops through a folder, picks up all the .csv files, extracts column 14 from csv file 1 and puts it in column 1 of a new table, extracts column 14 from csv file 2 and puts it in column 2 of the new table, and so on, building up a table of column 14 from every csv file in the folder. Ideally, the headers of the new table would show the filename that each column 14 was extracted from.

I'm aware that Python is zero-based, so I've double-checked that it reads the desired column. But as my code stands, I can only get it to print all the files' 14th columns in one array, and I'm not sure how to split it up to put it into a table. Perhaps via a DataFrame, although I'm not entirely sure how they work. Any help would be greatly appreciated!

Code attached below:

import os
import csv

pathName = "D:/GLaDOS-CAMPUS/data/TestData-AB/"

# Collect the names of all .csv files in the folder
numFiles = []
for fileName in os.listdir(pathName):
    if fileName.endswith(".csv"):
        numFiles.append(fileName)
print(numFiles)

# Print column 14 (index 13) of every row in every file
for fileName in numFiles:
    # newline="" is what the csv module expects; the old "rU" mode no longer exists
    with open(os.path.join(pathName, fileName), newline="") as file:
        reader = csv.reader(file, delimiter=',')
        for row in reader:
            print(row[13])

Finding files

I can't verify whether your way of finding files is right, since I don't have a folder of csv files to test on. But glob is a more convenient way to get the list of files:

from glob import glob
files = glob("/Path/To/Files/*.csv")

This returns a list with the path of every csv file in that folder.
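As a minimal sketch of the glob approach, the snippet below wraps it in a helper and tries it on a throwaway folder. The helper name `list_csv_files` and the demo file names are made up for illustration. Sorting is an addition here: glob does not promise any particular order, and the order of the files decides the order of the columns in the final table.

```python
import os
import tempfile
from glob import glob

def list_csv_files(folder):
    """Return the paths of all .csv files in `folder`, sorted by name."""
    # glob makes no ordering guarantee, so sort for a reproducible column order
    return sorted(glob(os.path.join(folder, "*.csv")))

# Demo on a temporary folder containing two csv files and one other file
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.csv", "a.csv", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    # basename strips the folder part, leaving a name usable as a table header
    found = [os.path.basename(p) for p in list_csv_files(folder)]

print(found)  # ['a.csv', 'b.csv']
```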

Reading CSV files

Now we need a way to read every file and get its 14th column (index 13, since Python counts from zero). It may be overkill, but I prefer to use pandas and numpy for this.

To read a single column of a csv file with pandas, one can use:

pd.read_csv(file, usecols=[COL])
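To see what usecols does with integer positions, here is a small sketch on an in-memory csv (the sample data is invented). It assumes the file has a header row, which is the read_csv default; for files without one, passing header=None would be needed so the first data row is not consumed as column names.

```python
import io
import pandas as pd

csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# usecols takes zero-based positions, so 2 selects the third column
third = pd.read_csv(io.StringIO(csv_text), usecols=[2])

print(third.columns.tolist())     # ['c']
print(third.iloc[:, 0].tolist())  # [3, 6]
```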

Now we can loop over the files and get the 14th column of each:

columns = [pd.read_csv(file, usecols=[13]).values[:, 0] for file in files]

Notice that .values converts each column to a numpy array.

Merging all columns

In columns, each file's column is stored as one element of a list, so technically they are rows, not columns. Taking the transpose of the array turns them into columns (this assumes every file has the same number of rows):

pd.DataFrame(np.transpose(columns))
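The transpose step can be checked on two small hand-made arrays. The sketch below also passes a list of names as the DataFrame's columns argument, which is one way to get filename headers; the names here are placeholders rather than real files.

```python
import numpy as np
import pandas as pd

# Stand-ins for the per-file columns; each list element is one file's data
columns = [np.array([1, 2, 3]), np.array([4, 5, 6])]
names = ["file1.csv", "file2.csv"]  # hypothetical file names

# Stacked as-is the arrays form a 2x3 block (one row per file);
# transposing gives 3x2, one column per file
table = pd.DataFrame(np.transpose(columns), columns=names)

print(table.shape)                # (3, 2)
print(table["file2.csv"].tolist())  # [4, 5, 6]
```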

The code

The whole code would look like:

from glob import glob
import pandas as pd
import numpy as np

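files = glob("/Path/To/Files/*.csv")
# usecols is zero-based: 13 selects the 14th column of each file
columns = [pd.read_csv(file, usecols=[13]).values[:, 0] for file in files]
# Each list element is one file's column, so transpose to lay them out as columns
print(pd.DataFrame(np.transpose(columns)))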
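The question also asked for each column to be headed by the file it came from. One possible variant, sketched below, builds the table from a dict of pandas Series keyed by filename instead of transposing numpy arrays. This is a swapped-in technique, not the approach above: it lets pandas pad shorter files with NaN rather than requiring equal lengths. The function name `build_column_table`, the `col_index` parameter and the demo files are all made up for illustration, and the files are assumed to have a header row.

```python
import os
import tempfile
from glob import glob

import pandas as pd

def build_column_table(folder, col_index=13):
    """Collect one column from every .csv in `folder` into a single DataFrame.

    Each output column is named after the file it was read from.
    """
    series = {}
    for path in sorted(glob(os.path.join(folder, "*.csv"))):
        # usecols keeps a single column; iloc[:, 0] extracts it as a Series
        column = pd.read_csv(path, usecols=[col_index]).iloc[:, 0]
        series[os.path.basename(path)] = column
    # A dict of Series is aligned on the row index; shorter files get NaN
    return pd.DataFrame(series)

# Demo with two tiny files, taking the second column (index 1) of each
with tempfile.TemporaryDirectory() as folder:
    samples = {"run1.csv": "x,y\n1,10\n2,20\n3,30\n",
               "run2.csv": "x,y\n4,40\n5,50\n"}
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w") as f:
            f.write(text)
    table = build_column_table(folder, col_index=1)

print(table)
```

For the folder in the question, the call would be build_column_table(pathName) with the default col_index=13, and table.to_csv(...) could then write the result out if a file is wanted.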
