Extracting a column from a collection of csv files and constructing a new table with said data
I'm a newbie when it comes to Python, with a bit more experience in MATLAB. I'm currently trying to write a script that loops through a folder to pick up all the .csv files, extracts column 14 from csv file 1 and adds it to column 1 of a new table, extracts column 14 from csv file 2 and adds it to column 2 of the new table, and so on, to build up a table of column 14 from all csv files in the folder. Ideally, I'd like the headers of the new table to show the filename that each column 14 was extracted from.
I'm aware that Python is zero-based, so I've double-checked that it reads the desired column, but as my code stands I can only get it to print all the files' 14th columns in one array, and I'm not sure how to split it up to put it into a table. Perhaps via a dataframe, although I'm not entirely sure how they work. Any help would be greatly appreciated!
Code attached below:
import os
import csv

pathName = "D:/GLaDOS-CAMPUS/data/TestData-AB/"

# Collect the names of all .csv files in the folder.
numFiles = []
for fileName in os.listdir(pathName):
    if fileName.endswith(".csv"):
        numFiles.append(fileName)
print(numFiles)

# Print column 14 (zero-based index 13) of every row in every file.
for name in numFiles:
    with open(os.path.join(pathName, name), newline="") as file:
        reader = csv.reader(file, delimiter=",")
        for row in reader:
            print(row[13])
I'm not sure whether your way of finding files is right, since I don't have a folder of csv files to test with. But I can say it is much better to use glob for getting the list of files:
from glob import glob
files = glob("/Path/To/Files/*.csv")
This will return all csv files.
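As a small illustration of the glob step, here is a minimal sketch. The helper name list_csv_files and the demo file names are made up for this example. It assumes you want a repeatable order, which glob itself does not guarantee, so the result is sorted.

```python
import os
import tempfile
from glob import glob


def list_csv_files(folder):
    """Return the .csv paths in `folder`, sorted so the order is repeatable."""
    # glob makes no ordering guarantee, so sort explicitly.
    return sorted(glob(os.path.join(folder, "*.csv")))


# Demo on a throwaway folder (file names are hypothetical).
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.csv", "a.csv", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(p) for p in list_csv_files(folder)]
    print(found)
```

Only the two .csv files are picked up; notes.txt is ignored by the pattern.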
Now we need to find a way to read all the files and get column 14 (zero-based index 13) from each. I don't know if it is overkill, but I prefer to use pandas and numpy for this.
To read a single column of a csv file using pandas, one can use:
pd.read_csv(file, usecols=[COL])
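To make the usecols call concrete, here is a minimal sketch on made-up in-memory data (the sample contents and the value of COL are assumptions for illustration). Note that read_csv treats the first row as a header by default; if your files have no header row, passing header=None would be the usual adjustment.

```python
import io

import pandas as pd

# Made-up sample data standing in for one csv file.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")

COL = 2  # zero-based position, so this selects the third column

# usecols accepts integer positions; the first row becomes the header.
frame = pd.read_csv(sample, usecols=[COL])
values = frame.iloc[:, 0].to_numpy()
print(frame.columns.tolist(), values.tolist())
```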
Now we can loop over the files and get that column from each one:
columns = [pd.read_csv(file, usecols=[13]).values[:, 0] for file in files]
Notice that we converted all values to numpy arrays.
In columns, each extracted column is one element of a list, so technically they are rows, not columns. Now we should take the transpose of the array so that they become columns:
pd.DataFrame(np.transpose(columns))
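A tiny sketch of what the transpose does, using two made-up arrays in place of the per-file columns. It assumes every file yields the same number of rows; with unequal lengths the list cannot be stacked into a regular 2-D array.

```python
import numpy as np
import pandas as pd

# Two made-up columns of equal length, one per file.
columns = [np.array([1, 2, 3]), np.array([10, 20, 30])]

stacked = np.array(columns)      # shape (2, 3): one row per file
table = pd.DataFrame(stacked.T)  # shape (3, 2): one column per file
print(table.shape)
```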
The whole code would look like:
from glob import glob
import pandas as pd
import numpy as np
files = glob("/Path/To/Files/*.csv")
columns = [pd.read_csv(file, usecols=[13]).values[:, 0] for file in files]
print(pd.DataFrame(np.transpose(columns)))
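The question also asks for the file names as column headers, which the code above does not produce. One possible way to get them, sketched here under assumptions: the helper name collect_column and the demo files are hypothetical, and pd.concat is used in place of the numpy transpose so that files with different row counts are padded with NaN instead of failing.

```python
import os
import tempfile
from glob import glob

import pandas as pd


def collect_column(folder, col):
    """Build a table with one column per csv file, labelled by file name."""
    series = {}
    for path in sorted(glob(os.path.join(folder, "*.csv"))):
        # Read only the requested column (zero-based position).
        data = pd.read_csv(path, usecols=[col]).iloc[:, 0]
        series[os.path.basename(path)] = data
    # The dict keys become the column headers; concat aligns on the
    # row index and pads shorter files with NaN.
    return pd.concat(series, axis=1)


# Demo with two tiny made-up files of different lengths.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "run1.csv"), "w") as f:
        f.write("x,y\n1,2\n3,4\n")
    with open(os.path.join(folder, "run2.csv"), "w") as f:
        f.write("x,y\n5,6\n")
    table = collect_column(folder, col=1)
    print(table)
```

For the folder in the question, the call would presumably be collect_column(pathName, 13). If the folder contains no csv files, pd.concat raises a ValueError, so you may want to check for that first.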