Extracting data from multiple files with python

I'm trying to extract data from a directory with 12 .txt files. Each file contains 3 columns of data (X, Y, Z) that I want to extract. I want to collect all the data in one df (InfoDF), but so far I have only succeeded in creating a df with all of the X, Y and Z data in the same column. This is my code:

import pandas as pd
import numpy as np
import os
import fnmatch

path = os.getcwd()

file_list = os.listdir(path)

InfoDF = pd.DataFrame()

for file in file_list:
    try:
        if fnmatch.fnmatch(file, '*.txt'):
            filedata = open(file, 'r')
            df = pd.read_table(filedata, delim_whitespace=True, names={'X','Y','Z'})

    except Exception as e:
        print(e)

What am I doing wrong?

df = pd.read_table(filedata, delim_whitespace=True, names={'X','Y','Z'})

This line replaces df at each iteration of the loop, which is why only the last file's data is left at the end of your program.

What you can do is save all your dataframes in a list and concatenate them at the end:

df_list = []
for file in file_list:
    if fnmatch.fnmatch(file, '*.txt'):
        # names is a list so the columns stay in X, Y, Z order (a set is unordered)
        df_list.append(pd.read_table(file, delim_whitespace=True, names=['X','Y','Z']))
# concatenate once, after the loop has collected every file
df = pd.concat(df_list)

Alternatively, you can write it as a single expression:

df = pd.concat([pd.read_table(file, delim_whitespace=True, names=['X','Y','Z']) for file in file_list if fnmatch.fnmatch(file, '*.txt')])
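
The list-then-concatenate approach can be tried end to end with a couple of throwaway files. This is only a minimal sketch: the helper name load_xyz_files, the sample file contents and the extra "source" column are assumptions made for illustration, not part of the original question. It uses sep=r"\s+", which pandas documents as equivalent to delim_whitespace=True.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_xyz_files(directory):
    """Read every .txt file in `directory` into one DataFrame (hypothetical helper)."""
    frames = []
    # sorted() makes the file order, and therefore the row order, deterministic
    for path in sorted(Path(directory).glob("*.txt")):
        frame = pd.read_csv(path, sep=r"\s+", names=["X", "Y", "Z"])
        frame["source"] = path.name  # assumed extra column: which file each row came from
        frames.append(frame)
    # pd.concat raises ValueError if no .txt file was found and the list is empty
    return pd.concat(frames, ignore_index=True)


# Made-up sample data standing in for the 12 real files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("1 2 3\n4 5 6\n")
    Path(tmp, "b.txt").write_text("7 8 9\n")
    combined = load_xyz_files(tmp)

print(combined)
```

Tagging each row with its file name is optional, but it makes it easier to check afterwards that every file ended up in the combined dataframe.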

I think you need glob to select all the files, a list comprehension to create a list of DataFrames dfs, and then concat:

import glob
import pandas as pd

files = glob.glob('*.txt')
dfs = [pd.read_csv(fp, delim_whitespace=True, names=['X','Y','Z']) for fp in files]

df = pd.concat(dfs, ignore_index=True)
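
To see what ignore_index=True changes in the concat call, here is a small sketch with two made-up frames (the values are arbitrary and only stand in for two parsed files):

```python
import pandas as pd

# Two small frames standing in for two parsed files (made-up values)
first = pd.DataFrame({"X": [1, 2], "Y": [3, 4], "Z": [5, 6]})
second = pd.DataFrame({"X": [7, 8], "Y": [9, 10], "Z": [11, 12]})

# Without ignore_index each frame keeps its own row labels, so labels repeat
kept = pd.concat([first, second])
# With ignore_index=True the result is renumbered from 0
renumbered = pd.concat([first, second], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Repeated labels are not an error, but label-based lookups such as kept.loc[0] then return one row per file, which is rarely what you want here.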

  • As camilleri mentions above, you are overwriting df in your loop
  • Also, there is no point in catching a general exception
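
If you do want the loop to survive a bad file, one option is to catch only the errors you expect and can handle. This is a sketch under assumptions: the helper name read_xyz_or_none and the choice of OSError and pd.errors.ParserError are mine, and which exceptions are worth catching depends on your data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_xyz_or_none(path):
    """Return the parsed frame, or None if the file is unreadable (hypothetical helper)."""
    try:
        return pd.read_csv(path, sep=r"\s+", names=["X", "Y", "Z"])
    except (OSError, pd.errors.ParserError) as err:
        # Only I/O and parsing problems are handled; anything else still propagates
        print(f"skipping {path}: {err}")
        return None


with tempfile.TemporaryDirectory() as tmp:
    good_path = Path(tmp, "good.txt")
    good_path.write_text("1 2 3\n")
    good = read_xyz_or_none(good_path)
    # This file does not exist, so read_csv raises FileNotFoundError (an OSError)
    missing = read_xyz_or_none(Path(tmp, "missing.txt"))
```

A bare except Exception would also swallow programming mistakes such as a misspelled variable name, and those are better left to fail loudly.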

Solution: Create an empty dataframe InfoDF before the loop and then use append or concat to populate it with the smaller dfs. Both return a new dataframe rather than modifying InfoDF in place, so assign the result back to InfoDF.

import pandas as pd
import os
import fnmatch

path = os.getcwd()

file_list = os.listdir(path)

InfoDF = pd.DataFrame(columns=['X','Y','Z']) # create empty dataframe
for file in file_list:
    if fnmatch.fnmatch(file, '*.txt'):
        # names is a list so the columns stay in X, Y, Z order (a set is unordered)
        df = pd.read_table(file, delim_whitespace=True, names=['X','Y','Z'])
        # concat returns a new dataframe, so assign it back;
        # DataFrame.append was removed in pandas 2.0
        InfoDF = pd.concat([InfoDF, df], ignore_index=True)
print(InfoDF)
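
One detail that is easy to miss when growing a dataframe inside a loop: pd.concat does not modify its inputs, so the result has to be stored. A small sketch with made-up values (the names info_df and chunk are only illustrative):

```python
import pandas as pd

info_df = pd.DataFrame({"X": [1.0], "Y": [2.0], "Z": [3.0]})
chunk = pd.DataFrame({"X": [4.0], "Y": [5.0], "Z": [6.0]})

# The result is thrown away here, so info_df is unchanged
pd.concat([info_df, chunk], ignore_index=True)
rows_without_assignment = len(info_df)

# Assigning the result back is what actually grows the dataframe
info_df = pd.concat([info_df, chunk], ignore_index=True)
rows_with_assignment = len(info_df)

print(rows_without_assignment, rows_with_assignment)  # 1 2
```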
