
Extract space-separated columns from .txt files and save them in a new dataframe

I'm new to Python and I'm trying to write a script that loops through a folder and reads all .txt files, each containing 2 columns of data separated only by spaces. I then want to take only the second column from each of these .txt files and save them to a new dataframe with 'lag' as the index and the filename as the header. I'm a little stuck, as I can't get any further than printing the filenames. Any help would be greatly appreciated. (P.S. Apologies for the embarrassing -50 to 50 line. I know there's a more efficient way of doing it, but I couldn't find one that works with negative values.) Thanks in advance.

    def changeFolder(self):
    #print('woo')

    folder = QFileDialog.getExistingDirectory(None, 'Project Data', '.csv files')
    print(folder)
    if folder == None:
        return
    else:
        print(folder)

    # import required modules
    print('woo')
    from glob import glob
    import pandas as pd
    import numpy as np
    import os
    for files in os.listdir(folder):
        if files.endswith(".txt"):
            print(files)
            data = [pd.read_csv(files, sep=" ", header=None) for files in folder]

    for data in files:
        print(data)
    # transpose columns using numpy
       #tcols = np.transpose(cols)
    # create lag variable for the time lag array from -50 to 50
    lag = [-50, -49, -48, -47, -46, -45, -44, -43, -42, -41, -40, -39, -38, -37, -36, -35, -34, -33, -32, -31, -30,
           -29, -28, -27, -26, -25, -24, -23, -22, -21, -20, -19, -18, -17, -16, -15, -14, -13, -12, -11, -10, -9,
           -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
           21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
           47, 48, 49, 50]
    # constructs dataframe using pandas with the transposed columns, header as respective filenames and index column as time lag
    df = pd.DataFrame(data, columns=[files], index=lag)
    # converts dataframe to .csv file and saves as specified filename below in specified path
    extracted = df.to_csv(r'D:\GLaDOS-CAMPUS\data\TestData-AB\ExtractedABFiles.csv')

    ##Dialogue box in case of success
    mbox = QMessageBox()
    mbox.setText("Hopefully this worked!")
    mbox.setDetailedText("")
    mbox.setStandardButtons(QMessageBox.Ok)
    mbox.setWindowTitle('CSV Batch Processor')
    mbox.exec_()

Try this:

import os
from pathlib import Path

import pandas as pd

folder = r"C:\Users\Wilian\Documents\DPROF"

# Time-lag values from -50 to 50 inclusive; range() handles negative values
lag = list(range(-50, 51))
df1 = pd.DataFrame({"lag": lag})
for file in os.listdir(folder):
    if file.endswith(".txt"):
        # Join with the folder so the file is found regardless of the working directory.
        # The columns are separated by spaces and the files have no header row.
        df2 = pd.read_csv(os.path.join(folder, file), sep=r"\s+", header=None)
        # Keep only the second column, named after the file (without its extension)
        df1[Path(file).stem] = df2.iloc[:, 1]

# Use the lag values as the index, then write the result to CSV
df1.set_index("lag", inplace=True)
df1.to_csv(r'D:\GLaDOS-CAMPUS\data\TestData-AB\ExtractedABFiles.csv')
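
If you want to call this from your `changeFolder` method, one possible variant wraps the same idea in a function. This is only a sketch under assumptions: the function name `collect_second_columns` and its `lags` parameter are made up for illustration, and it assumes every .txt file has exactly one row per lag value (101 rows for -50 to 50) and no header row. It also shows that `range(-50, 51)` produces the lag values without typing them out.

```python
from pathlib import Path

import pandas as pd


def collect_second_columns(folder, lags=range(-50, 51)):
    """Return a DataFrame indexed by lag with one column per .txt file in folder.

    Hypothetical helper: assumes whitespace-separated files with no header
    row and one row per lag value.
    """
    lags = list(lags)
    columns = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # sep=r"\s+" splits on one or more whitespace characters
        table = pd.read_csv(path, sep=r"\s+", header=None)
        values = table.iloc[:, 1].to_numpy()  # second column only
        if len(values) != len(lags):
            raise ValueError(
                f"{path.name}: expected {len(lags)} rows, got {len(values)}"
            )
        columns[path.stem] = values  # header is the filename without extension
    return pd.DataFrame(columns, index=pd.Index(lags, name="lag"))
```

You could then write the result with something like `collect_second_columns(folder).to_csv(output_path)`, where `output_path` is whatever destination you choose. Raising an error on a row-count mismatch is a design choice here; without it, a file with fewer rows would otherwise be silently padded or misaligned.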
