
Splitting column text file into smaller files based on value differences in Python

I am trying to split a text file with 3 columns into many smaller individual text files based on jumps in the value of the first column. Here is an example of a small part of the file to be split:

2457062.30520078 1.00579146 1
2457062.30588184 1.00607543 1
2457062.30656300 1.00605515 1
2457062.71112193 1.00288150 1
2457062.71180299 1.00322454 1
2457062.71248415 1.00430136 1

Between lines 3 and 4 there is a larger jump than usual. That is where the data should be split: one output file would contain the first three lines and another the last three. The jumps always exceed a change of 0.1 in the first column, and the goal is for every jump like this one to become a split point between files. Any insight is appreciated, thanks.
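To make the criterion concrete, here is a small sketch that checks the step between consecutive first-column values of the sample above. The list literal simply copies the sample's first column and the 0.1 threshold comes from the question; the variable names are illustrative only.

```python
# First-column values copied from the sample above
values = [
    2457062.30520078,
    2457062.30588184,
    2457062.30656300,
    2457062.71112193,
    2457062.71180299,
    2457062.71248415,
]
THRESHOLD = 0.1  # the question states jumps always exceed 0.1

# Step from each value to the next one
steps = [b - a for a, b in zip(values, values[1:])]
print(steps)

# Indices i where the step from values[i - 1] to values[i] exceeds the threshold,
# i.e. the lines that should start a new output file
split_points = [i for i in range(1, len(values)) if values[i] - values[i - 1] > THRESHOLD]
print(split_points)  # [3]
```

Index 3 (0-based) is the fourth line, so the first file gets lines 1 to 3 and the second file gets lines 4 to 6, exactly as described.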

I would loop through the main file and keep writing lines to the current output file as long as your condition is met. That fits the definition of a while loop perfectly. The main complication is that you need two files open at the same time (the main one and the one you are currently writing to), but that's not a problem for Python.

MAINTEXT = "big_file.txt"
# {:03d} zero-pads the integer counter: small_file_000.txt, small_file_001.txt, ...
#  (a 'g' format would switch to exponent notation from 10 on, e.g. '1e+01',
#  and different counters could then map to the same file name)
SFILE_TEMPL = 'small_file_{:03d}.txt'
# Delimiter is a space in the example you gave, but
#  might be tab (\t) or comma or anything.
#  (None splits on any run of whitespace.)
DELIMITER = ' '

LIM = .1

# i will count how many files we have created.
i = 0

# Open the main file
with open(MAINTEXT) as mainfile:
    # Read the first line and set up some things
    line = mainfile.readline()
    # Note that we want the first element ([0]) before
    #  the delimiter (.split(DELIMITER)) of the row (line)
    #  as a number (float)
    v_cur = float(line.split(DELIMITER)[0])
    v_prev = v_cur

    # This will stop the loop once we reach end of file (EOF)
    #  as readline() will then return an empty string.
    while line:
        # Open the next small file for writing (mode='w').
        with open(SFILE_TEMPL.format(i), mode='w') as subfile:
            # As long as your values are within the limit, keep
            #  writing lines to the current file.
            while line and abs(v_prev - v_cur) < LIM:
                subfile.write(line)
                line = mainfile.readline()
                if not line:
                    # EOF: there is no value left to parse
                    #  (float('') would raise a ValueError)
                    break
                v_prev = v_cur
                v_cur = float(line.split(DELIMITER)[0])
        # Increment the file counter
        i += 1
        # Make sure we don't get stuck after one file
        #  (If we don't replace v_prev here, the while loop
        #  will never execute after the first time.)
        v_prev = v_cur
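The same splitting rule can also be kept separate from the file handling, which makes it easy to test on its own. This is a minimal sketch assuming the whole input fits in memory; `split_on_jumps` is a hypothetical helper name, not part of any library. Like the loop above, it starts a new chunk whenever the first-column value changes by `limit` or more.

```python
def split_on_jumps(lines, limit=0.1):
    """Group lines into chunks (hypothetical helper).

    A new chunk starts at the first line and whenever the first-column
    value differs from the previous line's value by `limit` or more.
    """
    chunks = []
    prev = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        value = float(line.split()[0])  # first column, any whitespace delimiter
        if prev is None or abs(value - prev) >= limit:
            chunks.append([])  # start a new chunk, i.e. a new output file
        chunks[-1].append(line)
        prev = value
    return chunks


sample = [
    "2457062.30520078 1.00579146 1\n",
    "2457062.30588184 1.00607543 1\n",
    "2457062.30656300 1.00605515 1\n",
    "2457062.71112193 1.00288150 1\n",
    "2457062.71180299 1.00322454 1\n",
    "2457062.71248415 1.00430136 1\n",
]
chunks = split_on_jumps(sample)
print([len(c) for c in chunks])  # [3, 3]
```

Passing an open file object instead of the list works the same way, and each chunk can then be written to its own file with `writelines(chunk)`.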

Assuming your file is test.txt, you can read the three columns like this:

with open('test.txt') as f:
    for line in f:
        if line.strip():  # skip blank lines; split('') would raise ValueError
            first_column, second_column, third_column = line.split()

This reads the file, but what exactly do you want to do with it?

You can detect the jumps while reading the file:

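def reader(infile):
    number = float('-infinity')  # makes the first line count as a jump
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        prev, number = number, float(line.split(None, 1)[0])
        jump = number - prev >= 0.1
        yield jump, line

outfile, count = None, 0
with open('big_file.txt') as infile:  # placeholder input file name
    for jump, line in reader(infile):
        if jump:  # jump is True if one must open a new output file
            if outfile:
                outfile.close()
            outfile = open('small_file_{:03d}.txt'.format(count), 'w')
            count += 1
        outfile.write(line)
if outfile:
    outfile.close()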
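Because `reader` yields one True/False flag per line, a running sum of those flags numbers the output file each line belongs to. The sketch below repeats `reader` so that it runs on its own, feeds it the sample rows through `io.StringIO` instead of a real file, and uses `itertools.accumulate` for the running sum; the `limit` parameter is an addition for illustration, not part of the answer above.

```python
import io
from itertools import accumulate


def reader(infile, limit=0.1):
    # prev starts at -inf, so the first line always counts as a jump
    number = float('-inf')
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        prev, number = number, float(line.split(None, 1)[0])
        yield number - prev >= limit, line


sample = io.StringIO(
    "2457062.30520078 1.00579146 1\n"
    "2457062.30588184 1.00607543 1\n"
    "2457062.30656300 1.00605515 1\n"
    "2457062.71112193 1.00288150 1\n"
    "2457062.71180299 1.00322454 1\n"
    "2457062.71248415 1.00430136 1\n"
)

jumps, lines = zip(*reader(sample))
# Running sum of the flags: every True bumps the file number by one
file_numbers = list(accumulate(int(j) for j in jumps))
print(file_numbers)  # [1, 1, 1, 2, 2, 2]
```

Lines sharing a number belong in the same output file, so the numbers can be used directly to build the output file names.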

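Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM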