
Write some lines conditionally from a file to another file

I have a fixed-width text file with the following format:

a1   b   c1    d     -> header
1    2    3    4
6    4    3    5
a2   b    c2   d2    -> header
7    9    1    4
a    b1   c6   d2    -> header
8    9    3    4

From this file, I want to create another file containing only filtered rows. If a row has the value 3 in the c column AND does not have the value 2 in the b column, I want that row together with its header. If no row under a header has the value 3 in the c column, I do not want any of those rows or their header. The new file must therefore look like this:

a1   b   c1    d
6    4    3    5
a    b1   c6   d2
8    9    3    4

Also, the value 3 can only occur in the c column, and the value 2 only in the b column. So we don't even have to check which column a value is in, as long as the row meets the filtering condition. The only important thing is that if the condition is met, I also need the header for that row; if not, I don't want the header either.

How can I achieve this?

What I have tried is to read the file in pandas with .read_fwf() and save each dataframe. After that, I filter the dataframes and write them to a file with .to_string(). This does (kind of) what I want, but the number of whitespace characters is not consistent. Since it is a fixed-width file, I want the new file to have exactly the same format as the old file. I also tried writing with np.savetxt(), but it has the same whitespace issue. It's a shame that pandas does not have write_fwf.
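On the whitespace problem: if the rows have already been parsed into values, fixed-width lines can be rebuilt by padding each field by hand. The following is only a sketch. The helper name `format_fixed_width` is hypothetical, and the column width of 5 is an assumption read off the sample data rows, not something the real file is known to use.

```python
def format_fixed_width(row, width=5):
    """Rebuild one fixed-width line from a sequence of values.

    Assumption: every column is `width` characters wide and left-justified,
    and the last column is written without trailing padding.
    Fields longer than `width` are not truncated.
    """
    fields = [str(value) for value in row]
    if not fields:
        return ""
    # Pad every field except the last one to the column width.
    padded = [field.ljust(width) for field in fields[:-1]]
    return "".join(padded) + fields[-1]
```

For example, `format_fixed_width([6, 4, 3, 5])` produces a line with the same spacing as the sample row `6    4    3    5`.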

So maybe I could use plain Python instead of pandas to do this? Or even a bash (PowerShell) script? Anything that works :)

This might work for you (GNU sed):

sed -nE ':a;/^a/{h;:b;n;/^\S+ +[^2 ]+ +3 /H;$bc;/^a/{:c;x;/\n/p;x;ba};bb}' file

This is a filtering operation, so set -n, and enable extended regexps with -E.

Make a copy of the current header, then loop through the data rows, appending those that meet the criteria to it.

At the end of the file or at the next header, print the stored header together with its appended rows, but only if at least one row was appended.
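Since the question also asks about plain Python, the same header-and-rows logic can be sketched there. This is not the sed one-liner translated literally, only a rough equivalent under stated assumptions: a line counts as a header when its first field is not a number (the sed command instead matches lines starting with `a`), fields are separated by whitespace, and the function names `filter_blocks` and `filter_file` are made up for illustration. Lines are copied verbatim, so the original fixed-width spacing is preserved.

```python
def filter_blocks(lines):
    """Return headers followed by their matching rows; drop headers with no match."""
    out = []
    header = None  # most recent header that has not been written yet
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if not fields[0].isdigit():
            # Assumption: a header is any line whose first field is non-numeric.
            header = line
            continue
        # Keep the row if column c (third field) is 3 and column b (second) is not 2.
        if len(fields) >= 3 and fields[2] == "3" and fields[1] != "2":
            if header is not None:
                out.append(header)
                header = None  # write each header at most once
            out.append(line)
    return out


def filter_file(src_path, dst_path):
    """Read src_path, filter it, and write the kept lines unchanged to dst_path."""
    with open(src_path) as src:
        kept = filter_blocks(src)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
```

Calling `filter_file("file", "filtered")` (both paths are placeholders) would write the filtered copy. Because each header is held back until a matching row appears, a header with no matching rows is never written.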
