
Bash for loop not writing to file

I often work like this:

for skra in `ls *txt` ; do paste foo.csv <(cut -f 5 $skra) > foo.csv; done

for looping through a directory by using 'ls'.

Now I don't understand why this command does not add a column to foo.csv in every iteration of the loop.

What is happening under the hood? It seems like foo.csv is not saved in every iteration.

The output I get is field 5 from the last file. I don't even get the original foo.csv column, as I would if I simply ran paste foo.csv bar.txt.

EDIT: All files are tab-delimited.

foo.csv is just one column in the beginning.

example.txt as seen in vim with set list:

(101,6352)(11174,51391)(10000,60000)^INC_044048.1^I35000^I6253^I0.038250$
(668,7819)(23384,69939)(20000,70000)^INC_044048.1^I45000^I7153^I0.034164$
(2279,8111)(32691,73588)(30000,80000)^INC_044048.1^I55000^I5834^I0.031908$

Here is a python script that does what I want:

import pandas

rammi=[]
with open('window.list') as f:
    for line in f:
        nafn=line.strip()
        # read only column 5 (0-based index 4) of each tab-delimited file
        df=pandas.read_csv(nafn, header=None, names=[nafn], sep='\t', usecols=[4])
        rammi.append(df)

# concatenate the collected columns side by side and write the result
frame = pandas.concat(rammi, axis=1)
frame.to_csv('rammi.allra', sep='\t', encoding='utf-8')

Paste column 4 from all files into one (initially I wanted to retain one original column, but it was not necessary). The question was about bash not updating the output file in the for loop.

As already noted in the comments, opening foo.csv for output will truncate it in most shells: the shell sets up the > redirection (and empties the file) before paste even starts, so paste reads an empty foo.csv on every iteration. (Even if that were not the case, opening the file and running cut and paste repeatedly looks quite inefficient.)
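If you want to keep the cut/paste approach anyway, a minimal workaround sketch is to write each iteration's result to a temporary file and then rename it over foo.csv (here foo.tmp is just an arbitrary scratch name, and the glob *.txt replaces parsing ls output):

for skra in *.txt; do
    paste foo.csv <(cut -f 5 "$skra") > foo.tmp && mv foo.tmp foo.csv
done

Because the redirection target is no longer the file being read, nothing gets truncated before paste runs.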

If you don't mind keeping all the data in memory at one point in time, a simple AWK or Bash script can do this type of processing without any further processes such as cut or paste.

awk -F'\t' '    { lines[FNR] = lines[FNR] "\t" $5; if (FNR > n) n = FNR }
            END { for (l = 1; l <= n; l++) print substr(lines[l], 2) }' \
    *.txt > foo.csv

(The output should not be called .csv, but I'm sticking with the naming from the question nonetheless.)

Actually, one doesn't really need awk for this; Bash will do:

#!/bin/bash
lines=()
for file in *.txt; do
  declare -i i=0                      # integer line counter, reset for each file
  while IFS=$'\t' read -ra line; do
    lines[i++]+=$'\t'"${line[4]}"     # append field 5 (index 4) of the current line
  done < "$file"
done
# strip the leading tab from each element before printing
printf '%s\n' "${lines[@]/#?}" > foo.csv

(As a side note, "${lines[@]:1}" would remove the first line, not the first (\t) character of each line; this particular expansion syntax works differently for strings (scalars) and arrays in Bash. Hence "${lines[@]/#?}", another way to express the removal of the first character, which does get applied to each array element.)
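A tiny illustration of that difference, using a hypothetical array of three tab-prefixed values:

arr=($'\ta' $'\tb' $'\tc')
echo "${arr[@]:1}"     # array slice: drops the first ELEMENT, leaving the tab-prefixed "b" and "c"
echo "${arr[@]/#?}"    # pattern removal: drops the first CHARACTER of every element, giving "a b c"
s=$'\tabc'
echo "${s:1}"          # on a scalar, :1 does drop the first character: "abc"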
