Using awk to process a database
I have a directory on my computer that contains an entire database I found online for my research. The database contains thousands of files, so to do what I need I've been looking into file I/O. A programmer friend suggested using bash/awk. I've written my code:
#!/usr/bin/env awk
ls -l|awk'
BEGIN {print "Now running"}
{if(NR == 17 / $1 >= 0.4 / $1 <= 2.5)
{print $1 > wavelengths.txt;
print $2 > reflectance.txt;
print $3 > standardDev.txt;}}END{print "done"}'
When I run this in my console, I'm already in the directory containing the files I need to access. The data I need begins on line 17 of EVERY file. The data looks like this:
some number some number some number
some number some number some number
. . .
. . .
. . .
I want to access the data starting where the first column has a value of approximately 0.4 and keep reading until the first column reaches approximately 2.5. The first column represents wavelengths. I want to verify later that they are the same for every file, so I copy them into a file. The second column represents reflectance, and I want this in a separate file because later I'll use it to build a data matrix. The third column is the standard deviation of the reflectance.
The problem I am having now is that when I run this code, I get the following error: No such file or directory
Please, if anyone can tell me why I might be getting this error, or can guide me on how to write the code for what I am trying to do, I will be so grateful.
Excellent attempt, but you should never parse the output of ls. Still, you were probably looking for ls -1 (one file name per line), not ls -l (long listing). awk can also accept a glob of files. For example, in the desired directory, you can run:
awk -f /path/to/script.awk *
Contents of script.awk:
BEGIN {
print "Now running"
}
# FNR restarts at 1 for each input file; the data begins on line 17 of every file
FNR >= 17 && $1 >= 0.4 && $1 <= 2.5 {
print $1 > "wavelengths.txt"
print $2 > "reflectance.txt"
print $3 > "standardDev.txt"
}
END {
print "Done"
}
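Since the data begins on line 17 of every file, the per-file record counter FNR is the relevant one: NR keeps counting across all files given on the command line, while FNR restarts at 1 for each file. The following is only a minimal sketch of that idea, not a tested solution for the real database. The sample files, the temporary directory layout, and the awk variable out are all hypothetical, and the real files are assumed to have three whitespace-separated numeric columns.

```shell
#!/bin/sh
# Hypothetical sample data: each file has 16 header lines, then three data rows.
work=$(mktemp -d)
mkdir "$work/data" "$work/out"
for name in a b; do
    {
        i=1
        while [ "$i" -le 16 ]; do echo "header $i"; i=$((i + 1)); done
        echo "0.35 0.10 0.01"   # below the 0.4 cutoff, skipped
        echo "0.40 0.20 0.02"   # inside the range, kept
        echo "2.60 0.30 0.03"   # above the 2.5 cutoff, skipped
    } > "$work/data/$name.txt"
done

# FNR >= 17 holds from line 17 of *each* file; NR would keep counting across files.
# Output goes to a separate directory (an assumption of this sketch) so that a
# later run of the glob does not pick up the result files as input.
cd "$work/data" || exit 1
awk -v out="$work/out" '
FNR >= 17 && $1 >= 0.4 && $1 <= 2.5 {
    print $1 > (out "/wavelengths.txt")
    print $2 > (out "/reflectance.txt")
    print $3 > (out "/standardDev.txt")
}' *

cat "$work/out/wavelengths.txt"
```

Each sample file contributes exactly one row here, so every output file ends up with two lines, one per input file.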
The main problem is that you need to quote the output file names, as they are strings, not variables. Use:
print $1 > "wavelengths.txt"
instead of:
print $1 > wavelengths.txt
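The string-versus-variable distinction can be seen in a small sketch: a quoted name is used literally as the file name, while an unquoted name is an awk variable whose value is used as the file name. The sample input file sample.dat and the variable name out are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical two-row input in a scratch directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '0.40 0.20 0.02\n0.50 0.25 0.03\n' > sample.dat

# Quoted: the redirection target is the literal string "wavelengths.txt".
awk '{ print $1 > "wavelengths.txt" }' sample.dat

# Unquoted: out is a variable; -v assigns it the file name before the program runs.
awk -v out="reflectance.txt" '{ print $2 > out }' sample.dat

cat wavelengths.txt reflectance.txt
```

Passing the name with -v can be handy when the output location should be chosen by the calling shell rather than hard-coded in the awk program.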