Using awk to process a database

I have a directory on my computer which contains an entire database I found online for my research. This database contains thousands of files, so to do what I need I've been looking into file I/O. A programmer friend suggested using bash/awk. I've written my code:

    #!/usr/bin/env awk
    ls -l|awk'
    BEGIN {print "Now running"}
    {if(NR == 17 / $1 >= 0.4 / $1 <= 2.5)
    {print $1 > wavelengths.txt;
    print $2 > reflectance.txt;
    print $3 > standardDev.txt;}}END{print "done"}'

When I put this into my console, I'm already in the directory of the files I need to access. The data I need begins on line 17 of EVERY file. The data looks like this:

some number    some number    some number
some number    some number    some number
    .              .              .
    .              .              .
    .              .              .

I want to access the data starting where the first column has a value of approximately 0.4, and keep reading until the first column reaches approximately 2.5. The first column represents wavelengths. I want to verify later that they are the same for every file, so I copy them into a file. The second column represents reflectance, and I want this in a separate file because later I'll take this information and build a data matrix from it. The third column is the standard deviation of the reflectance.

The problem I am having now is that when I run this code, I get the following error: No such file or directory

If anyone can tell me why I might be getting this error, or can guide me on how to write the code for what I am trying to do, I will be very grateful.

Excellent attempt, but you should never parse the output of ls. Still, you were probably looking for ls -1, not ls -l. awk can also accept a list of files produced by a shell glob. For example, in the desired directory, you can run:

awk -f /path/to/script.awk *

Contents of script.awk:

BEGIN {
    print "Now running"
}

# FNR restarts at 1 for every input file, so this selects line 17 onward of each file.
FNR >= 17 && $1 >= 0.4 && $1 <= 2.5 {
    print $1 > "wavelengths.txt"
    print $2 > "reflectance.txt"
    print $3 > "standardDev.txt"
}

END {
    print "Done"
}
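
To try the approach on throwaway data before pointing it at the real database, something like the following sketch may help. It assumes, as the question states, that the data starts on line 17 of each file, so it uses FNR (awk's per-file line counter) with >= 17 to pick up line 17 onward in every file. The file names sampleA.dat and sampleB.dat, the .dat extension, and the sample values are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo: two tiny files stand in for the real database files.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

for name in sampleA.dat sampleB.dat; do
    # 16 placeholder header lines, then three-column data from line 17 on.
    awk 'BEGIN { for (i = 1; i <= 16; i++) print "header line " i }' > "$name"
    printf '%s\n' '0.35 0.10 0.01' '0.40 0.20 0.02' \
                  '1.50 0.30 0.03' '2.60 0.40 0.04' >> "$name"
done

# Keep rows from line 17 onward whose first column lies in [0.4, 2.5],
# writing each column to its own (quoted) output file.
awk '
FNR >= 17 && $1 >= 0.4 && $1 <= 2.5 {
    print $1 > "wavelengths.txt"
    print $2 > "reflectance.txt"
    print $3 > "standardDev.txt"
}' *.dat

cat wavelengths.txt
```

With a bare * glob, a second run in the same directory would also read the three output files as input, so a narrower pattern (such as the assumed *.dat here) or an output location outside the data directory is probably safer.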

The main problem is that you need to quote the output file names, as they are strings, not variables. Use:

print $1 > "wavelengths.txt"

instead of:

print $1 > wavelengths.txt
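
The string-versus-variable distinction can be seen in a small sketch like the one below: a quoted name is a literal file name, while an unquoted name is a variable whose value must already hold the file name. The input file sample.dat, the output names, and the variable name out are hypothetical, chosen only for this demonstration.

```shell
#!/bin/sh
# Hypothetical demo of quoted file names versus awk variables.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' '0.40 0.20 0.02' '1.50 0.30 0.03' > sample.dat

# Quoted: "first_column.txt" is a string literal naming the output file.
awk '{ print $1 > "first_column.txt" }' sample.dat

# Unquoted: out is a variable, set here with -v to hold the file name.
awk -v out="second_column.txt" '{ print $2 > out }' sample.dat

cat first_column.txt second_column.txt
```

Passing the name with -v can be convenient when the output file should be chosen by the calling shell script rather than hard-coded in the awk program.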
