Extracting rows from file based on another file using awk

I have two files.

File 1:

SNP Allele1 Allele2 Effect  StdErr  PVAL    Direction   HetISq  HetChiSq    HetDf   HetPVal
rs12266638  t   g   0.4259  0.0838  3.776e-07   +?  0.0 0.000   0   1
rs7995014   t   c   2.2910  0.5012  4.853e-06   +?  0.0 0.000   0   1

You may use this awk:

awk 'FNR == NR {a[$3]; next} FNR > 1 && $1 in a' file2 file1

rs12266638  t   g   0.4259  0.0838  3.776e-07   +?  0.0 0.000   0   1
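To see why the one-liner works, here is a minimal self-contained sketch of the same two-file idiom with comments. The file1 rows are the ones from the question; the file2 contents (column names, CHR and POS values) are made up for illustration, and the only assumption that matters is that the SNP ID sits in column 3 of file2. The array is renamed to `wanted` purely for readability.

```shell
#!/bin/sh
# Demo of the FNR==NR two-file idiom. file2 below is hypothetical sample data.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# file1: header plus the two rows shown in the question.
printf '%s\n' \
  'SNP Allele1 Allele2 Effect StdErr PVAL Direction HetISq HetChiSq HetDf HetPVal' \
  'rs12266638 t g 0.4259 0.0838 3.776e-07 +? 0.0 0.000 0 1' \
  'rs7995014 t c 2.2910 0.5012 4.853e-06 +? 0.0 0.000 0 1' > "$tmpdir/file1"

# file2: assumed layout with the SNP ID in the third column.
printf '%s\n' \
  'CHR POS SNP' \
  '10 1000 rs12266638' > "$tmpdir/file2"

result=$(awk '
  # While reading the first file (file2), FNR and NR are equal.
  FNR == NR { wanted[$3]; next }   # remember every ID found in column 3
  # Second file (file1): skip the header line, print rows whose column 1 was remembered.
  FNR > 1 && $1 in wanted
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
```

Only the IDs from file2 are held in memory; file1 is streamed line by line, so the order of the two file arguments matters.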

Depending on how big the dataset is, this should be fairly fast, since each file is read only once. Granted, I'm not on a system where I can benchmark it at the moment, so this is mostly a hunch. A solution like this is probably only suitable if the number of unique identifiers isn't very large, though, because all of them are joined into a single regular expression.

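#!/bin/bash
# Join the unique SNP IDs from column 3 of file_2 into one alternation, e.g. "rsA|rsB".
snp_expression=$(awk 'FNR>1{print $3}' file_2 | sort -u | paste -sd "|" -)
# Keep the lines of file_1 whose first field is one of those IDs.
grep -E "^(${snp_expression})[[:space:]]" file_1 > file_3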
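As a rough illustration of what the grep approach builds, the sketch below prints the intermediate alternation before using it. All sample data here is hypothetical (including the ID rs0000001 and the shortened three-column file_1), and the variable name `pattern` is just for this demo.

```shell
#!/bin/bash
# Demo of the alternation-pattern approach on hypothetical sample data.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# Shortened file_1: SNP ID in the first column.
printf '%s\n' \
  'SNP Allele1 Allele2' \
  'rs12266638 t g' \
  'rs7995014 t c' > "$tmpdir/file_1"

# file_2: SNP ID in the third column, with one duplicate and one ID absent from file_1.
printf '%s\n' \
  'CHR POS SNP' \
  '10 1000 rs12266638' \
  '10 1000 rs12266638' \
  '13 2000 rs0000001' > "$tmpdir/file_2"

# Unique IDs joined with "|" form one extended regular expression.
pattern=$(awk 'FNR > 1 { print $3 }' "$tmpdir/file_2" | sort -u | paste -sd '|' -)
printf '%s\n' "$pattern"      # rs0000001|rs12266638

# Anchor at line start and require whitespace after the ID, so only whole IDs match.
matches=$(grep -E "^(${pattern})[[:space:]]" "$tmpdir/file_1")
printf '%s\n' "$matches"      # rs12266638 t g
```

The pattern grows with every unique ID, which is why this route is best kept for modest ID lists.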

A more general solution, which works for any position of the SNP field because it looks the column up by its header name:

# SO71009277.awk
BEGIN {
  fnr = 0
  while ((getline < ARGV[1]) > 0) {
    ++fnr
    if (fnr == 1) {
      for (i=1; i<=NF; i++)
        FIELDBYNAME1[$i] = i # e.g. FIELDBYNAME1["SNP"] = 1
    }
    else {
      SNP_KEY[$FIELDBYNAME1["SNP"]] = $0
    }    
  }
  close(ARGV[1])

  fnr = 0
  while ((getline < ARGV[2]) > 0) {
    ++fnr
    if (fnr == 1) {
      for (i=1; i<=NF; i++)
        FIELDBYNAME2[$i] = i # e.g. FIELDBYNAME2["SNP"] = 3
    }
    else {
      if ($FIELDBYNAME2["SNP"] in SNP_KEY)
        print SNP_KEY[$FIELDBYNAME2["SNP"]]
    }    
  }
  close(ARGV[2])
}

Call:

awk -f SO71009277.awk file1.txt file2.txt
=>
rs12266638  t   g   0.4259  0.0838  3.776e-07   +?  0.0 0.000   0   1


