
Indexing values from file

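I am desperately trying to finish a task from work and I cannot figure it out.

Short description: I have to monitor a file that produces some values. I managed to isolate the values into a separate file, but I am struggling to assign an index to each value.

Description:

My given file (file A) looks like this, but with more than 10000 entries: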

402
506
223
123
5667
17430
9921
9232

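All the values inside are integers, ranging from 103 to 17431. For every number in file A, I have to assign an index value from 0 to 9. My first approach was to use sed to replace each value in file A with its specific index, but that takes too long on my large file. Another approach recommended to me was awk, but I failed with that too. My script looks like this: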

awk '($0>=363 && $0<=499) || ($0>=4645 && $0<=4646) {$0="0"}1' tmp >tmp2

awk '($0>=2174 && $0<=2193)  {$0="1"}1' tmp >tmp2

awk '($0==500) || ($0>=12308 && $0<=12356) {$0="2"}1' tmp >tmp2

awk '($0>=103 && $0<=220) || ($0>=252 && $0<=299) || ($0>=1980 && $0<=1986) || ($0>=2921 && $0<=2922) {$0="3"}1' tmp >priority

awk '($0>=221 && $0<=251) || ($0>=8085 && $0<=8091) || ($0==8350) || ($0>=12809 && $0<=12945) || ($0>=16834 && $0<=17033)  {$0="4"}1' tmp >tmp2

awk '($0>=300 && $0<=362) || ($0=522) || ($0>=2923 && $0<=2925) || ($0>=3441 && $0<=3442) || ($0=4644)|| ($0>=5677 && $0<=5695) || ($0>=8082 && $0<=8083)|| ($0>=8093 && $0<=8349) || ($0>=12946 && $0<=12947) || ($0>=21986 && $0<=13215) || ($0>=13309 && $0<=13311)  {$0="5"}1' tmp >tmp2

I would like the output to look like:

5
3
3
2
1
6
7
7

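Nothing happens. I declared the ranges for each index and tried to replace each value accordingly, but it does not work. I am trying to take a for-loop or an if/else approach, but I do not know how, because I am a newbie. Could someone help me with the syntax? I was trying to write something like: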

x=value from file list.csv
for x in range1 or range2 or range3
 replace x with 0
for x in range 3 or range 4 or range 5
 replace x with 1

OR an if/else approach

x=values from list.csv
if x in range1 or range2 or range3 
  then replace x with 0
else if x in range4 range5 range6
  then replace x with 1

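Can someone help me with this? I am trying to make it work any way I can (bash, Perl, Python...), so any idea is welcome, as long as it comes with a little explanation; as I said, I am new to this. Thank you.

Run it with perl so-57624956.pl < fileA: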

use 5.010;
use Set::IntSpan::Fast::XS qw();
my @intspans = map {
    Set::IntSpan::Fast::XS->new($_)
} (
    '363-499,4645-4646',
    '2174-2193',
    '500,12308-12356',
    '103-220,252-299,1980-1986,2921-2922',
    '221-251,8085-8091,8350,12809-12945,16834-17033',
    '300-362,522,2923-2925,3441-3442,4644,5677-5695,'
    . '8082-8083,8093-8349,12946-12947,12986-13215,13309-13311',
);
while (<>) {
    while (my ($index, $intspan) = each @intspans) {
        say $index if $intspan->contains($_);
    }
}
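
For anyone who prefers Python, which the question also lists as acceptable, here is a rough sketch of the same idea: one set of inclusive ranges per index, checked in order. The names `INDEX_RANGES` and `index_of` are made up for this example, and only the ranges for indices 0 to 5 are included, copied from the Perl answer above. Values that match no range return `None`.

```python
# Inclusive (low, high) ranges for indices 0-5, copied from the Perl answer above.
# INDEX_RANGES and index_of are illustrative names, not part of any library.
INDEX_RANGES = [
    [(363, 499), (4645, 4646)],                             # index 0
    [(2174, 2193)],                                         # index 1
    [(500, 500), (12308, 12356)],                           # index 2
    [(103, 220), (252, 299), (1980, 1986), (2921, 2922)],   # index 3
    [(221, 251), (8085, 8091), (8350, 8350),
     (12809, 12945), (16834, 17033)],                       # index 4
    [(300, 362), (522, 522), (2923, 2925), (3441, 3442),
     (4644, 4644), (5677, 5695), (8082, 8083), (8093, 8349),
     (12946, 12947), (12986, 13215), (13309, 13311)],       # index 5
]


def index_of(value):
    """Return the first index whose ranges contain value, or None if none do."""
    for index, spans in enumerate(INDEX_RANGES):
        if any(low <= value <= high for low, high in spans):
            return index
    return None


# Small demonstration on a few sample values.
for sample in (402, 223, 123, 500, 522, 506):
    print(sample, index_of(sample))
```

The ranges live in a plain list, so adding indices 6 to 9 means appending four more entries rather than writing more conditions.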

What is wrong with the awk script? Here is an awk one-liner with the ranges you specified, and it works as expected.

awk '{ if( ($1>=363 && $1<=499) || ($1>=4645 && $1<=4646)){ print 0}  
else if( ($1>=2174 && $1<=2193)) { print 1}  
else if( ($1==500) || ($1>=12308 && $1<=12356)){ print 2} 
else if( ($1>=103 && $1<=220) || ($1>=252 && $1<=299) || ($1>=1980 && $1<=1986) || ($1>=2921 && $1<=2922)){ print 3} 
else if( ($1>=221 && $1<=251) || ($1>=8085 && $1<=8091) || ($1==8350) || ($1>=12809 && $1<=12945) || ($1>=16834 && $1<=17033)){ print 4} 
else if( ($1>=300 && $1<=362) || ($1=522) || ($1>=2923 && $1<=2925) || ($1>=3441 && $1<=3442) || ($1=4644)|| ($1>=5677 && $1<=5695) || ($1>=8082 && $1<=8083)|| ($1>=8093 && $1<=8349) || ($1>=12946 && $1<=12947) || ($1>=21986 && $1<=13215) || ($1>=13309 && $1<=13311)){ print 5}
}' tmp > tmp2

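Thank you J23. You just saved my job. If you are ever in London, say something and I will buy you a beer. So, I solved the problem by adding the rest of the values. One simple thing gave me some trouble, but it was easy to fix (not complaining or anything).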

($1=522)

must be

($1==522)

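Now, if anyone needs to do something similar to my task, monitoring a CSV and changing the data accordingly by assigning an index to certain values, do what I did with the help of this community.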

##print out your column from the target file. Just replace "NR" with your column number
csvtool col "NR" /path.to/the/file.csv >tmp 
## Use AWK to look for the range and then act accordingly by replacing your value with the correct index. 
awk '{ 
if( ($1>=363 && $1<=499) || ($1>=4645 && $1<=4646)){ print 0}  
else if( ($1>=2174 && $1<=2193)) { print 1}  
else if( ($1==500) || ($1>=12308 && $1<=12356)){ print 2} 
else if( ($1>=103 && $1<=220) || ($1>=252 && $1<=299) || ($1>=1980 && $1<=1986) || ($1>=2921 && $1<=2922)){ print 3} 
else if( ($1>=221 && $1<=251) || ($1>=8085 && $1<=8091) || ($1==8350) || ($1>=12809 && $1<=12945) || ($1>=16834 && $1<=17033)){ print 4} 
else if( ($1>=300 && $1<=362) || ($1==522) || ($1>=2923 && $1<=2925) || ($1>=3441 && $1<=3442) || ($1==4644)|| ($1>=5677 && $1<=5695) || ($1>=8082 && $1<=8083)|| ($1>=8093 && $1<=8349) || ($1>=12946 && $1<=12947) || ($1>=12986 && $1<=13215) || ($1>=13309 && $1<=13311)){ print 5}
else if( ($1>=501 && $1<=504) || ($1>=566 && $1<=600) || ($1>=613 && $1<=637) ||  ($1>=2015 && $1<=2040) ||  ($1>=2103 && $1<=2126) || ($1>=2373 && $1<=2374) || ($1>=3828 && $1<=4125) || ($1>=4237 && $1<=4636) || ($1>=4647 && $1<=4889) || ($1>=4991 && $1<=5676) || ($1>=5696 && $1<=5705) || ($1>=6502 && $1<=6595) || ($1>=8429 && $1<=8460) || ($1>=8552 && $1<=8699) || ($1>=10487 && $1<=10977) || ($1>=11326 && $1<=11617) || ($1>=11688 && $1<=11815) || ($1>=11844 && $1<=11938) || ($1>=12490 && $1<=12597) || ($1>=12973 && $1<=12982) || ($1>=13367 && $1<=13414)){ print 6}
else if( ($1>=523 && $1<=548) || ($1>=555 && $1<=565) || ($1>=2005 && $1<=2014) || ($1>=2041 && $1<=2063) || ($1>=2091 && $1<=2102) ||  ($1==2394) || ($1>=2407 && $1<=2411) || ($1>=2926 && $1<=3008) || ($1>=3443 && $1<=3473) || ($1>=3486 && $1<=3813) || ($1>=4132 && $1<=4144) || ($1>=4637 && $1<=4643) || ($1>=4916 && $1<=4981) || ($1>=5711 && $1<=5741) || ($1>=6403 && $1<=6405) || ($1>=6415 && $1<=6466) || ($1>=6701 && $1<=7002) || ($1>=7035 && $1<=7048) || ($1>=8426 && $1<=8428) || ($1>=8496 && $1<=8541) || ($1>=8857 && $1<=9323) || ($1>=9429 && $1<=9618) || ($1>=9674 && $1<=9789) || ($1>=9802 && $1<=9811) || ($1>=9850 && $1<=10009) || ($1>=10131 && $1<=10136) || ($1>=10396 && $1<=10402) || ($1>=11000 && $1<=11175) || ($1==11618) || ($1>=12100 && $1<=12111) || ($1>=12212 && $1<=12219) || ($1==12489) || ($1>=12807 && $1<=12808) || ($1==12983) || ($1>=14616 && $1<=14627) || ($1>=15723 && $1<=15897)){ print 7}
else if( ($1==521) || ($1==554) || ($1>=601 && $1<=612) || ($1>=651 && $1<=708) || ($1>=1905 && $1<=1942) || ($1>=1949 && $1<=1979) || ($1>=1987 && $1<=1993) || ($1>=2259 && $1<=2278) || ($1>=2352 && $1<=2362) || ($1>=2395 && $1<=2406) || ($1>=2412 && $1<=2449) || ($1>=2673 && $1<=2919) || ($1>=3009 && $1<=3016) || ($1>=3814 && $1<=3827) || ($1>=4126 && $1<=4131) || ($1>=4982 && $1<=4990) || ($1>=5706 && $1<=5710) || ($1>=6012 && $1<=6181) || ($1>=6285 && $1<=6339) || ($1>=6409 && $1<=6411) || ($1>=6596 && $1<=6700) || ($1>=7191 && $1<=7424) || ($1==8081) || ($1>=8550 && $1<=8551) || ($1>=8700 && $1<=8716) || ($1>=9324 && $1<=9326) || ($1>=9619 && $1<=9624) || ($1==9729) || ($1>=10018 && $1<=10064) || ($1>=10115 && $1<=10126) || ($1>=10198 && $1<=10386) || ($1==10486) || ($1>=12112 && $1<=12115) || ($1>=12209 && $1<=12211)){ print 8}
else if( ($1>=489 && $1<=498) || ($1>=505 && $1<=520) || ($1>=549 && $1<=553) || ($1>=638 && $1<=650) || ($1>=709 && $1<=1904) || ($1>=1943 && $1<=1948) || ($1>=1994 && $1<=2004) || ($1>=2064 && $1<=2090) || ($1>=2127 && $1<=2173) || ($1>=2194 && $1<=2258) || ($1>=2279 && $1<=2351) || ($1>=2363 && $1<=2372) || ($1==2393) || ($1>=2450 && $1<=2672) || ($1>=3474 && $1<=3485) || ($1>=4145 && $1<=4236) || ($1>=4890 && $1<=4915) || ($1>=5742 && $1<=6011) || ($1>=7003 && $1<=7034) || ($1>=7049 && $1<=7295) || ($1>=7425 && $1<=8080) || ($1==8084) || ($1>=8352 && $1<=8425) || ($1>=8461 && $1<=8495) || ($1>=8542 && $1<=8549) || ($1>=8717 && $1<=8856) || ($1>=9327 && $1<=9428) || ($1>=9625 && $1<=9673) || ($1>=9790 && $1<=9791) || ($1>=9793 && $1<=9801) || ($1>=9812 && $1<=9849) || ($1>=10010 && $1<=10017) || ($1>=10065 && $1<=10114) || ($1>=10128 && $1<=10130) || ($1>=10137 && $1<=10197) || ($1>=10387 && $1<=10395) || ($1>=10403 && $1<=10485) || ($1>=10978 && $1<=10999) || ($1>=11176 && $1<=11325) || ($1>=11620 && $1<=11687) || ($1>=11816 && $1<=11843) || ($1>=11939 && $1<=12099) || ($1>=12116 && $1<=12208) || ($1>=12220 && $1<=12307) || ($1>=12357 && $1<=12488) || ($1>=12598 && $1<=12806) || ($1>=12948 && $1<=12972) || ($1>=13216 && $1<=13306) || ($1>=13312 && $1<=13366) || ($1>=13415 && $1<=14615) || ($1>=14628 && $1<=15722) || ($1>=15989 && $1<=16833) || ($1>=17402 && $1<=17431)){ print 9}
}' tmp > tmp2
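
If the long else-if chain becomes hard to maintain, one alternative is to keep the ranges as data and build a lookup table once. The sketch below is in Python and makes some assumptions: `RANGES`, `build_table` and `lookup` are invented names, and `RANGES` holds only a small subset of the ranges above for illustration. The table keeps the earliest match for each value. This mirrors the else-if chain, where 489-498 is listed under index 9 but is already caught by 363-499 under index 0.

```python
MAX_VALUE = 17431  # upper bound of the values, as stated in the question

# Illustrative subset of the ranges above: (low, high, index), bounds inclusive.
# Order matters: earlier entries win, like earlier branches of the else-if chain.
RANGES = [
    (363, 499, 0), (4645, 4646, 0),
    (2174, 2193, 1),
    (500, 500, 2), (12308, 12356, 2),
    (489, 498, 9), (505, 520, 9),
]


def build_table(ranges, max_value=MAX_VALUE):
    """Return a list where table[value] is the index for value, or None."""
    table = [None] * (max_value + 1)
    for low, high, index in ranges:
        for value in range(low, high + 1):
            if table[value] is None:  # keep the earliest match only
                table[value] = index
    return table


TABLE = build_table(RANGES)


def lookup(value):
    """Return the index for value, or None if it is out of bounds or unassigned."""
    if 0 <= value <= MAX_VALUE:
        return TABLE[value]
    return None


for sample in (402, 490, 510, 2180, 501):
    print(sample, lookup(sample))
```

Each lookup is then a single list access, which should be comfortable for files with tens of thousands of entries.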

Now you have all the correct data in a separate file. Just paste it where you need it and then delete the file. Thanks again to the community.
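
The extract, index and paste steps could also be done in one pass without temporary files. This is only a sketch using Python's standard `csv` module. `classify` is a hypothetical stand-in with just two of the indices, `replace_column` is an invented helper, and the column number here is zero-based. Cells that are not integers (a header, for example) or that match no range are left unchanged.

```python
import csv
import io


def classify(value):
    """Hypothetical stand-in for the full range table (indices 0 and 1 only)."""
    if 363 <= value <= 499 or 4645 <= value <= 4646:
        return 0
    if 2174 <= value <= 2193:
        return 1
    return None


def replace_column(reader, writer, column):
    """Copy rows from reader to writer, replacing row[column] with its index."""
    for row in reader:
        try:
            index = classify(int(row[column]))
        except (ValueError, IndexError):
            index = None  # header, blank cell, or short row: leave it alone
        if index is not None:
            row[column] = str(index)
        writer.writerow(row)


# Demonstration on an in-memory CSV; real files would be opened with open().
sample = "id,value\na,402\nb,2180\nc,506\n"
out = io.StringIO()
replace_column(csv.reader(io.StringIO(sample)),
               csv.writer(out, lineterminator="\n"),
               column=1)
result = out.getvalue()
print(result, end="")
```

With real files, the Python documentation recommends opening them with `newline=""` before passing them to `csv.reader` and `csv.writer`.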

