
Indexing values from file

I'm desperately trying to complete a task for work and I just cannot figure it out.

In short: I must monitor a file that produces some values. I managed to isolate the values into a separate file, and I'm trying to allocate an index to each value.

Description:

My given file (file A) looks like the sample below, but with more than 10000 entries:

402
506
223
123
5667
17430
9921
9232

All the values are integers in the range 103 to 17431. For each number in file A I must allocate an index value from 0 to 9. My first approach was to use sed to literally replace each string in file A with the value of its index, but that takes too long on my large file. Another approach that was recommended to me was awk, but I failed with that as well. My script looked like this:

awk '($0>=363 && $0<=499) || ($0>=4645 && $0<=4646) {$0="0"}1' tmp >tmp2

awk '($0>=2174 && $0<=2193)  {$0="1"}1' tmp >tmp2

awk '($0==500) || ($0>=12308 && $0<=12356) {$0="2"}1' tmp >tmp2

awk '($0>=103 && $0<=220) || ($0>=252 && $0<=299) || ($0>=1980 && $0<=1986) || ($0>=2921 && $0<=2922) {$0="3"}1' tmp >priority

awk '($0>=221 && $0<=251) || ($0>=8085 && $0<=8091) || ($0==8350) || ($0>=12809 && $0<=12945) || ($0>=16834 && $0<=17033)  {$0="4"}1' tmp >tmp2

awk '($0>=300 && $0<=362) || ($0=522) || ($0>=2923 && $0<=2925) || ($0>=3441 && $0<=3442) || ($0=4644)|| ($0>=5677 && $0<=5695) || ($0>=8082 && $0<=8083)|| ($0>=8093 && $0<=8349) || ($0>=12946 && $0<=12947) || ($0>=21986 && $0<=13215) || ($0>=13309 && $0<=13311)  {$0="5"}1' tmp >tmp2

I was hoping for output like:

5
3
3
2
1
6
7
7

That is not happening. I declare the ranges for each index and try to replace each value accordingly, but it is not working. I'm trying to use a for loop or an if/else approach, but I don't know how because I'm new to this. Can somebody help me with the syntax? I was trying to write something like:

x=value from file list.csv
for x in range1 or range2 or range3 
 replace x with 0
for x in range 3 or range 4 or range 5
 replace x with 1

OR an if/else approach

x=values from list.csv
if x in range1 or range2 or range3 
  then replace x with 0
else if x in range4 range5 range6
  then replace x with 1

Can somebody help me with this? I'm willing to do it any way I can (bash, Perl, Python...), so any idea is welcome as long as it is explained a bit; as I said, I'm new to this. Thank you.

Run with perl so-57624956.pl < fileA

use 5.010;
use Set::IntSpan::Fast::XS qw();
my @intspans = map {
    Set::IntSpan::Fast::XS->new($_)
} (
    '363-499,4645-4646',
    '2174-2193',
    '500,12308-12356',
    '103-220,252-299,1980-1986,2921-2922',
    '221-251,8085-8091,8350,12809-12945,16834-17033',
    '300-362,522,2923-2925,3441-3442,4644,5677-5695,'
    . '8082-8083,8093-8349,12946-12947,12986-13215,13309-13311',
);
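while (<>) {
    chomp;    # drop the trailing newline so $_ is a clean integer
    # The position of each set in @intspans is its index value.
    while (my ($index, $intspan) = each @intspans) {
        say $index if $intspan->contains($_);
    }
}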
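For readers who do not have Set::IntSpan::Fast::XS installed, the same idea can be sketched in plain Python. This is only an illustration, not the answerer's code: the names SPECS, parse_spans and index_for are made up here, the range strings are the six copied from the Perl script (indexes 0 to 5), and any value outside them yields None.

```python
# Range specs copied from the Perl script above; list position = index value.
SPECS = [
    "363-499,4645-4646",
    "2174-2193",
    "500,12308-12356",
    "103-220,252-299,1980-1986,2921-2922",
    "221-251,8085-8091,8350,12809-12945,16834-17033",
    "300-362,522,2923-2925,3441-3442,4644,5677-5695,"
    "8082-8083,8093-8349,12946-12947,12986-13215,13309-13311",
]


def parse_spans(spec):
    """Turn '363-499,500' into [(363, 499), (500, 500)]."""
    spans = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        spans.append((int(lo), int(hi or lo)))  # a bare number is a 1-wide span
    return spans


SPAN_SETS = [parse_spans(spec) for spec in SPECS]


def index_for(value, span_sets=SPAN_SETS):
    """Return the first index whose spans contain value, else None."""
    for index, spans in enumerate(span_sets):
        if any(lo <= value <= hi for lo, hi in spans):
            return index
    return None


# Small demonstration on hand-picked values (not the asker's real file).
for value in (402, 223, 123, 500):
    print(value, index_for(value))
```

To process a real file, the loop at the bottom could be replaced by one that reads a line at a time and calls index_for(int(line)).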

What was the issue with the awk script? Here is an awk one-liner with the ranges you specified; it works as expected.

awk '{ if( ($1>=363 && $1<=499) || ($1>=4645 && $1<=4646)){ print 0}  
else if( ($1>=2174 && $1<=2193)) { print 1}  
else if( ($1==500) || ($1>=12308 && $1<=12356)){ print 2} 
else if( ($1>=103 && $1<=220) || ($1>=252 && $1<=299) || ($1>=1980 && $1<=1986) || ($1>=2921 && $1<=2922)){ print 3} 
else if( ($1>=221 && $1<=251) || ($1>=8085 && $1<=8091) || ($1==8350) || ($1>=12809 && $1<=12945) || ($1>=16834 && $1<=17033)){ print 4} 
else if( ($1>=300 && $1<=362) || ($1=522) || ($1>=2923 && $1<=2925) || ($1>=3441 && $1<=3442) || ($1=4644)|| ($1>=5677 && $1<=5695) || ($1>=8082 && $1<=8083)|| ($1>=8093 && $1<=8349) || ($1>=12946 && $1<=12947) || ($1>=21986 && $1<=13215) || ($1>=13309 && $1<=13311)){ print 5}
}' tmp > tmp2
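An alternative to a long if/else chain is to keep the ranges as data and look each value up with a binary search. The sketch below is an assumption-laden illustration rather than a drop-in replacement: RANGES holds only the ranges for indexes 0 to 2 from this thread, the names RANGES, TABLE, STARTS and lookup are invented here, and the lookup is only correct if no two ranges overlap.

```python
from bisect import bisect_right

# (low, high, index) triples; only indexes 0-2 from the thread are listed here.
RANGES = [
    (363, 499, 0), (4645, 4646, 0),
    (2174, 2193, 1),
    (500, 500, 2), (12308, 12356, 2),
]

# Sort once by lower bound so every lookup is a binary search.
TABLE = sorted(RANGES)
STARTS = [low for low, _high, _index in TABLE]


def lookup(value):
    """Return the index for value, or None if no range contains it.

    Assumes the ranges do not overlap.
    """
    pos = bisect_right(STARTS, value) - 1  # last range starting at or below value
    if pos >= 0:
        _low, high, index = TABLE[pos]
        if value <= high:
            return index
    return None


for value in (402, 500, 2180, 501):
    print(value, lookup(value))
```

With ten thousand values either approach is fast; the main gain of a table is that adding or correcting a range means editing one tuple instead of a long boolean expression.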

THANK YOU J23. You just saved my job. If you ever visit London, say something; I owe you a beer. I solved the problem by following your instructions and adding the rest of my values. One small thing gave me some trouble, but it was easy to fix (no complaint at all): a single = in awk assigns rather than compares, so

($1=522)

must be

($1==522)

Now, if somebody ever needs to do something similar to my task, that is, monitor a CSV and change data by assigning an index to some values, just do what I did with the help of this community.

##print out your column from the target file. Just replace "NR" with your column number
csvtool col "NR" /path.to/the/file.csv >tmp 
## Use AWK to look for the range and then act accordingly by replacing your value with the correct index. 
awk '{ 
if( ($1>=363 && $1<=499) || ($1>=4645 && $1<=4646)){ print 0}  
else if( ($1>=2174 && $1<=2193)) { print 1}  
else if( ($1==500) || ($1>=12308 && $1<=12356)){ print 2} 
else if( ($1>=103 && $1<=220) || ($1>=252 && $1<=299) || ($1>=1980 && $1<=1986) || ($1>=2921 && $1<=2922)){ print 3} 
else if( ($1>=221 && $1<=251) || ($1>=8085 && $1<=8091) || ($1==8350) || ($1>=12809 && $1<=12945) || ($1>=16834 && $1<=17033)){ print 4} 
else if( ($1>=300 && $1<=362) || ($1==522) || ($1>=2923 && $1<=2925) || ($1>=3441 && $1<=3442) || ($1==4644)|| ($1>=5677 && $1<=5695) || ($1>=8082 && $1<=8083)|| ($1>=8093 && $1<=8349) || ($1>=12946 && $1<=12947) || ($1>=12986 && $1<=13215) || ($1>=13309 && $1<=13311)){ print 5}
else if( ($1>=501 && $1<=504) || ($1>=566 && $1<=600) || ($1>=613 && $1<=637) ||  ($1>=2015 && $1<=2040) ||  ($1>=2103 && $1<=2126) || ($1>=2373 && $1<=2374) || ($1>=3828 && $1<=4125) || ($1>=4237 && $1<=4636) || ($1>=4647 && $1<=4889) || ($1>=4991 && $1<=5676) || ($1>=5696 && $1<=5705) || ($1>=6502 && $1<=6595) || ($1>=8429 && $1<=8460) || ($1>=8552 && $1<=8699) || ($1>=10487 && $1<=10977) || ($1>=11326 && $1<=11617) || ($1>=11688 && $1<=11815) || ($1>=11844 && $1<=11938) || ($1>=12490 && $1<=12597) || ($1>=12973 && $1<=12982) || ($1>=13367 && $1<=13414)){ print 6}
else if( ($1>=523 && $1<=548) || ($1>=555 && $1<=565) || ($1>=2005 && $1<=2014) || ($1>=2041 && $1<=2063) || ($1>=2091 && $1<=2102) ||  ($1==2394) || ($1>=2407 && $1<=2411) || ($1>=2926 && $1<=3008) || ($1>=3443 && $1<=3473) || ($1>=3486 && $1<=3813) || ($1>=4132 && $1<=4144) || ($1>=4637 && $1<=4643) || ($1>=4916 && $1<=4981) || ($1>=5711 && $1<=5741) || ($1>=6403 && $1<=6405) || ($1>=6415 && $1<=6466) || ($1>=6701 && $1<=7002) || ($1>=7035 && $1<=7048) || ($1>=8426 && $1<=8428) || ($1>=8496 && $1<=8541) || ($1>=8857 && $1<=9323) || ($1>=9429 && $1<=9618) || ($1>=9674 && $1<=9789) || ($1>=9802 && $1<=9811) || ($1>=9850 && $1<=10009) || ($1>=10131 && $1<=10136) || ($1>=10396 && $1<=10402) || ($1>=11000 && $1<=11175) || ($1==11618) || ($1>=12100 && $1<=12111) || ($1>=12212 && $1<=12219) || ($1==12489) || ($1>=12807 && $1<=12808) || ($1==12983) || ($1>=14616 && $1<=14627) || ($1>=15723 && $1<=15897)){ print 7}
else if( ($1==521) || ($1==554) || ($1>=601 && $1<=612) || ($1>=651 && $1<=708) || ($1>=1905 && $1<=1942) || ($1>=1949 && $1<=1979) || ($1>=1987 && $1<=1993) || ($1>=2259 && $1<=2278) || ($1>=2352 && $1<=2362) || ($1>=2395 && $1<=2406) || ($1>=2412 && $1<=2449) || ($1>=2673 && $1<=2919) || ($1>=3009 && $1<=3016) || ($1>=3814 && $1<=3827) || ($1>=4126 && $1<=4131) || ($1>=4982 && $1<=4990) || ($1>=5706 && $1<=5710) || ($1>=6012 && $1<=6181) || ($1>=6285 && $1<=6339) || ($1>=6409 && $1<=6411) || ($1>=6596 && $1<=6700) || ($1>=7191 && $1<=7424) || ($1==8081) || ($1>=8550 && $1<=8551) || ($1>=8700 && $1<=8716) || ($1>=9324 && $1<=9326) || ($1>=9619 && $1<=9624) || ($1==9729) || ($1>=10018 && $1<=10064) || ($1>=10115 && $1<=10126) || ($1>=10198 && $1<=10386) || ($1==10486) || ($1>=12112 && $1<=12115) || ($1>=12209 && $1<=12211)){ print 8}
else if( ($1>=489 && $1<=498) || ($1>=505 && $1<=520) || ($1>=549 && $1<=553) || ($1>=638 && $1<=650) || ($1>=709 && $1<=1904) || ($1>=1943 && $1<=1948) || ($1>=1994 && $1<=2004) || ($1>=2064 && $1<=2090) || ($1>=2127 && $1<=2173) || ($1>=2194 && $1<=2258) || ($1>=2279 && $1<=2351) || ($1>=2363 && $1<=2372) || ($1==2393) || ($1>=2450 && $1<=2672) || ($1>=3474 && $1<=3485) || ($1>=4145 && $1<=4236) || ($1>=4890 && $1<=4915) || ($1>=5742 && $1<=6011) || ($1>=7003 && $1<=7034) || ($1>=7049 && $1<=7295) || ($1>=7425 && $1<=8080) || ($1==8084) || ($1>=8352 && $1<=8425) || ($1>=8461 && $1<=8495) || ($1>=8542 && $1<=8549) || ($1>=8717 && $1<=8856) || ($1>=9327 && $1<=9428) || ($1>=9625 && $1<=9673) || ($1>=9790 && $1<=9791) || ($1>=9793 && $1<=9801) || ($1>=9812 && $1<=9849) || ($1>=10010 && $1<=10017) || ($1>=10065 && $1<=10114) || ($1>=10128 && $1<=10130) || ($1>=10137 && $1<=10197) || ($1>=10387 && $1<=10395) || ($1>=10403 && $1<=10485) || ($1>=10978 && $1<=10999) || ($1>=11176 && $1<=11325) || ($1>=11620 && $1<=11687) || ($1>=11816 && $1<=11843) || ($1>=11939 && $1<=12099) || ($1>=12116 && $1<=12208) || ($1>=12220 && $1<=12307) || ($1>=12357 && $1<=12488) || ($1>=12598 && $1<=12806) || ($1>=12948 && $1<=12972) || ($1>=13216 && $1<=13306) || ($1>=13312 && $1<=13366) || ($1>=13415 && $1<=14615) || ($1>=14628 && $1<=15722) || ($1>=15989 && $1<=16833) || ($1>=17402 && $1<=17431)){ print 9}
}' tmp > tmp2
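With this many hand-typed ranges, two kinds of slip are easy to miss: a range whose lower bound is larger than its upper bound can never match (the 21986 to 13215 pair in the question's script is one), and when two ranges overlap, the earlier branch silently wins (363-499 for index 0 and 489-498 for index 9 above). A rough Python check, assuming the ranges have been copied by hand into (low, high, index) tuples; check_ranges and SAMPLE are hypothetical names used only for this sketch:

```python
def check_ranges(ranges):
    """Return (inverted, overlapping) problems in a list of (low, high, index)."""
    # A range with low > high can never be satisfied.
    inverted = [r for r in ranges if r[0] > r[1]]
    valid = sorted(r for r in ranges if r[0] <= r[1])

    # Report every range that starts before an earlier range has ended.
    overlapping = []
    widest = None  # the earlier range that reaches furthest to the right
    for cur in valid:
        if widest is not None and cur[0] <= widest[1]:
            overlapping.append((widest, cur))
        if widest is None or cur[1] > widest[1]:
            widest = cur
    return inverted, overlapping


# A few ranges taken from the scripts in this thread.
SAMPLE = [(363, 499, 0), (489, 498, 9), (21986, 13215, 5), (2174, 2193, 1)]

inverted, overlapping = check_ranges(SAMPLE)
print("inverted:", inverted)
print("overlapping:", overlapping)
```

Whether an overlap is a mistake or intended depends on which index the overlapping values should really get, so the check only points at places worth a second look.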

Now you have all the correct data in a separate file. Just paste it where you need it and remove it afterwards. Thank you once again to the community.
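The extract, index and paste steps can also be done in one pass. The following Python sketch uses the standard csv module and is only an illustration under several assumptions: the file has no header row, the chosen column always holds an integer, and demo_lookup is a hypothetical two-branch stand-in for the full range table. The function name add_index_column is likewise invented here.

```python
import csv
import io


def add_index_column(src, dst, column, lookup):
    """Copy CSV rows from src to dst, appending lookup(value) for one column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        index = lookup(int(row[column]))
        # Leave the new cell empty when no range matched.
        writer.writerow(row + ["" if index is None else index])


def demo_lookup(value):
    # Hypothetical stand-in: only two of the thread's ranges are checked.
    if 363 <= value <= 499:
        return 0
    if 2174 <= value <= 2193:
        return 1
    return None


# In-memory stand-ins for the real input and output files.
source = io.StringIO("a,402\nb,2180\nc,99999\n")
target = io.StringIO()
add_index_column(source, target, column=1, lookup=demo_lookup)
print(target.getvalue(), end="")
```

For real files, source and target would be opened with open(path, newline="") as the csv module documentation recommends.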
