
Indexing values from file

I'm desperately trying to finish a task from work and I can't figure it out.

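Short description: I have to monitor a file that produces some values. I managed to isolate the values into a separate file, but I am stuck on assigning an index to each value.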

Description:

My given file (file A) looks like this, but with more than 10000 entries:

402
506
223
123
5667
17430
9921
9232

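All the values inside are integers, ranging from 103 to 17431. For each individual number in file A, I have to assign an index value from 0 to 9. My first approach was to use sed to simply replace each string from file A with its specific index, but that took too long for my big file. Another approach that was recommended to me was awk, but I failed with that too. My script looks something like this: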

awk '($0>=363 && $0<=499) || ($0>=4645 && $0<=4646) {$0="0"}1' tmp >tmp2

awk '($0>=2174 && $0<=2193)  {$0="1"}1' tmp >tmp2

awk '($0==500) || ($0>=12308 && $0<=12356) {$0="2"}1' tmp >tmp2

awk '($0>=103 && $0<=220) || ($0>=252 && $0<=299) || ($0>=1980 && $0<=1986) || ($0>=2921 && $0<=2922) {$0="3"}1' tmp >priority

awk '($0>=221 && $0<=251) || ($0>=8085 && $0<=8091) || ($0==8350) || ($0>=12809 && $0<=12945) || ($0>=16834 && $0<=17033)  {$0="4"}1' tmp >tmp2

awk '($0>=300 && $0<=362) || ($0=522) || ($0>=2923 && $0<=2925) || ($0>=3441 && $0<=3442) || ($0=4644)|| ($0>=5677 && $0<=5695) || ($0>=8082 && $0<=8083)|| ($0>=8093 && $0<=8349) || ($0>=12946 && $0<=12947) || ($0>=21986 && $0<=13215) || ($0>=13309 && $0<=13311)  {$0="5"}1' tmp >tmp2

I expect the output to look like:

5
3
3
2
1
6
7
7

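That is not what happens. I declared the ranges for each index and tried to replace each value accordingly, but it does not work. I'm trying to take a for loop or an if/else approach, but I do not know how, because I'm a newbie. Could somebody help me with the syntax? I was trying to write something like: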

x=value from file list.csv
for x in range1 or range2 or range3 
 replace x with 0
for x in range 3 or range 4 or range 5
 replace x with 1

OR an if/else approach

x=values from list.csv
if x in range1 or range2 or range3 
  then replace x with 0
else if x in range4 range5 range6
  then replace x with 1

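Can somebody help me with this? I'm trying to get it working any way I can (bash, perl, python...), so any idea is welcome, as long as it comes with a bit of explanation; as I said, I'm new to this. Thank you.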

Run it with perl so-57624956.pl < fileA

use 5.012;  # each() on an array needs Perl 5.12 or newer
use Set::IntSpan::Fast::XS qw();
my @intspans = map {
    Set::IntSpan::Fast::XS->new($_)
} (
    '363-499,4645-4646',
    '2174-2193',
    '500,12308-12356',
    '103-220,252-299,1980-1986,2921-2922',
    '221-251,8085-8091,8350,12809-12945,16834-17033',
    '300-362,522,2923-2925,3441-3442,4644,5677-5695,'
    . '8082-8083,8093-8349,12946-12947,12986-13215,13309-13311',
);
while (<>) {
    while (my ($index, $intspan) = each @intspans) {
        say $index if $intspan->contains($_);
    }
}
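Since the question also mentions Python, here is a rough Python sketch of the same idea: parse each span string into ranges and report the first index whose ranges contain the value. The span strings are copied from the Perl answer above; the names SPANS, parse_spans and find_index are made up for this illustration, and values outside the six listed spans simply come back as None.

```python
# Span strings copied from the Perl answer; list position = index value.
# Only indexes 0-5 are listed there, so anything else maps to None here.
SPANS = [
    "363-499,4645-4646",
    "2174-2193",
    "500,12308-12356",
    "103-220,252-299,1980-1986,2921-2922",
    "221-251,8085-8091,8350,12809-12945,16834-17033",
    "300-362,522,2923-2925,3441-3442,4644,5677-5695,"
    "8082-8083,8093-8349,12946-12947,12986-13215,13309-13311",
]


def parse_spans(text):
    """Turn '363-499,500' into [range(363, 500), range(500, 501)]."""
    ranges = []
    for part in text.split(","):
        low, _, high = part.partition("-")
        # A single number such as '500' has no upper bound; reuse the lower one.
        ranges.append(range(int(low), int(high or low) + 1))
    return ranges


TABLE = [parse_spans(span) for span in SPANS]


def find_index(value):
    """Return the first index whose ranges contain value, else None."""
    for index, ranges in enumerate(TABLE):
        if any(value in r for r in ranges):
            return index
    return None


if __name__ == "__main__":
    for value in [402, 223, 123, 500, 522]:
        print(value, find_index(value))
```

Membership tests on a Python range of integers are constant time, so the cost per value is one check per range rather than one per number in the range.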

What is wrong with the awk script? Here is an awk one-liner with the ranges you specified, and it works as expected.

awk '{ if( ($1>=363 && $1<=499) || ($1>=4645 && $1<=4646)){ print 0}  
else if( ($1>=2174 && $1<=2193)) { print 1}  
else if( ($1==500) || ($1>=12308 && $1<=12356)){ print 2} 
else if( ($1>=103 && $1<=220) || ($1>=252 && $1<=299) || ($1>=1980 && $1<=1986) || ($1>=2921 && $1<=2922)){ print 3} 
else if( ($1>=221 && $1<=251) || ($1>=8085 && $1<=8091) || ($1==8350) || ($1>=12809 && $1<=12945) || ($1>=16834 && $1<=17033)){ print 4} 
else if( ($1>=300 && $1<=362) || ($1=522) || ($1>=2923 && $1<=2925) || ($1>=3441 && $1<=3442) || ($1=4644)|| ($1>=5677 && $1<=5695) || ($1>=8082 && $1<=8083)|| ($1>=8093 && $1<=8349) || ($1>=12946 && $1<=12947) || ($1>=21986 && $1<=13215) || ($1>=13309 && $1<=13311)){ print 5}
}' tmp > tmp2
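If the if/else chain gets unwieldy, one alternative is a sorted table of ranges searched with binary search. The following is only a sketch in Python under two assumptions: the ranges do not overlap, and only a small, arbitrarily chosen subset of the ranges from this thread is listed. RANGES, STARTS and lookup are hypothetical names.

```python
from bisect import bisect_right

# (low, high, index) triples. This is just a subset of the ranges used in
# the awk answer, chosen for illustration; the ranges must not overlap.
RANGES = sorted([
    (363, 499, 0), (4645, 4646, 0),
    (2174, 2193, 1),
    (500, 500, 2), (12308, 12356, 2),
    (103, 220, 3), (252, 299, 3),
    (221, 251, 4), (8350, 8350, 4),
    (300, 362, 5), (522, 522, 5),
])
STARTS = [low for low, _, _ in RANGES]


def lookup(value, default=None):
    """Return the index of the range containing value, or default."""
    # Find the last range whose lower bound is <= value.
    pos = bisect_right(STARTS, value) - 1
    if pos >= 0:
        _, high, index = RANGES[pos]
        if value <= high:
            return index
    return default


if __name__ == "__main__":
    for value in [402, 223, 123, 500, 522, 501]:
        print(value, lookup(value))
```

Each lookup costs a logarithmic number of comparisons in the number of ranges, and adding a range means adding one line to the table instead of another condition.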

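Thank you, J23. You just saved my job. If you are ever in London, say something and I'll buy you a beer. So, I solved the problem by adding the rest of the values. One simple thing gave me some trouble, but it was easy to fix (not complaining or anything).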

($1=522)

must be

($1==522)

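Now, if anyone needs to do something similar to my task, monitoring a csv and changing the data accordingly by adding indexes for some values, just do what I did with the help of this community.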

##print out your column from the target file. Just replace "NR" with your column number
csvtool col "NR" /path.to/the/file.csv >tmp 
## Use AWK to look for the range and then act accordingly by replacing your value with the correct index. 
awk '{ 
if( ($1>=363 && $1<=499) || ($1>=4645 && $1<=4646)){ print 0}  
else if( ($1>=2174 && $1<=2193)) { print 1}  
else if( ($1==500) || ($1>=12308 && $1<=12356)){ print 2} 
else if( ($1>=103 && $1<=220) || ($1>=252 && $1<=299) || ($1>=1980 && $1<=1986) || ($1>=2921 && $1<=2922)){ print 3} 
else if( ($1>=221 && $1<=251) || ($1>=8085 && $1<=8091) || ($1==8350) || ($1>=12809 && $1<=12945) || ($1>=16834 && $1<=17033)){ print 4} 
else if( ($1>=300 && $1<=362) || ($1==522) || ($1>=2923 && $1<=2925) || ($1>=3441 && $1<=3442) || ($1==4644)|| ($1>=5677 && $1<=5695) || ($1>=8082 && $1<=8083)|| ($1>=8093 && $1<=8349) || ($1>=12946 && $1<=12947) || ($1>=12986 && $1<=13215) || ($1>=13309 && $1<=13311)){ print 5}
else if( ($1>=501 && $1<=504) || ($1>=566 && $1<=600) || ($1>=613 && $1<=637) ||  ($1>=2015 && $1<=2040) ||  ($1>=2103 && $1<=2126) || ($1>=2373 && $1<=2374) || ($1>=3828 && $1<=4125) || ($1>=4237 && $1<=4636) || ($1>=4647 && $1<=4889) || ($1>=4991 && $1<=5676) || ($1>=5696 && $1<=5705) || ($1>=6502 && $1<=6595) || ($1>=8429 && $1<=8460) || ($1>=8552 && $1<=8699) || ($1>=10487 && $1<=10977) || ($1>=11326 && $1<=11617) || ($1>=11688 && $1<=11815) || ($1>=11844 && $1<=11938) || ($1>=12490 && $1<=12597) || ($1>=12973 && $1<=12982) || ($1>=13367 && $1<=13414)){ print 6}
else if( ($1>=523 && $1<=548) || ($1>=555 && $1<=565) || ($1>=2005 && $1<=2014) || ($1>=2041 && $1<=2063) || ($1>=2091 && $1<=2102) ||  ($1==2394) || ($1>=2407 && $1<=2411) || ($1>=2926 && $1<=3008) || ($1>=3443 && $1<=3473) || ($1>=3486 && $1<=3813) || ($1>=4132 && $1<=4144) || ($1>=4637 && $1<=4643) || ($1>=4916 && $1<=4981) || ($1>=5711 && $1<=5741) || ($1>=6403 && $1<=6405) || ($1>=6415 && $1<=6466) || ($1>=6701 && $1<=7002) || ($1>=7035 && $1<=7048) || ($1>=8426 && $1<=8428) || ($1>=8496 && $1<=8541) || ($1>=8857 && $1<=9323) || ($1>=9429 && $1<=9618) || ($1>=9674 && $1<=9789) || ($1>=9802 && $1<=9811) || ($1>=9850 && $1<=10009) || ($1>=10131 && $1<=10136) || ($1>=10396 && $1<=10402) || ($1>=11000 && $1<=11175) || ($1==11618) || ($1>=12100 && $1<=12111) || ($1>=12212 && $1<=12219) || ($1==12489) || ($1>=12807 && $1<=12808) || ($1==12983) || ($1>=14616 && $1<=14627) || ($1>=15723 && $1<=15897)){ print 7}
else if( ($1==521) || ($1==554) || ($1>=601 && $1<=612) || ($1>=651 && $1<=708) || ($1>=1905 && $1<=1942) || ($1>=1949 && $1<=1979) || ($1>=1987 && $1<=1993) || ($1>=2259 && $1<=2278) || ($1>=2352 && $1<=2362) || ($1>=2395 && $1<=2406) || ($1>=2412 && $1<=2449) || ($1>=2673 && $1<=2919) || ($1>=3009 && $1<=3016) || ($1>=3814 && $1<=3827) || ($1>=4126 && $1<=4131) || ($1>=4982 && $1<=4990) || ($1>=5706 && $1<=5710) || ($1>=6012 && $1<=6181) || ($1>=6285 && $1<=6339) || ($1>=6409 && $1<=6411) || ($1>=6596 && $1<=6700) || ($1>=7191 && $1<=7424) || ($1==8081) || ($1>=8550 && $1<=8551) || ($1>=8700 && $1<=8716) || ($1>=9324 && $1<=9326) || ($1>=9619 && $1<=9624) || ($1==9729) || ($1>=10018 && $1<=10064) || ($1>=10115 && $1<=10126) || ($1>=10198 && $1<=10386) || ($1==10486) || ($1>=12112 && $1<=12115) || ($1>=12209 && $1<=12211)){ print 8}
else if( ($1>=489 && $1<=498) || ($1>=505 && $1<=520) || ($1>=549 && $1<=553) || ($1>=638 && $1<=650) || ($1>=709 && $1<=1904) || ($1>=1943 && $1<=1948) || ($1>=1994 && $1<=2004) || ($1>=2064 && $1<=2090) || ($1>=2127 && $1<=2173) || ($1>=2194 && $1<=2258) || ($1>=2279 && $1<=2351) || ($1>=2363 && $1<=2372) || ($1==2393) || ($1>=2450 && $1<=2672) || ($1>=3474 && $1<=3485) || ($1>=4145 && $1<=4236) || ($1>=4890 && $1<=4915) || ($1>=5742 && $1<=6011) || ($1>=7003 && $1<=7034) || ($1>=7049 && $1<=7295) || ($1>=7425 && $1<=8080) || ($1==8084) || ($1>=8352 && $1<=8425) || ($1>=8461 && $1<=8495) || ($1>=8542 && $1<=8549) || ($1>=8717 && $1<=8856) || ($1>=9327 && $1<=9428) || ($1>=9625 && $1<=9673) || ($1>=9790 && $1<=9791) || ($1>=9793 && $1<=9801) || ($1>=9812 && $1<=9849) || ($1>=10010 && $1<=10017) || ($1>=10065 && $1<=10114) || ($1>=10128 && $1<=10130) || ($1>=10137 && $1<=10197) || ($1>=10387 && $1<=10395) || ($1>=10403 && $1<=10485) || ($1>=10978 && $1<=10999) || ($1>=11176 && $1<=11325) || ($1>=11620 && $1<=11687) || ($1>=11816 && $1<=11843) || ($1>=11939 && $1<=12099) || ($1>=12116 && $1<=12208) || ($1>=12220 && $1<=12307) || ($1>=12357 && $1<=12488) || ($1>=12598 && $1<=12806) || ($1>=12948 && $1<=12972) || ($1>=13216 && $1<=13306) || ($1>=13312 && $1<=13366) || ($1>=13415 && $1<=14615) || ($1>=14628 && $1<=15722) || ($1>=15989 && $1<=16833) || ($1>=17402 && $1<=17431)){ print 9}
}' tmp > tmp2

Now you have all the right data in a separate file. Just paste it where you need it and then delete the temporary file. Thanks again to the community.
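For anyone who would rather skip the temporary files, the extract, index and paste steps could probably be done in one pass. This is a minimal Python sketch using the standard csv module, not what I actually ran. The function names and the sample data are invented, demo_index holds only the rules for indexes 0 and 1, and the column number is zero-based.

```python
import csv
import io


def index_csv_column(src, dst, column, to_index):
    """Copy CSV rows from src to dst, replacing one column by its index."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        try:
            value = int(row[column])
        except (ValueError, IndexError):
            writer.writerow(row)  # header or malformed row: copy unchanged
            continue
        row[column] = to_index(value)
        writer.writerow(row)


def demo_index(value):
    """Hypothetical mapping with only the rules for indexes 0 and 1."""
    if 363 <= value <= 499 or 4645 <= value <= 4646:
        return 0
    if 2174 <= value <= 2193:
        return 1
    return value  # values without a rule are left as they are


if __name__ == "__main__":
    # In-memory stand-ins for the real input and output files.
    source = io.StringIO("name,code\na,402\nb,2180\nc,9999\n")
    target = io.StringIO()
    index_csv_column(source, target, 1, demo_index)
    print(target.getvalue(), end="")
```

With real files, source and target would be opened with open(..., newline=""), which is what the csv module documentation recommends.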

