
How to grep words, not lines, that contain specific characters, and print the entire word

I have a file with tons of lines and words such as this example:

C742 C743 C744 C745 C835 C836 C837 C838 C839 C840 C841 C842 C843 C844 C845 C935 C936 C937 C938 C939 C940 C941 C942 C943 C944 C945 C1035 C1036 C1037 C1038 C1039 C1040 C1041 C1042 C1043 C1044 C1045 D135 D136 D137 D138 D139 D140 D141 D142 D143 D144 D145 D235 D236 D237 D238 D239 D240 D241 D242 D243 D244 D245 D335 D336 D337 D338 D339 D340 D341 D342 D343 D344 D345 D435 D436 D437 D438 D439 D440 D441 D442 D443 D444

What I want to do is list only the words (treating each space-separated bundle of characters as a word) that contain a specific number, such as 35.

In this example, I would want the result printed to be:

C835
C935
C1035
D135
D235
D335
D435

I have tried a few different approaches with grep, but either the entire line that contains a 35 gets printed, or, with grep -o 35, only the 35 gets printed and I cannot tell what the prefix of that number was.

Try the following bash script:

cat words.txt | tr " " "\n" | grep 35

Explanation:

cat reads words.txt and writes its contents to STDOUT, which is piped into tr ("translate"). Here tr turns every space (" ") into a newline ("\n"), so each word ends up on its own line. grep then does its default line-by-line matching and prints every line containing 35.
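For comparison, here is a minimal Python sketch of the same two stages. The sample string is made up for illustration (it stands in for the contents of words.txt) and includes a doubled space to show that the empty line it produces is simply filtered out, much as grep would skip it.

```python
# Hypothetical sample with a doubled space, standing in for words.txt.
text = "C834 C835  C836 D135"

# Stage 1 (like tr " " "\n"): every space becomes a newline,
# so a doubled space leaves one empty line behind.
lines = text.replace(" ", "\n").split("\n")

# Stage 2 (like grep 35): keep only the lines that contain "35".
matches = [line for line in lines if "35" in line]

print(lines)
print(matches)
```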

Try this

for word in $(cat filename); do
    echo "$word" | grep 35
done

There is a plain grep solution (\S matches a non-whitespace character; it is supported by GNU grep but is not part of POSIX):

$ grep -o '\S*35\S*' words.txt
C835
C935
C1035
D135
D235
D335
D435

You can expand your regular expression to match the whole word, but it's a little messier:

grep -o "[^ ]*35[^ ]*" words.txt

The [^ ]* part of the above matches zero or more non-space characters, so the match extends to the whole word around 35.
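One difference between [^ ]* and \S* is worth seeing in action: [^ ] excludes only the space character, so a tab does not end the word. The sketch below uses Python's re module rather than grep, and the sample string is made up for illustration; the two character classes have the same meaning in both tools, but treat this as an approximation of what grep does, not a transcript.

```python
import re

# Hypothetical sample: the separator after C835 is a tab, not a space.
sample = "C834 C835\tC836 D135"

# [^ ] excludes only the space character, so the tab counts as part of the word.
space_only = re.findall(r"[^ ]*35[^ ]*", sample)

# \S excludes every whitespace character, so the tab ends the word.
any_whitespace = re.findall(r"\S*35\S*", sample)

print(space_only)
print(any_whitespace)
```

If the file might contain tabs between words, the \S form is the safer of the two.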

Python:

import re
s = "C742 C743 C744 C745 C835 C836 C837 C838 C839 C840 C841 C842 C843 C844 C845 C935 C936 C937 C938 C939 C940 C941 C942 C943 C944 C945 C1035 C1036 C1037 C1038 C1039 C1040 C1041 C1042 C1043 C1044 C1045 D135 D136 D137 D138 D139 D140 D141 D142 D143 D144 D145 D235 D236 D237 D238 D239 D240 D241 D242 D243 D244 D245 D335 D336 D337 D338 D339 D340 D341 D342 D343 D344 D345 D435 D436 D437 D438 D439    D440 D441 D442 D443 D444"
print(re.findall(r'[A-Z0-9]*35[0-9]*',s)) # assuming '35' can appear anywhere in the number.

Output:

['C835', 'C935', 'C1035', 'D135', 'D235', 'D335', 'D435']

You can read a whole file with:

with open('words.txt') as f:
    s = f.read()

If you would rather use Python:

>>> with open('file') as f:
...     print('\n'.join(i for i in f.read().split() if '35' in i))
...
C835
C935
C1035
D135
D235
D335
D435

Here, f.read() returns the content of the file as a string object. str.split() splits the string on whitespace and returns a list.

(i for i in f.read().split() if '35' in i) is a generator expression; it returns a generator that yields only the elements containing '35'. That is enough to get the expected output, so no regex is needed in this case.

Finally, '\n'.join() combines them into one newline-separated string to print. You can also use a for loop instead:

>>> with open('file') as f:
...     for i in (i for i in f.read().split() if '35' in i):
...         print(i)
...
C835
C935
C1035
D135
D235
D335
D435
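Because a generator is lazy, the same filter can also run line by line instead of loading the whole file with f.read(), which may matter for a file with tons of lines. The following is only a sketch: words_containing is a hypothetical helper name, and io.StringIO stands in for an open file so the example runs on its own.

```python
import io

def words_containing(lines, needle):
    """Lazily yield each whitespace-separated word that contains needle."""
    for line in lines:
        for word in line.split():
            if needle in word:
                yield word

# io.StringIO stands in for an open file; iterating it yields one line at a time.
fake_file = io.StringIO("C834 C835 C836\nD134 D135\n")

matches = words_containing(fake_file, "35")
first = next(matches)   # only the first line has been read at this point
rest = list(matches)    # consumes the remaining lines

print(first)
print(rest)
```

With a real file, passing the file object from with open('file') as f: in place of fake_file should work the same way.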
