I have a file with tons of lines and words such as this example:
C742 C743 C744 C745 C835 C836 C837 C838 C839 C840 C841 C842 C843 C844 C845 C935 C936 C937 C938 C939 C940 C941 C942 C943 C944 C945 C1035 C1036 C1037 C1038 C1039 C1040 C1041 C1042 C1043 C1044 C1045 D135 D136 D137 D138 D139 D140 D141 D142 D143 D144 D145 D235 D236 D237 D238 D239 D240 D241 D242 D243 D244 D245 D335 D336 D337 D338 D339 D340 D341 D342 D343 D344 D345 D435 D436 D437 D438 D439 D440 D441 D442 D443 D444
What I want to do is list only the words (assuming each space-separated bundle of characters is a word) that contain a specific number, such as 35.
In this example, I would want the result printed to be:
C835
C935
C1035
D135
D235
D335
D435
I have tried a few different approaches. With plain grep, the entire line that contains 35 gets printed. With grep -o 35, only the 35 gets printed, so I do not know what the prefix of that number was.
Try the following bash script:
cat words.txt | tr " " "\n" | grep 35
cat reads words.txt and writes it to STDOUT, which is piped into tr ("translate"), here converting each space (" ") to a newline ("\n"). grep then does its default line-by-line matching and prints every line, now one word each, that contains 35.
Try this
for word in $(cat filename); do
    echo "$word" | grep 35
done
There is a grep-only solution (\S matches a non-whitespace character; note that \S is a GNU grep extension):
$ grep -o '\S*35\S*' words.txt
C835
C935
C1035
D135
D235
D335
D435
You can expand your regular expression to match the whole word around the number, though it's a little messier:
grep -o "[^ ]*35[^ ]*" words.txt
The [^ ]* part of the above matches any run of zero or more non-space characters, so each match grows from 35 out to the surrounding spaces.
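To see what that pattern picks up without running grep, the same expression can be tried with Python's re module. This is only a sketch: the sample string is a shortened, made-up extract of the input (including a hypothetical D350 to show that 35 is matched anywhere in the word), and re.findall stands in for grep -o.

```python
import re

# Shortened sample of the input line (an assumption for illustration).
sample = "C745 C835 C836 C1035 C1036 D135 D350 D444"

# [^ ]* matches zero or more non-space characters on either side of 35,
# so each match expands to the whole space-delimited word.
words = re.findall(r"[^ ]*35[^ ]*", sample)
print(words)  # ['C835', 'C1035', 'D135', 'D350']
```

Note that D350 is included: the pattern does not care where in the word the 35 sits.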
Python:
import re
s = "C742 C743 C744 C745 C835 C836 C837 C838 C839 C840 C841 C842 C843 C844 C845 C935 C936 C937 C938 C939 C940 C941 C942 C943 C944 C945 C1035 C1036 C1037 C1038 C1039 C1040 C1041 C1042 C1043 C1044 C1045 D135 D136 D137 D138 D139 D140 D141 D142 D143 D144 D145 D235 D236 D237 D238 D239 D240 D241 D242 D243 D244 D245 D335 D336 D337 D338 D339 D340 D341 D342 D343 D344 D345 D435 D436 D437 D438 D439 D440 D441 D442 D443 D444"
print(re.findall(r'[A-Z0-9]*35[0-9]*',s)) # assuming '35' can appear anywhere in the number.
Output:
['C835', 'C935', 'C1035', 'D135', 'D235', 'D335', 'D435']
You can read a whole file with:
with open('words.txt') as f:
    s = f.read()
If you'd also like to use Python for this:
>>> with open('file') as f:
...     print('\n'.join(i for i in f.read().split() if '35' in i))
...
C835
C935
C1035
D135
D235
D335
D435
Here, f.read() reads the content of the file and returns it as a string. str.split() splits that string on whitespace and returns a list.
(i for i in f.read().split() if '35' in i) is a generator expression in Python; it yields only the elements that contain '35'. So we can use it to get the expected output (no regex needed in this case).
Finally, '\n'.join() joins them with newlines so they print one per line. You can also use a for loop instead:
>>> with open('file') as f:
...     for i in (i for i in f.read().split() if '35' in i):
...         print(i)
...
C835
C935
C1035
D135
D235
D335
D435
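Since the question mentions a file with tons of lines, a variant that iterates line by line avoids loading the whole file with f.read(). The following is a minimal sketch under that assumption; the function name words_containing and the demo list are hypothetical, not part of the answers above.

```python
def words_containing(lines, needle="35"):
    """Yield each whitespace-separated word that contains needle."""
    for line in lines:
        for word in line.split():
            if needle in word:
                yield word

# Usage with a real file (open file objects iterate line by line):
# with open("words.txt") as f:
#     for word in words_containing(f):
#         print(word)

# Small in-memory stand-in for the file, for illustration only.
demo = ["C744 C745 C835 C836\n", "D135 D136 D235\n"]
print(list(words_containing(demo)))  # ['C835', 'D135', 'D235']
```

Because it is a generator, only one line is held in memory at a time, which matters only if the file is very large.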