Reconstruct a URL string from sys.stdin in Python

I have a script that takes input from a large log file. This file has encoded URLs. I am using standard input to grab these URLs from the file. I wish to process each URL separately.

The problem is that when I get a single URL, it is split up into its individual characters. I call ''.join(something), but after processing I still get individual characters.

For example:

for line in sys.stdin:
    line = line.strip()
    line1 = ''.join(line)

I also tried collecting all the characters in the URL and then joining them. I still get the same result.

Sample output:

Input from file: " www.cnn.com"; output after reading from sys.stdin and processing: ['w','w','w','.','c','n','n','.','c','o','m']

The list appears because I build it that way. Otherwise I get www.cnn.com from sys.stdin, but the underlying structure is the same as in the output above.

What I want is: input from file: " www.cnn.com"; output: "www.cnn.com" (this should be one string, not a sequence of single-character strings).

Thanks

I think your stdin input might be garbled. Consider this script:

# stdin.py
import sys

for line in sys.stdin:
    # strip() removes the trailing newline and any surrounding whitespace
    print(line.strip())

Then piping input into it works as expected:

$ echo -e "www.cnn.com\nwww.test.com" | python stdin.py 
www.cnn.com
www.test.com

If you call list() on a string, it splits it up by character:

>>> list("test")
['t', 'e', 's', 't']
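
To make the point above concrete, here is a minimal sketch of why ''.join(line) does not change anything, and where the single characters most likely come from. The sample value is taken from the question; the variable names are only illustrative.

```python
# The sample line from the question, as it would arrive from sys.stdin.
line = " www.cnn.com\n".strip()

# A str is already a sequence of characters, so joining it
# simply rebuilds the same string.
joined = ''.join(line)
print(joined == line)

# Iterating over the string (or calling list() on it) yields characters.
chars = [ch for ch in line]
print(chars[:3])

# Wrapping the string in a list keeps it as one item.
urls = [line]
print(urls)
```

If later processing loops over the string itself rather than over a list of strings, it will see one character at a time, which would match the output shown in the question.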

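I'm guessing what you probably want to do is read the entire input and then split it into separate entries. Note that split() with no arguments splits on any whitespace, which includes newlines:

import sys

# read() returns the whole input as one string; split() breaks it on whitespace
lines = sys.stdin.read().split()
print(lines)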

Running it, I get:

$ echo -e "www.cnn.com\nwww.test.com" | python stdin.py 
['www.cnn.com', 'www.test.com']
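
Since the question mentions encoded URLs and a large log file, it may be preferable to process one line at a time rather than read everything at once. The following is only a sketch under two assumptions: that the script runs on Python 3, and that "encoded" means percent-encoded. The helper name iter_urls is hypothetical, and io.StringIO stands in for sys.stdin so that the example runs on its own.

```python
import io
from urllib.parse import unquote  # Python 3 location of unquote


def iter_urls(stream):
    """Yield one decoded URL string per non-empty line of `stream`."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            # Assumption: the URLs are percent-encoded.
            yield unquote(line)


# io.StringIO stands in for sys.stdin here.
sample = io.StringIO(" www.cnn.com\nwww.test.com%2Fpath%20name\n\n")
urls = list(iter_urls(sample))
print(urls)
```

In a real script you would pass sys.stdin instead of the sample stream. Each item yielded is a whole string, so any per-URL processing receives the full URL and not its characters.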
