
wget creating empty file

I wrote a bash script (which calls a Python program) that downloads http://www.wsj.com/mdc/public/page/2_3021-activnyse-actives.html into an .html file, converts it to .xhtml, and then converts that to .csv. The script runs in a loop so that the process repeats about once a minute for an hour. Here is the bash code:

#!/bin/bash

n=0
while [ $n -lt 60 ]
do
    filename="$(date +"%Y-%m-%d-%H-%M-%S")"
    wget -O - http://www.wsj.com/mdc/public/page/2_3021-activnyse-actives.html > "$filename.html"
    java -jar tagsoup-1.2.1.jar --files "$filename.html"
    python xhtmlToCsv.py "$filename.xhtml" > "$filename.csv"
    ((n++))
    sleep 60
done

And here is the Python program it calls:

import sys
import xml.dom.minidom

document = xml.dom.minidom.parse(sys.argv[1])
tableElements = document.getElementsByTagName('table')

print "exchange,symbol,company,volume,price,change"
lines = tableElements[2].getElementsByTagName('td')
n = 0
data = [None] * 6

for i in lines:
    if n % 6 == 1:
        del data[:]
        data = [None] * 6

    for node in i.childNodes:
        if n % 6 + 1 < 6:
            data[n%6+1] = node.nodeValue
            if n%6+1 == 3:
                data[n%6+1] = data[n%6+1].replace(",", "")

    for items in i.getElementsByTagName('a'):
        j = i.getElementsByTagName('a')[0]
        for node in j.childNodes:
            data[0] = 'NYSE'
            data[1] = node.nodeValue[node.nodeValue.index('(')+1:node.nodeValue.index(')')]
            data[2] = node.nodeValue[0:node.nodeValue.index(" (")]

    if n % 6 == 5 and n > 6:
        print data[0] + "," + data[1] + "," + data[2] + "," + data[3] + "," + data[4] + "," + data[5]   
    n+=1

What I don't understand is why roughly every third .html file the script generates is empty. Is something wrong with the code, or is it just my connection? If it is the connection, is there a way to throw out the empty file and try again?

Update: I figured it out. I count the lines in the resulting .csv file. If there is only one line (the header), no data was transferred, so the files for that iteration are thrown out and n is not incremented, which makes the loop try again:

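# Count the lines in the CSV; a header-only file has exactly one line.
x=$(wc -l < "$filename.csv")
if [ "$x" -eq 1 ]
then
    # No data arrived: discard this iteration's files and retry right away.
    rm -f "$filename.html" "$filename.xhtml" "$filename.csv"
else
    ((n++))
    sleep 60
fi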

Thanks everyone for your input.
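
Another way to throw out an empty download and try again is to check the .html file itself, right after wget runs, instead of waiting for the .csv. The following is only a minimal sketch under stated assumptions: the function names fetch_page and download_with_retry, the default of 3 attempts, and the 5-second delay are hypothetical and not part of the original script. It relies on wget's exit status and on the shell's -s test, which is true only for a file that exists and is not empty.

```shell
#!/bin/bash

# Hypothetical helper: fetch one URL into a file. Kept separate so it can be
# swapped out (for example with curl) without touching the retry logic.
fetch_page() {
    local url=$1 out=$2
    wget -q -O "$out" "$url"
}

# Hypothetical helper: try up to max_tries times until the download succeeds
# AND the output file is non-empty. Returns 0 on success, 1 otherwise.
download_with_retry() {
    local url=$1 out=$2 max_tries=${3:-3} delay=${4:-5}
    local attempt
    for ((attempt = 1; attempt <= max_tries; attempt++)); do
        # -s is true only if the file exists and its size is greater than zero.
        if fetch_page "$url" "$out" && [ -s "$out" ]; then
            return 0
        fi
        rm -f "$out"            # throw out the empty or partial file
        if ((attempt < max_tries)); then
            sleep "$delay"      # brief pause before trying again
        fi
    done
    return 1
}
```

In the loop above, a call such as download_with_retry "$url" "$filename.html" could stand in for the wget line, with the tagsoup and Python steps run only when it returns 0. This does not explain why the downloads come back empty; it only avoids processing them.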


 