
f.readline versus f.read print output

I am new to Python (using Python 3.6). I have a read.txt file containing information about a firm. The file starts with different report characteristics

CONFORMED PERIOD REPORT:             20120928 #this is 1 line
DATE OF REPORT:                      20121128 #this is another line

and then starts all the text about the firm..... #lots of lines here

I am trying to extract both dates (['20120928','20121128']) as well as flags for some strings in the text (i.e., if the string exists, I want a '1'; otherwise a '0'). Ultimately, I want a vector giving me both dates plus the 1s and 0s for the different strings, something like: ['20120928','20121128','1','0']. My code is the following:

exemptions = [] #vector I want

with open('read.txt', 'r') as f:
    line2 = f.read()  # read the txt file
    for line in f:
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # add line without stating CONFORMED PERIOD REPORT, just with the date)
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", "")) # idem above

    var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
    if len(var1) > 0:  # if the string appears, it will have length>0
        exemptions.append('1')
    else:
        exemptions.append('0')
    var2 = re.findall("string2", line2, re.I)
    if len(var2) > 0:
        exemptions.append('1')
    else:
        exemptions.append('0')

print(exemptions)

If I run this code, I get ['1','0']: the dates are missing, but the string checks are correct (var1 exists, so '1'; var2 does not, so '0'). What I don't understand is why the dates are not reported. Importantly, when I change that line to line2 = f.readline(), I get ['20120928','20121128','0','0']. The dates are fine now, but I know that var1 exists, so it seems the rest of the file is not being read? If I omit line2 = f.read(), it outputs a vector of 0s for each line, apart from my desired output. How can I omit these 0s?

My desired output would be: ['20120928','20121128','1','0']

Sorry for bothering. Thank you anyway!

The line f.read() reads the entire file into the variable line2. If you want to read line by line, you can skip the f.read() altogether and just iterate over the file, like so:

with open('read.txt', 'r') as f:
    for line in f:
        print(line.rstrip('\n'))  # placeholder: handle each line here

Otherwise, as written, once you .read() into line2 there is no more text to read out of f: the file position is already at the end, and all of the content is in the line2 variable.
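To make the point above concrete, here is a minimal sketch comparing f.read() and f.readline() followed by a loop. The sample content (including the "HEADER LINE" first line) and the temporary file are assumptions standing in for the real read.txt. If the real file has some line before the dates, this would also explain the readline() result in the question: line2 then holds only that first line, so the string searches find nothing.

```python
import os
import tempfile

# Hypothetical sample content standing in for read.txt
sample = (
    "HEADER LINE\n"
    "CONFORMED PERIOD REPORT:\t20120928\n"
    "DATE OF REPORT:\t20121128\n"
    "and then starts all the text about the firm\n"
)

# Write the sample to a temporary file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path, "r") as f:
    whole_text = f.read()            # consumes the file up to the end
    lines_after_read = list(f)       # the iterator has nothing left to yield

with open(path, "r") as f:
    first_line = f.readline()        # consumes only the first line
    lines_after_readline = list(f)   # the remaining lines are still available

os.remove(path)

print(lines_after_read)              # empty list
print(repr(first_line))              # only the header line
print(len(lines_after_readline))     # number of lines left after readline()
```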

line2 = f.read() reads the entire file into line2, so there is nothing left for your for line in f: loop to read.
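If you do want to keep both the f.read() call and the loop, one option (not mentioned in the original code, so treat it as a suggestion) is to rewind the file with f.seek(0) before iterating. A small sketch, using io.StringIO with made-up content in place of the real file; a text file object returned by open() supports seek(0) in the same way:

```python
import io

# io.StringIO with hypothetical content stands in for open('read.txt', 'r')
f = io.StringIO("DATE OF REPORT:\t20121128\nsome text mentioning string1\n")

line2 = f.read()               # whole content; position is now at the end
f.seek(0)                      # move the position back to the start
lines = [line for line in f]   # iteration works again after rewinding

print(len(lines))
```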

This is how I finally solved it:

import re

exemptions = []  # vector I want

with open('read.txt', 'r') as f:
    line2 = ""  # empty string created outside the "for line" loop
    for line in f:
        line2 = line2 + line  # append each line to the string created above
        if "CONFORMED PERIOD REPORT" in line:
            exemptions.append(line.strip('\n').replace("CONFORMED PERIOD REPORT:\t", ""))  # add line without stating CONFORMED PERIOD REPORT, just with the date)
        elif "DATE OF REPORT" in line:
            exemptions.append(line.rstrip('\n').replace("DATE OF REPORT:\t", "")) # idem above

    var1 = re.findall("string1", line2, re.I)  # find string1 in line2, case-insensitive
    if len(var1) > 0:  # if the string appears, it will have length>0
        exemptions.append('1')
    else:
        exemptions.append('0')
    var2 = re.findall("string2", line2, re.I)
    if len(var2) > 0:
        exemptions.append('1')
    else:
        exemptions.append('0')

print(exemptions)

So far this is what I got. It worked for me, although I guess working with beautifulsoup would increase the efficiency of the code. Next step :)
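For what it's worth, the same result can probably be had by reading the text once and running every search against that one string. The following is only a sketch: extract_flags is a hypothetical helper name, the sample text is made up, and it assumes each date is an 8-digit number following the label and a colon.

```python
import re

def extract_flags(text, labels, patterns):
    """Return the dates found after each label, then '1'/'0' per pattern."""
    result = []
    for label in labels:
        # Assumption: label, colon, optional whitespace, then an 8-digit date
        match = re.search(re.escape(label) + r":\s*(\d{8})", text)
        if match:
            result.append(match.group(1))
    for pattern in patterns:
        # Case-insensitive search over the whole text
        result.append('1' if re.search(pattern, text, re.I) else '0')
    return result

# Hypothetical sample text; with a real file use: text = open('read.txt').read()
sample = (
    "CONFORMED PERIOD REPORT:             20120928\n"
    "DATE OF REPORT:\t20121128\n"
    "\n"
    "The firm mentions STRING1 somewhere in here.\n"
)

exemptions = extract_flags(
    sample,
    ["CONFORMED PERIOD REPORT", "DATE OF REPORT"],
    ["string1", "string2"],
)
print(exemptions)  # ['20120928', '20121128', '1', '0']
```

Matching the date with \s* means it does not matter whether the label is followed by a tab or by spaces, which the .replace("...:\t", "") approach depends on.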
