Why is line.split('\t')[1] not equal to 0?

I have numerous TSV files, each containing two columns. The first column is made up of sentences and the second column holds the polarity of those sentences. The delimiter is a tab. I would like to extract the lines which have a polarity of "0".

I wrote this small piece of code, but it does not work: it returns 0 sentences.

    for d in directory:
        print(" directory: ", d)
        splits = ['dev1'] #,'test1','train1']

        for s in splits:

            print(" sous-dir : ", s)
            path = os.path.join(indir, d)
            with open(os.path.join(path, s+'.tsv'), 'r', encoding='utf-8') as f_in:
              next(f_in)
              for line in f_in:
                if line.split('\t')[1] == 0:
                  doc = nlp(line.split('\t')[0])

                  line_split = [sent.text for sent in doc.sents]

                  for elt in line_split:
                    sentences_list.append(elt)


    print("nombres total de phrases :", len(sentences_list))


Why is line.split('\t')[1] not equal to 0 if line is the string "Je suis levant\t0\n"?

Example of a file:

gnfjfklfklf  0
fokgmlmlrfm  1
eoklplrmrml  0
ekemlremeùe  0

I would like to keep the lines that end with "0".

After splitting, you need to strip the string to remove the whitespace that comes along from the file, such as the trailing line break or stray spaces. For the example line, line.split('\t')[1] is "0\n", not "0". Python strings have a .strip() method for this.

You're also comparing a string with an integer. In Python, == between a str and an int does not raise a type error; it simply evaluates to False, so the condition never matches. You must either compare against the string "0" or convert the field read from the file to an integer with int().
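A quick check, using the example line from the question, shows what the second field actually contains and how each comparison evaluates. This is only an illustrative sketch; the variable names `line` and `field` are chosen here for the demonstration.

```python
# Example line taken from the question.
line = "Je suis levant\t0\n"

# The second field still carries the trailing newline.
field = line.split("\t")[1]
print(repr(field))            # '0\n'

# Comparing a str with an int never raises; it is just False.
print(field == 0)             # False

# Either compare strings after stripping whitespace...
print(field.strip() == "0")   # True

# ...or convert to int (int() tolerates surrounding whitespace).
print(int(field) == 0)        # True
```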

The condition could be rewritten as:

    if int(line.split('\t')[1].strip()) == 0:

or as:

    if line.split('\t')[1].strip() == "0":
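Put together, the filtering step might look like the sketch below. The helper name `zero_polarity_texts` and the header row `text\tpolarity` are assumptions made for illustration (the question's code skips one line with `next(f_in)`, which suggests a header, but its content is not shown). The spaCy sentence splitting from the question is left out so the example runs on its own.

```python
def zero_polarity_texts(lines, skip_header=True):
    """Return the first column of every row whose second column is "0".

    `lines` is any iterable of tab-separated lines, e.g. an open file.
    Hypothetical helper, not part of the original code.
    """
    rows = iter(lines)
    if skip_header:
        next(rows, None)  # skip the header row, if any
    kept = []
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        # Compare strings with strings, after stripping whitespace.
        if fields[1].strip() == "0":
            kept.append(fields[0])
    return kept


# Sample rows from the question, with an assumed header line.
sample = [
    "text\tpolarity\n",
    "gnfjfklfklf\t0\n",
    "fokgmlmlrfm\t1\n",
    "eoklplrmrml\t0\n",
    "ekemlremeùe\t0\n",
]
print(zero_polarity_texts(sample))
# ['gnfjfklfklf', 'eoklplrmrml', 'ekemlremeùe']
```

With a real file, the open file object can be passed directly, for example `zero_polarity_texts(f_in)` inside the `with open(...)` block; in that case the separate `next(f_in)` call should be dropped, since the helper already skips the header.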
