python check a list with another list

Question

I have two files:

file A:
U1
U2
U3
file B:
U1hjg 444 77 AGT
U8jha 777 33 AKS
U2jsj 772 00 AKD
U55sks 888 02 SJD
U3jsj 666 32 JSJ

Then I have two lists:

ListA = open("file A").readlines()
ListB = open("file B").readlines()

I would like to check, for each member of ListA, whether it is present in ListB, and write two files: one with the lines of file B that match (ordered by ListA), and the other with the lines of file B that have no match. Desired output:

file list_match:
U1hjg 444 77 AGT
U2jsj 772 00 AKD
U3jsj 666 32 JSJ

file list_unmatched:
U8jha 777 33 AKS
U55sks 888 02 SJD

I am a complete beginner, so I started by trying this as an example:

print(ListA[1])
print(ListB[2])

if ListA[1] in ListB[2]:
    print("yes")

And the output is:

U2
U2jsj 772 00 AKD

But "yes" is not printed.

However, if I do:

if "U2" in ListB[2]:
    print("yes")

The output is:

yes

I do not understand where the error is. Could someone please help me?

Answer 1

keys = [line.strip() for line in list_a if line.strip()]  # keys from file A, newlines removed

matches = [line for line in list_b if any(key in line for key in keys)]

To get both the matching and the non-matching lines:

# with will close your files automatically
with open("file A") as f1, open("file B") as f2:
    keys = [line.strip() for line in f1 if line.strip()]  # keys from file A, newlines removed
    matches = []
    diff = []
    for line in f2:  # iterate over every line in file B
        if any(key in line for key in keys):  # some key from file A occurs in this line
            matches.append(line)
        else:  # no key occurs, so add it to our diff list
            diff.append(line)

If you want to create two new files instead of appending to lists, just write the lines:

with open("file A") as f1, open("file B") as f2, open("matches.txt", "w") as mat, open("diff.txt", "w") as diff:
    keys = [line.strip() for line in f1 if line.strip()]  # keys from file A, newlines removed
    for line in f2:
        if any(key in line for key in keys):
            mat.write(line)
        else:
            diff.write(line)

In your own example you just need ListA[1].rstrip() in ListB[2]. readlines() keeps the newline character at the end of ListA[1] and of every other line except possibly the last, so ListA[1] is "U2\n" rather than "U2". If you print(repr(ListA[1])) you will see exactly what is there.
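The effect can be reproduced without any files. The sketch below hard-codes two strings that stand in for ListA[1] and ListB[2] as readlines() would return them; the values come from the question, while the variable names key_line and data_line are only illustrative.

```python
# Stand-ins for ListA[1] and ListB[2], including the trailing newlines
# that readlines() leaves in place.
key_line = "U2\n"
data_line = "U2jsj 772 00 AKD\n"

print(repr(key_line))                    # repr makes the hidden newline visible
print(key_line in data_line)             # substring test with the newline attached
print(key_line.rstrip() in data_line)    # same test after removing the newline
```

The first membership test fails because "U2" is followed by "j" in the data line, not by a newline; the second succeeds once rstrip() has removed it.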

Printing a set built from the lines of file A, and the repr of each line of file B, shows the newlines at the end:

{'U2\n', 'U3', 'U1\n'}  # set of the lines in file A
# print(repr(line)) for each line of file B
'file B:\n'
'U1hjg 444 77 AGT\n'
'U8jha 777 33 AKS\n'
'U2jsj 772 00 AKD\n'
'U55sks 888 02 SJD\n'
'U3jsj 666 32 JSJ'
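For the output the question asks for (matches ordered by file A, everything else in a second list), one possible sketch is shown below. It assumes each key in file A is a prefix of the line it should match, which holds for the sample data but is an assumption about the real files. The function name split_by_keys is hypothetical, and the lists are hard-coded here in place of readlines().

```python
def split_by_keys(key_lines, data_lines):
    """Return (matched, unmatched) lines from data_lines.

    matched follows the order of key_lines; unmatched keeps the
    original order of data_lines.
    """
    keys = [k.strip() for k in key_lines if k.strip()]        # drop newlines and blank keys
    data = [d.rstrip("\n") for d in data_lines if d.strip()]  # drop newlines and blank lines
    matched = []
    used = set()  # indexes of data lines already claimed by a key
    for key in keys:
        for i, line in enumerate(data):
            if i not in used and line.startswith(key):
                matched.append(line)
                used.add(i)
    unmatched = [line for i, line in enumerate(data) if i not in used]
    return matched, unmatched


# Sample data from the question, as readlines() would return it.
list_a = ["U1\n", "U2\n", "U3"]
list_b = [
    "U1hjg 444 77 AGT\n",
    "U8jha 777 33 AKS\n",
    "U2jsj 772 00 AKD\n",
    "U55sks 888 02 SJD\n",
    "U3jsj 666 32 JSJ",
]

matched, unmatched = split_by_keys(list_a, list_b)
print(matched)
print(unmatched)
```

The two returned lists can then be written out with "\n".join(...) to whatever file names are wanted. Looping over the keys first is what gives the file A ordering; looping over file B first would give file B ordering instead.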

Answer 2

This happens because readlines() returns each line with its terminating \n character. Hence, when you do

if ListA[1] in ListB[2]:
    print("yes")

you are essentially checking whether "U2\n" is in "U2jsj 772 00 AKD\n", which is False because "U2" is not followed by a newline in that line. The literal "U2" has no newline and is in fact present, so that version prints "yes".

You can verify this with the sample program below:

$ cat text.txt 
Sample
Text
Here.
$ cat test.py
with open("text.txt", "r") as f:
    text = f.readlines()
    print(text)
    print(text[0])
$ python test.py
['Sample\n', 'Text\n', 'Here.\n']
Sample

$ #prompt

To correct this, strip each line before comparing, e.g. ListA[1].rstrip(). This works line by line, so it is also suitable if your files are huge.
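For large files, a streaming variant might look like the sketch below. It assumes file A (the keys) is small enough to hold in memory and that each key is a prefix of the lines it should match; file B is processed one line at a time, so the output follows file B's order rather than file A's. The function name stream_split is hypothetical, and io.StringIO objects stand in for the real output files.

```python
import io


def stream_split(key_lines, data_lines, match_out, no_match_out):
    """Write each data line to match_out or no_match_out, one line at a time."""
    # str.startswith accepts a tuple of prefixes
    keys = tuple(k.strip() for k in key_lines if k.strip())
    for line in data_lines:
        if not line.strip():
            continue  # skip blank lines
        target = match_out if line.startswith(keys) else no_match_out
        # make sure every written line ends with exactly one newline
        target.write(line if line.endswith("\n") else line + "\n")


# In-memory stand-ins for the two input files and the two output files.
file_a = io.StringIO("U1\nU2\nU3")
file_b = io.StringIO(
    "U1hjg 444 77 AGT\n"
    "U8jha 777 33 AKS\n"
    "U2jsj 772 00 AKD\n"
    "U55sks 888 02 SJD\n"
    "U3jsj 666 32 JSJ"
)
match_out = io.StringIO()
no_match_out = io.StringIO()

stream_split(file_a, file_b, match_out, no_match_out)
print(match_out.getvalue())
print(no_match_out.getvalue())
```

With real files, the four arguments would be the objects returned by open(), since file objects can be iterated line by line in the same way.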

Otherwise, you can use .read(), split on "\n" to create a list for each file, and use list comprehensions:

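with open("file A") as f1, open("file B") as f2:
    # split on newlines and drop empty strings (e.g. from a trailing newline)
    s1 = [y for y in f1.read().split("\n") if y]
    s2 = [x for x in f2.read().split("\n") if x]
    with open("matching.txt", "w") as match, open("non-matching.txt", "w") as no_match:
        # loop over the keys first so matches come out in file A order
        matching = [x for y in s1 for x in s2 if y in x]
        # a line is non-matching only if none of the keys occurs in it
        non_matching = [x for x in s2 if not any(y in x for y in s1)]
        for line in matching:
            match.write(line + "\n")  # split() removed the newlines, so add them back
        for line in non_matching:
            no_match.write(line + "\n")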


solution1
2 2015-02-13 15:08:01

solution2
1 2015-02-13 15:16:06
