
Python string strip not working for the trailing double quote

I'm trying to automate a task with a Python script but ran into a strange phenomenon. I searched SO for the same problem, but what I found was slightly different, so I'm asking here with a simplified example.
I have a file called test1.txt, shown below.

"https://papers.nips.cc/paper/7286-efficient-algorithms-for-non-convex-isotonic-regression-through-submodular-optimization" ## Efficient Algorithms for Non-convex Isotonic Regression through Submodular Optimization
"https://papers.nips.cc/paper/7287-structure-aware-convolutional-neural-networks" ## Structure-Aware Convolutional Neural Networks
"https://papers.nips.cc/paper/7288-kalman-normalization-normalizing-internal-representations-across-network-layers" ## Kalman Normalization: Normalizing Internal Representations Across Network Layers
"https://papers.nips.cc/paper/7289-hogwild-gibbs-can-be-panaccurate" ## HOGWILD!-Gibbs can be PanAccurate

and the Python script quest.py:

import re

with open('test1.txt') as f:
    for line in f:
        #print line
        link = re.sub(" ##.*","",line)
        print link
        link1 = link.strip('\"')
        print link1

When I run it with python quest.py, I get

"https://papers.nips.cc/paper/7286-efficient-algorithms-for-non-convex-isotonic-regression-through-submodular-optimization"

https://papers.nips.cc/paper/7286-efficient-algorithms-for-non-convex-isotonic-regression-through-submodular-optimization"

"https://papers.nips.cc/paper/7287-structure-aware-convolutional-neural-networks"

https://papers.nips.cc/paper/7287-structure-aware-convolutional-neural-networks"

"https://papers.nips.cc/paper/7288-kalman-normalization-normalizing-internal-representations-across-network-layers"

https://papers.nips.cc/paper/7288-kalman-normalization-normalizing-internal-representations-across-network-layers"

"https://papers.nips.cc/paper/7289-hogwild-gibbs-can-be-panaccurate"

https://papers.nips.cc/paper/7289-hogwild-gibbs-can-be-panaccurate"

I want to print the link first with the surrounding double quotes (link) and then without them (link1). But why do I still see the trailing double quote in link1?

Python's str.strip([chars]) removes leading and trailing characters that appear in chars, but at each end it stops as soon as it reaches a character that is not in chars.
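A minimal sketch of that behaviour, written for Python 3 (the question's own script uses Python 2 print statements). The URL is a made-up placeholder and the variable names are only illustrative; repr() is used so the otherwise invisible newline shows up in the output.

```python
# A sample line as the file iterator would yield it: note the trailing newline.
# (The URL is a hypothetical stand-in for one of the links in test1.txt.)
link = '"https://example.com/paper"\n'

# strip('"') removes the leading quote, but on the right side it meets '\n'
# first, which is not in the set of characters to strip, so it stops there.
only_quotes = link.strip('"')

# Adding '\n' to the set lets strip() work through the newline and the quote.
quotes_and_newline = link.strip('"\n')

print(repr(only_quotes))         # 'https://example.com/paper"\n'
print(repr(quotes_and_newline))  # 'https://example.com/paper'
```

Note that the argument to strip() is treated as a set of characters, not as a prefix or suffix, so the order of the characters in '"\n' does not matter.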

Looks like your link ends with a newline char, and stripping stops before even reaching the double quote. (Hint: print adds only one newline, and in your output you have two.)

To strip double quotes and newline chars:

link1 = link.strip('"\n')

Also worth mentioning (as @glibdud notes in the comments): the links end with a newline because iterating over a file does not strip newlines, and neither does the sub expression, because . does not match a newline by default. To make it match, add the re.DOTALL regex flag.
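Both points can be checked with a short Python 3 sketch. Here io.StringIO stands in for the opened file, and the two sample lines are invented for illustration rather than taken from test1.txt.

```python
import io
import re

# io.StringIO stands in for the opened file; the content is a made-up sample.
fake_file = io.StringIO('"https://example.com/a" ## Title A\n'
                        '"https://example.com/b" ## Title B\n')

# Iterating a file-like object yields each line with its '\n' still attached.
lines = list(fake_file)

# Without re.DOTALL, '.' does not match '\n', so the newline survives the sub.
without_flag = re.sub(" ##.*", "", lines[0])

# With re.DOTALL, '.' also matches '\n', so the newline is removed as well.
with_flag = re.sub(" ##.*", "", lines[0], flags=re.DOTALL)

print(repr(without_flag))  # '"https://example.com/a"\n'
print(repr(with_flag))     # '"https://example.com/a"'
```

With the flag set, the result has no trailing newline, so a plain strip('"') would then be enough to remove both quotes.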

Strip both the double quotes and the newline when you want to print without quotes, and strip only the newline when you want to print with double quotes:

import re

with open('file.txt') as f:
    for line in f:
        # Skip blank lines
        if line.strip():
            # Drop the " ## title" part; the trailing newline is kept
            link = re.sub(" ##.*", "", line)
            # Print with double quotes: strip only the newline
            print(link.strip('\n'))
            # Print without double quotes: strip the quotes and the newline
            print(link.strip('"\n'))

The output will then be

"https://papers.nips.cc/paper/7286-efficient-algorithms-for-non-convex-isotonic-regression-through-submodular-optimization"
https://papers.nips.cc/paper/7286-efficient-algorithms-for-non-convex-isotonic-regression-through-submodular-optimization
"https://papers.nips.cc/paper/7287-structure-aware-convolutional-neural-networks"
https://papers.nips.cc/paper/7287-structure-aware-convolutional-neural-networks
"https://papers.nips.cc/paper/7288-kalman-normalization-normalizing-internal-representations-across-network-layers"
https://papers.nips.cc/paper/7288-kalman-normalization-normalizing-internal-representations-across-network-layers
"https://papers.nips.cc/paper/7289-hogwild-gibbs-can-be-panaccurate"
https://papers.nips.cc/paper/7289-hogwild-gibbs-can-be-panaccurate
