
Not recognising hyphen on split

I'm working with about 24k text files and am splitting some lines on '-'. This works for some files but fails to split the lines in others.

company_participants is a list with N >= 1 elements, with each element consisting of a name followed by a hyphen ("-"), followed by the job title. To get the names, I use:

names_participants = [name.split('-')[0].strip() for name in company_participants]

On closer inspection, I found that in some files the character that looks like "-" does not compare equal to "-".

For example, the first element in company_participants is "robert isom - president"

Calling company_participants[0].split()[2] returns what looks like "-": I've split on whitespace, so the hyphen is the third element (index 2).

When I then test whether this element is equal to "-", I get False.

company_participants[0].split()[2] == "-"  # Item at index 2 is the hyphen
# Output = False
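One way to see which character is actually in the string is to print its code point and Unicode name. This is only a minimal sketch: the `sample` string is a hypothetical stand-in for the real data, and it assumes the separator is U+2013 rather than the ASCII hyphen-minus U+002D.

```python
import unicodedata

# Hypothetical sample: the separator is written as an escape so it is
# unambiguous. U+2013 looks almost identical to "-" in most fonts.
sample = "robert isom \u2013 president"

separator = sample.split()[2]        # third whitespace-separated token

print(separator == "-")              # False: not the ASCII hyphen-minus
print(hex(ord(separator)))           # 0x2013
print(unicodedata.name(separator))   # EN DASH
```

Running the same three checks on a token from one of the failing files should show which dash-like character is involved.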

Any idea what's going on here? Is there something else that looks like a hyphen but isn't one?

Many thanks!

So I found that this has actually been answered elsewhere on StackOverflow.

Apparently I'm dealing with a "dash" and not a "hyphen". I couldn't see the difference with the naked eye, but when I copied the symbol from that answer, the comparison worked: company_participants[0].split()[2] == "–" returned True.
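Since 24k files may well mix several dash-like characters, one possible approach is to split on any of them with a regular expression instead of a single literal. The sketch below is an assumption, not what the linked answer proposes: the set of characters in `SEPARATOR`, the helper name `participant_name`, and the sample list are all illustrative. It also requires whitespace on both sides of the dash so that hyphenated names are left intact, which differs from the original `split('-')`.

```python
import re

# Assumed set of separators: hyphen-minus (U+002D), hyphen (U+2010),
# non-breaking hyphen (U+2011), figure dash (U+2012), en dash (U+2013),
# em dash (U+2014) and minus sign (U+2212), each surrounded by whitespace.
SEPARATOR = re.compile(r"\s[-\u2010\u2011\u2012\u2013\u2014\u2212]\s")


def participant_name(entry):
    """Return the text before the first dash-like separator (hypothetical helper)."""
    return SEPARATOR.split(entry, maxsplit=1)[0].strip()


# Hypothetical sample data: one entry uses an en dash, the other a hyphen.
company_participants = [
    "robert isom \u2013 president",
    "jane doe - chief financial officer",
]

names_participants = [participant_name(entry) for entry in company_participants]
print(names_participants)  # ['robert isom', 'jane doe']
```

If other look-alike characters turn up in the files, they can be added to the character class in `SEPARATOR`.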

#textDataProblems
#didNotSeeThatComing

Thank you StackOverflow!


 