
Not recognising hyphen on split

I'm working with about 24k text files and am splitting some lines on '-'. It works for some files but fails to split for others.

company_participants is a list with N >= 1 elements, each element consisting of a name followed by a hyphen ("-"), followed by the job title. To get the names, I use:

names_participants = [name.split('-')[0].strip() for name in company_participants]

On closer inspection, I found that it does not recognise "-" as "-" for some reason.

For example, the first element in company_participants is "robert isom - president".

Calling company_participants[0].split()[2] returns "-", since I've split on whitespace and the hyphen is the third element (index 2).

When I then check whether this is equal to "-", I get False.

company_participants[0].split()[2] == "-"  # Item at index 2 is the hyphen
# Output = False

Any idea what's going on here? Is there something else that looks like a hyphen but isn't one?

Many thanks!

So I found that this has actually been answered elsewhere on StackOverflow.

Apparently I'm dealing with a "dash" and not a "hyphen". I couldn't see the difference with the naked eye, but when I copied the symbol from that answer and compared against it, it matched: company_participants[0].split()[2] == "–" returned True.
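Since the two characters look identical on screen, printing the code point and Unicode name is one way to tell them apart. A minimal sketch, assuming the sample string below stands in for company_participants[0]; the dash is written as the escape \u2013 (en dash) so it cannot be confused with "-", and describe is a helper name made up for this example:

```python
import unicodedata

# Hypothetical sample standing in for company_participants[0].
# The dash is spelled as an escape (U+2013) so it is unambiguous.
sample = "robert isom \u2013 president"


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"


token = sample.split()[2]   # third whitespace-separated token
print(token == "-")         # False
print(describe(token))      # U+2013 EN DASH
print(describe("-"))        # U+002D HYPHEN-MINUS
```

Running describe on the suspicious token from a real file shows which dash variant the data actually contains.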

#textDataProblems
#didNotSeeThatComing

Thank you StackOverflow!
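With 24k files, the separator may well vary from file to file, so one possible approach is to split on a small set of dash-like characters rather than on "-" alone. The following is only a sketch: the DASHES pattern, the participant_name helper, and the second and third sample entries are assumptions made up for illustration, and the set of characters may need adjusting to what the files really contain.

```python
import re

# Dash-like characters: hyphen-minus, the block U+2010..U+2015
# (hyphen, non-breaking hyphen, figure dash, en dash, em dash,
# horizontal bar) and the minus sign U+2212.
DASHES = re.compile(r"[-\u2010-\u2015\u2212]")


def participant_name(entry):
    """Return the text before the first dash-like character, stripped."""
    return DASHES.split(entry, maxsplit=1)[0].strip()


company_participants = [
    "robert isom \u2013 president",  # en dash, as in the question
    "jane doe - cfo",                # hyphen-minus (made-up entry)
    "john smith \u2014 ceo",         # em dash (made-up entry)
]

names_participants = [participant_name(p) for p in company_participants]
print(names_participants)  # ['robert isom', 'jane doe', 'john smith']
```

Like the original split('-'), this cuts at the first dash it finds, so a hyphenated name would be truncated. Requiring whitespace on both sides of the dash in the pattern would be one way around that.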
