Trim or remove single white space char from delimited text

I have a #-delimited file in which an empty value is represented by a single whitespace character between two hashes. This causes problems when I query the file later on. Is there a way to empty every field in a delimited line that consists of just a single whitespace character?

Here's a sample line from my file:

40001#World Music#Mike Oldfield#Tubular Bells#   53# # #

I want the string to become:

40001#World Music#Mike Oldfield#Tubular Bells#   53###

Using a conditional expression (Python's ternary) inside a list comprehension, you can do:

s = "40001#World Music#Mike Oldfield#Tubular Bells#   53# # #"
# Split into fields, blank out any field that is exactly one space, then rejoin.
s2 = "#".join([i if i != " " else "" for i in s.split("#")])
print(s2)

which prints:

40001#World Music#Mike Oldfield#Tubular Bells#   53###

No imports needed (e.g. no re).
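Since the question is about a whole file, the same comprehension can be applied line by line. A minimal sketch, assuming the lines end in "\n" (simulated here with io.StringIO; the second data line is made up): the newline is stripped first, otherwise a final field of " \n" would not compare equal to " " and would be left alone.

```python
import io

def clean_line(line):
    """Replace every field that is exactly one space with an empty field."""
    fields = line.rstrip("\n").split("#")
    return "#".join("" if field == " " else field for field in fields)

# io.StringIO stands in for open(<your file>); the second line is a made-up example
# whose last field is " " followed by a newline, with no trailing '#'.
data = io.StringIO(
    "40001#World Music#Mike Oldfield#Tubular Bells#   53# # #\n"
    "40002#Rock# #Some Album# \n"
)

cleaned = [clean_line(line) for line in data]
for row in cleaned:
    print(row)
```

Without the rstrip("\n"), the second line would keep its trailing space.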

Use regular expressions:

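import re

my_str = "40001#World Music#Mike Oldfield#Tubular Bells#   53# # #"
# Matches the exact run "# # #" (three hashes separated by single whitespace
# characters) and keeps only the three captured hashes.
pattern = re.compile(r'(#)\s(#)\s(#)')

new_str = pattern.sub(r'\1\2\3', my_str)

print(new_str)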
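Note that this pattern only matches that fixed run of three hashes, so it fits the sample line but misses a lone empty field or a longer run. A quick comparison with a lookaround pattern (an alternative technique taken from another answer here; the inputs are made-up examples):

```python
import re

grouped = re.compile(r'(#)\s(#)\s(#)')     # the fixed three-hash pattern above
lookaround = re.compile(r'(?<=#)\s(?=#)')  # only checks the neighbours of each space

samples = ["a# #b", "a# # # #b"]
results = [(grouped.sub(r'\1\2\3', t), lookaround.sub('', t)) for t in samples]
for fixed, general in results:
    print(repr(fixed), repr(general))
```

The grouped pattern leaves "a# #b" unchanged and turns the four-hash run into "a### #b".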

Use the re.sub function:

re.sub(r'(?<=#) (?=#)', r'', string)

Or:

re.sub(r'(?<=#)\s(?=#)', r'', string)

Example:

>>> s = "40001#World Music#Mike Oldfield#Tubular Bells#   53# # #"
>>> re.sub(r'(?<=#) (?=#)', r'', s)
'40001#World Music#Mike Oldfield#Tubular Bells#   53###'
  • (?<=#) is a positive lookbehind: it asserts that the match must be preceded by a #.
  • \s matches any single whitespace character (space, tab, newline, ...); the first variant uses a literal space instead.
  • (?=#) is a positive lookahead: it asserts that the match must be followed by a #.
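Because lookarounds are zero-width, the # characters are checked but not consumed, so adjacent empty fields like "# # #" are all handled in one pass. A naive replacement that consumes the hashes skips every other field. The sketch below also shows the practical difference between the literal-space and \s variants (the tab input is a made-up example):

```python
import re

line = "53# # #"
# Naive: '# #' consumes the second '#', so the following ' #' has no '#' before it.
naive = re.sub(r'# #', '##', line)
# Lookarounds only assert the neighbours, so every ' ' between two '#' is removed.
lookaround = re.sub(r'(?<=#) (?=#)', '', line)

tabbed = "53#\t#"
space_only = re.sub(r'(?<=#) (?=#)', '', tabbed)  # literal space: the tab is kept
any_ws = re.sub(r'(?<=#)\s(?=#)', '', tabbed)     # \s: the tab is removed as well

print(naive, lookaround)
print(repr(space_only), repr(any_ws))
```

Pick the \s variant only if a tab (or other whitespace) between delimiters should also count as an empty value.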

This regex seems to do what you want, using a positive lookahead: http://regexr.com/3abqs

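import re

line = "40001#World Music#Mike Oldfield#Tubular Bells#   53# # #"
# Replace '#' plus any run of whitespace with a bare '#' when another '#' follows.
# (Renamed from 'str' to avoid shadowing the built-in type.)
strf = re.sub(r'#\s+(?=#)', '#', line)
print(strf)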
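Unlike the single-space patterns in the other answers, #\s+(?=#) empties a field made of any run of whitespace, which may or may not be what you want if a field can legitimately contain several spaces. A small comparison (the input is a made-up example):

```python
import re

text = "1#   #2# #3"
multi = re.sub(r'#\s+(?=#)', '#', text)      # this answer: any run of whitespace
single = re.sub(r'(?<=#) (?=#)', '', text)   # exactly one space between hashes
print(multi)
print(single)
```

Here multi becomes "1##2##3", while single only empties the one-space field and leaves "1#   #2##3".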

You can use something like:

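orig_str = "40001#World Music#Mike Oldfield#Tubular Bells#   53# # #"
# The trailing '#' yields an empty last element, which [:-1] drops.
# This assumes every line ends with '#'.
splitted_str = orig_str.split("#")[:-1]
new_str = ''
for item in splitted_str:
    if item.strip():   # keep the field only if it is not blank
        new_str += item
    new_str += "#"     # re-add the delimiter after every field
print(new_str)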

This would print 40001#World Music#Mike Oldfield#Tubular Bells#   53###
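The [:-1] slice assumes a trailing #; on a line without one it silently drops the last real field. A sketch of the same blank-field check written with str.join, which works either way (blank_out and loop_version are hypothetical helper names; the short inputs are made up):

```python
def blank_out(line):
    """Empty every whitespace-only field, keeping all fields and delimiters."""
    return "#".join(item if item.strip() else "" for item in line.split("#"))

def loop_version(line):
    """The loop from the answer above, wrapped in a function for comparison."""
    new_str = ''
    for item in line.split("#")[:-1]:
        if item.strip():
            new_str += item
        new_str += "#"
    return new_str

inputs = ["a# #b#", "a# #b"]
results = [(loop_version(line), blank_out(line)) for line in inputs]
for looped, joined in results:
    print(repr(looped), repr(joined))
```

Both agree on "a# #b#", but on "a# #b" the loop returns "a##" and loses the final field.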
