How do you split a string into a quoted sentence and numbers using Python
Hi everyone, I'm new to Python, so any help would be appreciated!
I have multiple strings like this:
21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38
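and I'm trying to figure out how to split each one so that the group of words (i.e. "Mckenzie Meadows Golf Course") ends up in double quotes and the numbers stay unquoted.
I then want to rearrange the string into the following format: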
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38
For the rearranging I would just use
for row in data:
    outfile.write('{0} {1} {2} {3} {4}'.format(row[2], row[0], row[1], row[3], row[4]))
    outfile.write('\n')
but I'm not sure how to single out the quoted sentence. Thanks for your help!
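You could try this:
s = "21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38"
sList = s.split(' ')
words = []
nums = []
for l in sList:
    if l.isalpha():
        words.append(l)
    # str.isdigit() is False for '21357.53' because of the dot,
    # so drop one dot before testing
    elif l.replace('.', '', 1).isdigit():
        nums.append(l)
wordString = "\"%s\"" % " ".join(words)
row = [wordString] + nums
At this point, row contains the desired row.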
This is how I would do it:
import re
tgt = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
nums = [m.group() for m in re.finditer(r'[\d\.]+', tgt)]
words = [m.group() for m in re.finditer(r'[a-zA-Z]+', tgt)]
print('"{}" {}'.format(' '.join(words), ' '.join(nums)))
Prints:
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38
Alternatively, you can find the numbers by testing what Python considers a float:
nums = []
words = []
for e in tgt.split():
    try:
        nums.append(float(e))
    except ValueError:
        words.append(e)
print(words, nums)
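A minimal sketch of how the float test above could produce the output line the question asks for. The function name `quote_words` is hypothetical (it does not appear in the original answers), and the sketch keeps each number as its original text so that `84898.10` is not reprinted as `84898.1`.

```python
def quote_words(line):
    """Return the words of `line` in double quotes, followed by its numbers."""
    nums = []
    words = []
    for token in line.split():
        try:
            float(token)          # used only as a test; the value is discarded
        except ValueError:
            words.append(token)   # not a float, so treat it as part of the name
        else:
            nums.append(token)    # keep the original text, e.g. '84898.10'
    return '"{}" {}'.format(' '.join(words), ' '.join(nums))


tgt = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
print(quote_words(tgt))
# "Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38
```

Keep in mind that `float()` also accepts tokens such as `1e5`, `inf` and `nan`, so a name containing one of those would be classified as a number.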
Finally, if you have a fixed format of four floats and one string (float, float, string, float, float), you can do this:
li = tgt.split()
nums = ' '.join(li[0:2] + li[-2:])
words = ' '.join(li[2:-2])
print(words, nums)
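The fixed-format slicing above could be applied to every row and written out as in the question's loop. This is only a sketch: `rearrange` is a hypothetical helper name, the second sample row is made up for illustration, and `io.StringIO` stands in for the question's `outfile`.

```python
import io


def rearrange(line):
    """Move the name (everything between the first two and last two tokens)
    to the front, wrapped in double quotes."""
    parts = line.split()
    name = ' '.join(parts[2:-2])
    nums = parts[:2] + parts[-2:]
    return '"{}" {}'.format(name, ' '.join(nums))


# Hypothetical sample input; only the first row comes from the question.
data = [
    '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38',
    '1.00 2.00 Hole 9 Clubhouse 3.00 4.00',
]

outfile = io.StringIO()  # stand-in for a file opened with open(..., 'w')
for line in data:
    outfile.write(rearrange(line) + '\n')

print(outfile.getvalue(), end='')
```

Because the split is purely positional, a name that itself contains digits (such as `Hole 9 Clubhouse`) stays intact, which the character-class and float-test approaches cannot guarantee.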
Code using a regular expression:
import re
s = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
# raw string so that \s reaches the regex engine unchanged
row = re.search(r'([0-9.]+)\s([0-9.]+)\s([\w ]+)\s([0-9.]+)\s([0-9.]+)', s)
if row:
    print('"{0}" {1} {2} {3} {4}'.format(row.group(3), row.group(1), row.group(2), row.group(4), row.group(5)))
This will print (with the double quotes):
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38
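One possible variation on the regular expression above, offered as a sketch rather than as the answer's own method: it anchors the pattern to the whole line with `re.fullmatch`, uses a stricter number pattern, and lets the name contain any characters. The names `NUM`, `ROW` and `rearrange_with_regex` are assumptions introduced here.

```python
import re

# Assumed layout: two numbers, a name, two numbers, separated by whitespace.
NUM = r'\d+(?:\.\d+)?'   # digits with an optional fractional part
ROW = re.compile(r'({n})\s+({n})\s+(.+?)\s+({n})\s+({n})'.format(n=NUM))


def rearrange_with_regex(line):
    """Return the rearranged line, or None if it does not fit the layout."""
    m = ROW.fullmatch(line.strip())
    if m is None:
        return None
    a, b, name, c, d = m.groups()
    return '"{}" {} {} {} {}'.format(name, a, b, c, d)


s = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
print(rearrange_with_regex(s))
# "Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38
```

Returning `None` for lines that do not match makes malformed rows easy to skip or log instead of writing partial output.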
Using str methods:
>>> s = '21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38'
>>> temp = s.split()
>>> temp
['21357.53', '84898.10', 'Mckenzie', 'Meadows', 'Golf', 'Course', '80912.48', '84102.38']
>>> row = [temp[0], temp[1], '"'+' '.join(temp[2:-2])+'"', temp[-2], temp[-1]]
>>> row
['21357.53', '84898.10', '"Mckenzie Meadows Golf Course"', '80912.48', '84102.38']
>>> print('{0} {1} {2} {3} {4}'.format(row[2], row[0], row[1], row[3], row[4]))
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38
>>> words = "21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38".split()
>>> print('"%s" %s' % (" ".join(filter(lambda x: x.isalpha(), words)), " ".join(filter(lambda x: not x.isalpha(), words))))
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38
More strictly, without assuming that every non-alphabetic word is a float (using reduce):
>>> from functools import reduce
>>> words = "21357.53 84898.10 Mckenzie Meadows Golf Course 80912.48 84102.38".split()
>>> print('"%s" %s' % (" ".join(filter(lambda x: x.isalpha(), words)), " ".join(filter(lambda x: reduce(lambda y, z: y and z.isdigit(), x.split('.'), True), words))))
"Mckenzie Meadows Golf Course" 21357.53 84898.10 80912.48 84102.38