
Splitting a string by capital letters

I currently have the following code, which finds capital letters in a string 'formula': http://pastebin.com/syRQnqCP

Now, my question is: how can I alter that code (disregard the bit inside the "if choice = 1:" block) so that each part of the newly broken-up string is put into its own variable?

For example, entering NaBr would result in the string being broken into "Na" and "Br". I need to put those in separate variables so I can look them up in my CSV file. Preferably the variables would be generated automatically, so that if there are three elements, as in MgSO4, O would be put into its own variable just as Mg and S would be.

If this is unclear, let me know and I'll try to make it more comprehensible. No way of doing this comes to mind at the moment, though. :(

EDIT: Relevant pieces of code:

Function:

def split_uppercase(string):
    # Insert a space before every capital letter, e.g. 'NaBr' -> 'Na Br'.
    x = ''
    for i in string:
        if i.isupper():
            x += ' %s' % i
        else:
            x += i
    return x.strip()

String entry and lookup:

formula = raw_input("Enter formula: ")
upper = split_uppercase(formula)

#Pull in data from form.csv
weight1 = float(formul_data.get(element1.lower()))
weight2 = float(formul_data.get(element2.lower()))
weight3 = float(formul_data.get(element3.lower()))


weightSum = weight1 + weight2 + weight3
print "Total weight =", weightSum

I think there is a far easier way to do what you're trying to do: use regular expressions. For instance:

>>> import re
>>> [a for a in re.split(r'([A-Z][a-z]*)', 'MgSO4') if a]
['Mg', 'S', 'O', '4']

If you want each number kept with the element it follows, just add a digit specifier to the regex:

>>> [a for a in re.split(r'([A-Z][a-z]*\d*)', 'MgSO4') if a]
['Mg', 'S', 'O4']
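
If the counts are needed as numbers rather than as trailing text, one possible variation is to capture the symbol and the digits in separate groups with re.findall. This is only a sketch: the function name parse_formula is made up for illustration, and it assumes simple formulas with no parentheses or charges.

```python
import re

def parse_formula(formula):
    """Return [(symbol, count), ...]; a missing count is treated as 1."""
    # Two groups: the element symbol, then an optional run of digits.
    pairs = re.findall(r'([A-Z][a-z]*)(\d*)', formula)
    return [(symbol, int(count) if count else 1) for symbol, count in pairs]

print(parse_formula('MgSO4'))  # [('Mg', 1), ('S', 1), ('O', 4)]
```

Each tuple keeps a symbol together with its count, so nothing has to be re-parsed later.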

You don't really want to "put each part in its own variable". That doesn't make sense in general, because you don't know how many parts there are, so you can't know ahead of time how many variables to create. Instead, you want to make a list, like in the example above. Then you can iterate over this list and do what you need to do with each piece.
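
As a rough sketch of that idea, the loop below replaces weight1, weight2 and weight3 with a single running total. The dictionary here is a hypothetical stand-in for whatever formul_data holds after reading the CSV file (lower-case symbol mapped to a weight string, with rounded values used purely for illustration), and split_formula and total_weight are made-up names.

```python
import re

# Hypothetical stand-in for the data read from the CSV file:
# lower-case element symbol -> atomic weight as a string (rounded values).
formul_data = {'na': '22.99', 'br': '79.90', 'mg': '24.31',
               's': '32.06', 'o': '16.00'}

def split_formula(formula):
    """Split 'MgSO4' into ['Mg', 'S', 'O4'] with the regex from the answer."""
    return [a for a in re.split(r'([A-Z][a-z]*\d*)', formula) if a]

def total_weight(formula, data):
    """Sum the weights of all pieces, however many pieces there are."""
    total = 0.0
    for part in split_formula(formula):
        symbol = part.rstrip('0123456789')  # 'O4' -> 'O'
        count = part[len(symbol):]          # 'O4' -> '4', 'Mg' -> ''
        total += float(data[symbol.lower()]) * (int(count) if count else 1)
    return total

print("Total weight = %s" % total_weight('NaBr', formul_data))
```

The same function handles two elements or ten, which is the point of using a list. An unknown symbol raises KeyError here; whether to raise or skip is a design choice left open.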

You can use re.split to perform complex splitting on strings.

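import re

def split_upper(s):
    # The capturing group makes re.split keep the matched pieces;
    # filter(None, ...) drops the empty strings left between them.
    # list() keeps the result a list on Python 3 as well as Python 2.
    return list(filter(None, re.split("([A-Z][^A-Z]*)", s)))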

>>> split_upper("fooBarBaz")
['foo', 'Bar', 'Baz']
>>> split_upper("fooBarBazBB")
['foo', 'Bar', 'Baz', 'B', 'B']
>>> split_upper("fooBarBazBB4")
['foo', 'Bar', 'Baz', 'B', 'B4']
