Elegant Python function to convert CamelCase to snake_case?

Example:

>>> convert('CamelCase')
'camel_case'

Camel case to snake case

import re

name = 'CamelCaseName'
name = re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()
print(name)  # camel_case_name

If you do this many times and the above is slow, compile the regex beforehand:

pattern = re.compile(r'(?<!^)(?=[A-Z])')
name = pattern.sub('_', name).lower()

To handle more advanced cases specially (this is no longer reversible):

def camel_to_snake(name):
  name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
  return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

print(camel_to_snake('camel2_camel2_case'))  # camel2_camel2_case
print(camel_to_snake('getHTTPResponseCode'))  # get_http_response_code
print(camel_to_snake('HTTPResponseCodeXYZ'))  # http_response_code_xyz
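
To illustrate what "no longer reversible" means here, a small sketch (the two input names are just examples I picked): two different camel-case spellings collapse to the same snake_case name, so the original capitalization cannot be recovered afterwards.

```python
import re

def camel_to_snake(name):
    # Same two-pass function as above.
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

# Both spellings map to the same result, so the acronym casing is lost.
a = camel_to_snake('getHTTPResponseCode')
b = camel_to_snake('getHttpResponseCode')
print(a)       # get_http_response_code
print(a == b)  # True
```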

To also handle names that contain two or more underscores:

def to_snake_case(name):
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    name = re.sub('__([A-Z])', r'_\1', name)
    name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()
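
A quick comparison of the two variants, as a sketch (the input 'some__CamelCase' is my own example): the first pass adds an underscore in front of a capitalized word even when one is already there, and the extra substitution removes it again so an existing double underscore survives unchanged.

```python
import re

def camel_to_snake(name):
    # Two-pass version without the double-underscore fix.
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

def to_snake_case(name):
    # Three-pass version: the middle pass undoes the extra underscore.
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    name = re.sub('__([A-Z])', r'_\1', name)
    name = re.sub('([a-z0-9])([A-Z])', r'\1_\2', name)
    return name.lower()

print(camel_to_snake('some__CamelCase'))  # some___camel_case (three underscores)
print(to_snake_case('some__CamelCase'))   # some__camel_case  (two, as in the input)
```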

Snake case to camel case

name = 'snake_case_name'
name = ''.join(word.title() for word in name.split('_'))
print(name)  # SnakeCaseName
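
If you also need lowerCamelCase, something like the following might do. This is only a sketch: snake_to_camel and its upper_first flag are names I made up, and I am assuming that empty segments (from leading or doubled underscores) should simply be dropped. It uses str.capitalize() instead of str.title(), because title() also upper-cases letters that follow a digit.

```python
def snake_to_camel(name, upper_first=True):
    # Hypothetical helper: drop empty segments, then capitalize each word.
    words = [w for w in name.split('_') if w]
    if not words:
        return ''
    head = words[0].capitalize() if upper_first else words[0].lower()
    return head + ''.join(w.capitalize() for w in words[1:])

print(snake_to_camel('snake_case_name'))                     # SnakeCaseName
print(snake_to_camel('snake_case_name', upper_first=False))  # snakeCaseName
print('http2server'.title())                                 # Http2Server
print(snake_to_camel('http2server_name'))                    # Http2serverName
```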

There's an inflection library in the package index that can handle these things for you. In this case, you'd be looking for inflection.underscore():

>>> import inflection
>>> inflection.underscore('CamelCase')
'camel_case'

I don't know why these are all so complicated.

For most cases, the simple expression ([A-Z]+) will do the trick:

>>> re.sub('([A-Z]+)', r'_\1','CamelCase').lower()
'_camel_case'  
>>> re.sub('([A-Z]+)', r'_\1','camelCase').lower()
'camel_case'
>>> re.sub('([A-Z]+)', r'_\1','camel2Case2').lower()
'camel2_case2'
>>> re.sub('([A-Z]+)', r'_\1','camelCamelCase').lower()
'camel_camel_case'
>>> re.sub('([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'

To ignore the first character, simply add a negative lookahead (?!^):

>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCase').lower()
'camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','CamelCamelCase').lower()
'camel_camel_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','Camel2Camel2Case').lower()
'camel2_camel2_case'
>>> re.sub('(?!^)([A-Z]+)', r'_\1','getHTTPResponseCode').lower()
'get_httpresponse_code'

If you want to separate ALLCaps into all_caps and expect numbers in your string, you still don't need two separate passes; just use | (alternation). The expression ((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z])) can handle just about every scenario in the book:

>>> a = re.compile('((?<=[a-z0-9])[A-Z]|(?!^)[A-Z](?=[a-z]))')
>>> a.sub(r'_\1', 'getHTTPResponseCode').lower()
'get_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponseCode').lower()
'get2_http_response_code'
>>> a.sub(r'_\1', 'get2HTTPResponse123Code').lower()
'get2_http_response123_code'
>>> a.sub(r'_\1', 'HTTPResponseCode').lower()
'http_response_code'
>>> a.sub(r'_\1', 'HTTPResponseCodeXYZ').lower()
'http_response_code_xyz'
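
The same expression can be spelled out with re.VERBOSE so each alternative carries its own comment. This is just a sketch of the pattern above wrapped in a function; the names _BOUNDARY and to_snake are mine.

```python
import re

# Same pattern as above, written in verbose mode with one comment per branch.
_BOUNDARY = re.compile(r"""
    (
        (?<=[a-z0-9])[A-Z]     # an upper-case letter right after a lower-case letter or digit
      | (?!^)[A-Z](?=[a-z])    # or an upper-case letter followed by lower case, but not at position 0
    )
""", re.VERBOSE)

def to_snake(name):
    # Put an underscore in front of every boundary letter, then lower-case everything.
    return _BOUNDARY.sub(r'_\1', name).lower()

print(to_snake('getHTTPResponseCode'))      # get_http_response_code
print(to_snake('get2HTTPResponse123Code'))  # get2_http_response123_code
```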

It all depends on what you want, so use the solution that best suits your needs; it should not be overly complicated.

nJoy!

Avoiding libraries and regular expressions:避免库和正则表达式:

def camel_to_snake(s):
    return ''.join(['_'+c.lower() if c.isupper() else c for c in s]).lstrip('_')
>>> camel_to_snake('ThisIsMyString')
'this_is_my_string'
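
One thing to keep in mind with this per-character approach, shown as a short sketch: every upper-case letter gets its own underscore, so acronyms are split letter by letter.

```python
def camel_to_snake(s):
    # Prefix each upper-case letter with '_' and drop a leading underscore.
    return ''.join(['_' + c.lower() if c.isupper() else c for c in s]).lstrip('_')

print(camel_to_snake('ThisIsMyString'))       # this_is_my_string
print(camel_to_snake('getHTTPResponseCode'))  # get_h_t_t_p_response_code
```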

stringcase is my go-to library for this; e.g.:

>>> from stringcase import pascalcase, snakecase
>>> snakecase('FooBarBaz')
'foo_bar_baz'
>>> pascalcase('foo_bar_baz')
'FooBarBaz'

Personally, I am not sure how anything using regular expressions in Python can be described as elegant. Most answers here are just doing "code golf" type RE tricks. Elegant coding is supposed to be easily understood.

def to_snake_case(not_snake_case):
    final = ''
    for i in range(len(not_snake_case)):  # range replaces Python 2's xrange
        item = not_snake_case[i]
        if i < len(not_snake_case) - 1:
            next_char_will_be_underscored = (
                not_snake_case[i+1] == "_" or
                not_snake_case[i+1] == " " or
                not_snake_case[i+1].isupper()
            )
        if (item == " " or item == "_") and next_char_will_be_underscored:
            continue
        elif (item == " " or item == "_"):
            final += "_"
        elif item.isupper():
            final += "_"+item.lower()
        else:
            final += item
    if final[0] == "_":
        final = final[1:]
    return final

>>> to_snake_case("RegularExpressionsAreFunky")
'regular_expressions_are_funky'

>>> to_snake_case("RegularExpressionsAre Funky")
'regular_expressions_are_funky'

>>> to_snake_case("RegularExpressionsAre_Funky")
'regular_expressions_are_funky'

I think this solution is more straightforward than previous answers:

import re

def convert (camel_input):
    words = re.findall(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+', camel_input)
    return '_'.join(map(str.lower, words))


# Let's test it
test_strings = [
    'CamelCase',
    'camelCamelCase',
    'Camel2Camel2Case',
    'getHTTPResponseCode',
    'get200HTTPResponseCode',
    'getHTTP200ResponseCode',
    'HTTPResponseCode',
    'ResponseHTTP',
    'ResponseHTTP2',
    'Fun?!awesome',
    'Fun?!Awesome',
    '10CoolDudes',
    '20coolDudes'
]
for test_string in test_strings:
    print(convert(test_string))

Which outputs:

camel_case
camel_camel_case
camel_2_camel_2_case
get_http_response_code
get_200_http_response_code
get_http_200_response_code
http_response_code
response_http
response_http_2
fun_awesome
fun_awesome
10_cool_dudes
20_cool_dudes

The regular expression matches three patterns:

  1. [A-Z]?[a-z]+ : Consecutive lower-case letters that optionally start with an upper-case letter.
  2. [A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$) : Two or more consecutive upper-case letters. It uses a lookahead to exclude the last upper-case letter if it is followed by a lower-case letter.
  3. \d+ : Consecutive digits.

By using re.findall we get a list of individual "words" that can be converted to lower-case and joined with underscores.
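
To make the intermediate step visible, here is a sketch that prints the word list before it is joined. WORD_RE and split_words are names I introduced for illustration; the pattern is the one from the answer.

```python
import re

# Same three-alternative pattern as in convert() above, compiled once.
WORD_RE = re.compile(r'[A-Z]?[a-z]+|[A-Z]{2,}(?=[A-Z][a-z]|\d|\W|$)|\d+')

def split_words(camel_input):
    # Return the raw "words" before lower-casing and joining.
    return WORD_RE.findall(camel_input)

print(split_words('getHTTP200ResponseCode'))  # ['get', 'HTTP', '200', 'Response', 'Code']
print(split_words('HTTPResponseCode'))        # ['HTTP', 'Response', 'Code']
print('_'.join(w.lower() for w in split_words('HTTPResponseCode')))  # http_response_code
```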

''.join('_'+c.lower() if c.isupper() else c for c in "DeathToCamelCase").strip('_')
re.sub("(.)([A-Z])", r'\1_\2', 'DeathToCamelCase').lower()

I don't get why both .sub() calls are used. :) I'm not a regex guru, but I simplified the function to this one, which suits my needs. I just needed a solution to convert camelCasedVars from a POST request to vars_with_underscore:

import re

def myFunc(name):
  return re.sub('(.)([A-Z]{1})', r'\1_\2', name).lower()

It does not work with names like getHTTPResponse, but I have heard that is a bad naming convention anyway (it should be getHttpResponse, which is obviously much easier to memorize).

Here's my solution:

def un_camel(text):
    """ Converts a CamelCase name into an under_score name. 

        >>> un_camel('CamelCase')
        'camel_case'
        >>> un_camel('getHTTPResponseCode')
        'get_http_response_code'
    """
    result = []
    pos = 0
    while pos < len(text):
        if text[pos].isupper():
            # pos > 0 (rather than pos-1 > 0) so an upper-case letter at index 1 is handled too
            if pos > 0 and (text[pos-1].islower() or
                            pos+1 < len(text) and text[pos+1].islower()):
                result.append("_%s" % text[pos].lower())
            else:
                result.append(text[pos].lower())
        else:
            result.append(text[pos])
        pos += 1
    return "".join(result)

It supports those corner cases discussed in the comments. For instance, it'll convert getHTTPResponseCode to get_http_response_code like it should.

For the fun of it:

>>> def un_camel(input):
...     output = [input[0].lower()]
...     for c in input[1:]:
...             if c in ('ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
...                     output.append('_')
...                     output.append(c.lower())
...             else:
...                     output.append(c)
...     return str.join('', output)
...
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'

Or, more for the fun of it:

>>> un_camel = lambda i: i[0].lower() + str.join('', ("_" + c.lower() if c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" else c for c in i[1:]))
>>> un_camel("camel_case")
'camel_case'
>>> un_camel("CamelCase")
'camel_case'

Using regexes may be the shortest, but this solution is way more readable:

def to_snake_case(s):
    snake = "".join(["_"+c.lower() if c.isupper() else c for c in s])
    return snake[1:] if snake.startswith("_") else snake

This is not an elegant method; it is a very low-level implementation of a simple state machine (a bitfield state machine), and possibly the least Pythonic way to solve this. However, the re module also implements a far more complex state machine to solve this simple task, so I think this is a good solution.

def splitSymbol(s):
    si, ci, state = 0, 0, 0 # start_index, current_index 
    '''
        state bits:
        0: no yields
        1: lower yields
        2: lower yields - 1
        4: upper yields
        8: digit yields
        16: other yields
        32 : upper sequence mark
    '''
    for c in s:

        if c.islower():
            if state & 1:
                yield s[si:ci]
                si = ci
            elif state & 2:
                yield s[si:ci - 1]
                si = ci - 1
            state = 4 | 8 | 16
            ci += 1

        elif c.isupper():
            if state & 4:
                yield s[si:ci]
                si = ci
            if state & 32:
                state = 2 | 8 | 16 | 32
            else:
                state = 8 | 16 | 32

            ci += 1

        elif c.isdigit():
            if state & 8:
                yield s[si:ci]
                si = ci
            state = 1 | 4 | 16
            ci += 1

        else:
            if state & 16:
                yield s[si:ci]
            state = 0
            ci += 1  # eat ci
            si = ci   
        print(' : ', c, bin(state))
    if state:
        yield s[si:ci] 


def camelcaseToUnderscore(s):
    return '_'.join(splitSymbol(s)) 

splitSymbol can parse all case types: UpperSEQUENCEInterleaved, under_score, BIG_SYMBOLS and camelCasedMethods.

I hope it is useful.

So many complicated methods... Just find all "Titled" groups and join their lower-cased variants with underscores.

>>> import re
>>> def camel_to_snake(string):
...     groups = re.findall('([A-z0-9][a-z]*)', string)
...     return '_'.join([i.lower() for i in groups])
...
>>> camel_to_snake('ABCPingPongByTheWay2KWhereIsOurBorderlands3???')
'a_b_c_ping_pong_by_the_way_2_k_where_is_our_borderlands_3'

If you don't want digits to start a group or form a separate group, you can use the ([A-z][a-z0-9]*) mask.
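
A caveat worth checking before using these masks: in ASCII, the range A-z also covers the punctuation characters that sit between 'Z' and 'a' (such as '_' and '^'), so an underscore can start a group. The sketch below shows the difference; spelling the class as [A-Za-z0-9] is my suggested variant, not part of the answer above, and the function names are made up.

```python
import re

# '[A-z]' matches more than letters: '_' and '^' fall between 'Z' and 'a'.
print(re.findall('[A-z]', 'a_B^c'))     # ['a', '_', 'B', '^', 'c']
print(re.findall('[A-Za-z]', 'a_B^c'))  # ['a', 'B', 'c']

def camel_to_snake_original(string):
    # Pattern exactly as in the answer above.
    groups = re.findall('([A-z0-9][a-z]*)', string)
    return '_'.join(i.lower() for i in groups)

def camel_to_snake_letters_only(string):
    # Assumed variant: only real letters and digits may start a group.
    groups = re.findall('([A-Za-z0-9][a-z]*)', string)
    return '_'.join(i.lower() for i in groups)

print(camel_to_snake_original('snake_case'))      # snake__case
print(camel_to_snake_letters_only('snake_case'))  # snake_case
```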

This simple method should do the job:

import re

def convert(name):
    return re.sub(r'([A-Z]*)([A-Z][a-z]+)', lambda x: (x.group(1) + '_' if x.group(1) else '') + x.group(2) + '_', name).rstrip('_').lower()
  • We look for capital letters that are preceded by any number of (or zero) capital letters, and followed by one or more lowercase characters.
  • An underscore is placed after each matched word, and another is placed just before the last capital letter of the match when that letter is preceded by other capital letters.
  • If there are trailing underscores, remove those.
  • Finally, the whole result string is changed to lower case.

(taken from here, see working example online)

Lightly adapted from https://stackoverflow.com/users/267781/matth, who uses generators.

def uncamelize(s):
    buff, l = '', []
    for ltr in s:
        if ltr.isupper():
            if buff:
                l.append(buff)
                buff = ''
        buff += ltr
    l.append(buff)
    return '_'.join(l).lower()

Take a look at the excellent Schematics lib:

https://github.com/schematics/schematics

It allows you to create typed data structures that can serialize/deserialize from Python to JavaScript flavour, e.g.:

class MapPrice(Model):
    price_before_vat = DecimalType(serialized_name='priceBeforeVat')
    vat_rate = DecimalType(serialized_name='vatRate')
    vat = DecimalType()
    total_price = DecimalType(serialized_name='totalPrice')

It's not in the standard library, but I found this module, which appears to contain the functionality you need.

A horrendous example using regular expressions (you could easily clean this up :) ):

def f(s):
    return s.group(1).lower() + "_" + s.group(2).lower()

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print(p.sub(f, "CamelCase"))
print(p.sub(f, "getHTTPResponseCode"))

Works for getHTTPResponseCode though!

Alternatively, using lambda:

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")
print(p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "CamelCase"))
print(p.sub(lambda x: x.group(1).lower() + "_" + x.group(2).lower(), "getHTTPResponseCode"))

EDIT: It should also be pretty easy to see that there's room for improvement for cases like "Test", because the underscore is unconditionally inserted.
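
One possible way to address the "Test" case, as a sketch only: insert the underscore only when the optional second group actually matched. The replacement function below is my own variation on f, not the original author's fix.

```python
import re

p = re.compile("([A-Z]+[a-z]+)([A-Z]?)")

def f(m):
    head, tail = m.group(1).lower(), m.group(2).lower()
    # Only add the separator when a following capital letter was captured.
    return head + "_" + tail if tail else head

print(p.sub(f, "Test"))       # test
print(p.sub(f, "CamelCase"))  # camel_case
```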

I was looking for a solution to the same problem, except that I needed a chain; e.g.

"CamelCamelCamelCase" -> "Camel-camel-camel-case"

Starting from the nice two-word solutions here, I came up with the following:

"-".join(x.group(1).lower() if x.group(2) is None else x.group(1) \
         for x in re.finditer("((^.[^A-Z]+)|([A-Z][^A-Z]+))", "stringToSplit"))

Most of the complicated logic is to avoid lowercasing the first word. Here's a simpler version if you don't mind altering the first word:

"-".join(x.group(1).lower() for x in re.finditer("(^[^A-Z]+|[A-Z][^A-Z]+)", "stringToSplit"))

Of course, you can pre-compile the regular expressions or join with underscore instead of hyphen, as discussed in the other solutions.
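
For example, the simpler version could be wrapped like this, with the pattern compiled once and the separator passed in. This is a sketch; _WORDS, join_words and the sep parameter are names I chose for illustration.

```python
import re

# Simpler pattern from above, compiled once at import time.
_WORDS = re.compile("(^[^A-Z]+|[A-Z][^A-Z]+)")

def join_words(text, sep="_"):
    # Lower-case every matched word and join with the chosen separator.
    return sep.join(m.group(1).lower() for m in _WORDS.finditer(text))

print(join_words("stringToSplit"))             # string_to_split
print(join_words("CamelCamelCamelCase", "-"))  # camel-camel-camel-case
```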

Concise, without regular expressions, but HTTPResponseCode => httpresponse_code:

def from_camel(name):
    """
    ThisIsCamelCase ==> this_is_camel_case
    """
    name = name.replace("_", "")
    _cas = lambda _x : [_i.isupper() for _i in _x]
    seq = zip(_cas(name[1:-1]), _cas(name[2:]))
    ss = [_x + 1 for _x, (_i, _j) in enumerate(seq) if (_i, _j) == (False, True)]
    return "".join([ch + "_" if _x in ss else ch for _x, ch in enumerate(name.lower())])

Without any library:

def camelify(out):
    return (''.join(["_"+x.lower() if i<len(out)-1 and x.isupper() and out[i+1].islower()
         else x.lower()+"_" if i<len(out)-1 and x.islower() and out[i+1].isupper()
         else x.lower() for i,x in enumerate(list(out))])).lstrip('_').replace('__','_')

A bit heavy, but:

CamelCamelCamelCase ->  camel_camel_camel_case
HTTPRequest         ->  http_request
GetHTTPRequest      ->  get_http_request
getHTTPRequest      ->  get_http_request

Just in case someone needs to transform a complete source file, here is a script that will do it.

# Copy and paste your camel case code in the string below
camelCaseCode ="""
    cv2.Matx33d ComputeZoomMatrix(const cv2.Point2d & zoomCenter, double zoomRatio)
    {
      auto mat = cv2.Matx33d::eye();
      mat(0, 0) = zoomRatio;
      mat(1, 1) = zoomRatio;
      mat(0, 2) = zoomCenter.x * (1. - zoomRatio);
      mat(1, 2) = zoomCenter.y * (1. - zoomRatio);
      return mat;
    }
"""

import re
def snake_case(name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

def lines(str):
    return str.split("\n")

def unlines(lst):
    return "\n".join(lst)

def words(str):
    return str.split(" ")

def unwords(lst):
    return " ".join(lst)

def map_partial(function):
    return lambda values : [  function(v) for v in values]

import functools
def compose(*functions):
    return functools.reduce(lambda f, g: lambda x: f(g(x)), functions, lambda x: x)

snake_case_code = compose(
    unlines ,
    map_partial(unwords),
    map_partial(map_partial(snake_case)),
    map_partial(words),
    lines
)
print(snake_case_code(camelCaseCode))

Wow, I just stole this from Django snippets. Ref: http://djangosnippets.org/snippets/585/

Pretty elegant:

camelcase_to_underscore = lambda str: re.sub(r'(?<=[a-z])[A-Z]|[A-Z](?=[^A-Z])', r'_\g<0>', str).lower().strip('_')

Example:

camelcase_to_underscore('ThisUser')

Returns:

'this_user'

REGEX DEMO

A very nice regex proposed on this site:

(?<!^)(?=[A-Z])

If Python has a string split method, it should work...

In Java:

String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
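
Python does have such a method: re.split. Splitting on a pattern that can match an empty string is supported from Python 3.7 on, so the sketch below assumes at least that version.

```python
import re

# Zero-width split points: before every upper-case letter, except at the very start.
words = re.split(r'(?<!^)(?=[A-Z])', 'LoremIpsumDolor')
print(words)                                # ['Lorem', 'Ipsum', 'Dolor']
print('_'.join(w.lower() for w in words))   # lorem_ipsum_dolor
```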

Here's something I did to change the headers on a tab-delimited file. I'm omitting the part where I only edited the first line of the file. You could adapt it to Python pretty easily with the re library. This also includes separating out numbers (but keeps the digits together). I did it in two steps because that was easier than telling it not to put an underscore at the start of a line or tab.

Step One... find uppercase letters or integers preceded by lowercase letters, and precede them with an underscore:

Search:

([a-z]+)([A-Z]|[0-9]+)

Replacement:

\1_\l\2/

Step Two... take the above and run it again to convert all caps to lowercase:

Search:

([A-Z])

Replacement (that's backslash, lowercase L, backslash, one):

\l\1
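
A rough Python adaptation of the two steps, as a sketch: Python's re has no \l case modifier in replacements, so I am assuming it is fine to lower-case the whole header line with str.lower() as the second step. snake_headers and the sample header line are made up for illustration.

```python
import re

def snake_headers(line):
    # Step one: underscore before a capital letter or digit run that follows lower-case letters.
    step_one = re.sub(r'([a-z]+)([A-Z]|[0-9]+)', r'\1_\2', line)
    # Step two: convert every remaining capital letter to lower case.
    return step_one.lower()

header = 'firstName\tlastName\tzip5'
print(snake_headers(header).split('\t'))  # ['first_name', 'last_name', 'zip_5']
```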

I have had pretty good luck with this one:

import re
def camelcase_to_underscore(s):
    return re.sub(r'(^|[a-z])([A-Z])',
                  lambda m: '_'.join([i.lower() for i in m.groups() if i]),
                  s)

This could obviously be optimized for speed a tiny bit if you want to.

import re

CC2US_RE = re.compile(r'(^|[a-z])([A-Z])')

def _replace(match):
    return '_'.join([i.lower() for i in match.groups() if i])

def camelcase_to_underscores(s):
    return CC2US_RE.sub(_replace, s)

from functools import reduce  # reduce is no longer a builtin in Python 3

def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() else '') + y,
        name
    ).lower()

And if we need to cover input that is already un-cameled:

def convert(name):
    return reduce(
        lambda x, y: x + ('_' if y.isupper() and not x.endswith('_') else '') + y, 
        name
    ).lower()
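
def convert(camel_str):
    temp_list = []
    for letter in camel_str:
        # Only upper-case letters start a new word; digits and underscores pass through.
        if letter.isupper():
            temp_list.append('_')
        temp_list.append(letter)
    result = "".join(temp_list)
    # Drop the underscore that a leading capital letter would otherwise produce.
    return result.lower().lstrip('_')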

Use str.capitalize() to convert the first letter of a string to upper case; it returns a copy of the whole string with the first character capitalized and the rest lowercased.

Example: "hello".capitalize() returns 'Hello'.
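
A quick check of what str.capitalize() does to mixed-case input, since it also lower-cases everything after the first character and does not split words (the sample strings are just examples):

```python
print("hello".capitalize())       # Hello
print("helloWorld".capitalize())  # Helloworld
print("snake_case".capitalize())  # Snake_case
```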

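Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.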

 