
Python: using lambda with startswith

I need to write my dataframe to CSV, and some of the values in a column start with "+-= ", so I need to remove those characters first.

I tried it out on a single string first:

test="+++++-= I love Mercedes-Benz"
while True:
    if test.startswith('+') or test.startswith('-') or test.startswith('=') or test.startswith(' '):
        test=test[1:]
        continue

    else:
        print(test)
        break

The output looks perfect:

I love Mercedes-Benz

Now I want to do the same thing with a lambda on my dataframe:

import pandas as pd

col_names =  ['A', 'B', 'C']
my_df  = pd.DataFrame(columns = col_names)
my_df.loc[len(my_df)] = ["++++-= I love Mercedes-Benz", 4, "Love this"]
my_df.loc[len(my_df)] = ["=Looks so good!", 2, "5-year-old"]
my_df

my_df["A"]=my_df["A"].map(lambda x: x[1:] if x.startswith('=') else x)
print(my_df["A"])

I am not sure how to combine the four startswith checks ("-", "=", "+", " ") and keep looping until the string reaches its first letter or other character (sometimes it might be Japanese or Chinese).

Expected final my_df:

         A                    B          C
0   I love Mercedes-Benz      4       Love this
1   Looks so good!            2       5-year-old

You can use str.lstrip to remove these leading characters (include the space in the set so it is stripped as well):

my_df.A.str.lstrip('+-= ')

0     I love Mercedes-Benz
1           Looks so good!
Name: A, dtype: object
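Putting this together with the CSV export from the question, a minimal sketch might look like the following. It assigns the stripped column back to the frame and writes to an in-memory buffer instead of a real file; the buffer and `index=False` are illustrative choices, not part of the original answer.

```python
import io

import pandas as pd

# Rebuild the sample frame from the question
my_df = pd.DataFrame(
    [["++++-= I love Mercedes-Benz", 4, "Love this"],
     ["=Looks so good!", 2, "5-year-old"]],
    columns=["A", "B", "C"],
)

# Strip any run of leading '+', '-', '=' or space characters from column A
my_df["A"] = my_df["A"].str.lstrip("+-= ")

# Write to an in-memory buffer (pass a file path instead in real code)
buffer = io.StringIO()
my_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Because `lstrip` only looks at the start of each value, the hyphens inside "Mercedes-Benz" and "5-year-old" are left alone.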

The function startswith accepts a tuple of prefixes:

while test.startswith(('+','-','=',' ')):
    test=test[1:]

But you can't put a while loop inside a lambda. Then again, you don't need a lambda: just write the function and pass its name to map.
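As a minimal sketch of that suggestion, here is a named function (the names `strip_prefix` and `PREFIXES` are hypothetical) that runs the tuple form of `startswith` in a loop and is then passed to `Series.map` by name:

```python
import pandas as pd

PREFIXES = ('+', '-', '=', ' ')  # characters to remove from the start


def strip_prefix(text):
    """Drop leading '+', '-', '=' and space characters one at a time."""
    while text.startswith(PREFIXES):
        text = text[1:]
    return text


my_df = pd.DataFrame(
    [["++++-= I love Mercedes-Benz", 4, "Love this"],
     ["=Looks so good!", 2, "5-year-old"]],
    columns=["A", "B", "C"],
)

# Pass the function object itself to map; no lambda is needed
my_df["A"] = my_df["A"].map(strip_prefix)
print(my_df["A"].tolist())
```

The loop stops at the first character that is not one of the prefixes, so text starting with Japanese or Chinese characters is kept intact, and an empty string simply comes back unchanged.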

One way to achieve it could be:

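old = None
# Keep stripping one character per pass until a pass changes nothing
while old is None or not old.equals(my_df["A"]):
    old = my_df["A"].copy()  # snapshot before this pass
    my_df["A"] = my_df["A"].map(lambda x: x[1:] if any(x.startswith(char) for char in "-=+ ") else x)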

But I'd like to point you to the strip() method for strings:

>>> test="+++++-= I love Mercedes-Benz"
>>> test.strip("+-=")
' I love Mercedes-Benz'

So your data extraction can become simpler:

my_df["A"] = my_df["A"].str.strip("+=- ")

Just be careful, because strip removes the characters from both ends of the string. lstrip, on the other hand, only works on the left side.
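To see the difference, here is a small comparison on a made-up value that also ends with one of the characters (the sample string is hypothetical, not from the question):

```python
sample = "+-= Price drop -"

both_sides = sample.strip("+=- ")   # trims the character set from both ends
left_only = sample.lstrip("+=- ")   # trims the character set from the start only

print(repr(both_sides))  # → 'Price drop'
print(repr(left_only))   # → 'Price drop -'
```

If trailing characters in your data are meaningful, lstrip is the safer choice.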

As a lover of regex and possibly convoluted solutions, I will add this solution as well:

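import re

# "-" goes first so it is literal; in "[*-=\s]" it would form the range "*" to "=", which includes the digits
my_df["A"] = my_df["A"].map(lambda x: re.sub(r'^[-+=\s]*', '', x))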

The regex reads:
^ from the beginning
[] any of the characters inside the brackets
\s any whitespace
* zero or more
So this matches (and replaces with nothing) the run of bracketed characters at the beginning of the string.
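A quick check of why the order inside the character class matters: in `[*-=\s]` the `-` between `*` and `=` forms a range from `*` to `=`, which also covers `+`, `,`, `.`, `/`, the digits and `:;<`, so a value such as "5-year-old" would lose its leading digit. Putting `-` first makes it literal. The sketch below compares the two patterns and also shows pandas' vectorised `Series.str.replace` with `regex=True` as an alternative to `map`; the sample values, including the Japanese one, are illustrative.

```python
import re

import pandas as pd

ranged = r'^[*-=\s]*'   # '*-=' is a range: '*' through '=' (includes the digits)
literal = r'^[-+=\s]*'  # leading '-' is literal: only '-', '+', '=' and whitespace

print(re.sub(ranged, '', '5-year-old'))   # → year-old
print(re.sub(literal, '', '5-year-old'))  # → 5-year-old

# Vectorised version on a Series, including a non-Latin value
s = pd.Series(["++++-= I love Mercedes-Benz", "=Looks so good!", "+= 日本語のテキスト"])
cleaned = s.str.replace(literal, '', regex=True)
print(cleaned.tolist())
```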
