Python: using lambda with startswith
I need to write my dataframe to CSV, and some of the series values start with "+-= ", so I need to remove those characters first.
I tried to test by using a string:
test = "+++++-= I love Mercedes-Benz"
while True:
    if test.startswith('+') or test.startswith('-') or test.startswith('=') or test.startswith(' '):
        test = test[1:]
        continue
    else:
        print(test)
        break
The output looks perfect:
I love Mercedes-Benz
Now I want to do the same using a lambda on my dataframe:
import pandas as pd
col_names = ['A', 'B', 'C']
my_df = pd.DataFrame(columns = col_names)
my_df.loc[len(my_df)] = ["++++-= I love Mercedes-Benz", 4, "Love this"]
my_df.loc[len(my_df)] = ["=Looks so good!", 2, "5-year-old"]
my_df
my_df["A"]=my_df["A"].map(lambda x: x[1:] if x.startswith('=') else x)
print(my_df["A"])
I am not sure how to combine the four startswith checks ("-", "=", "+", " ") and repeat them until the string reaches its first letter or other character (sometimes it might be Japanese or Chinese).
Expected final my_df:
                      A  B           C
0  I love Mercedes-Benz  4   Love this
1        Looks so good!  2  5-year-old
You can use str.lstrip to remove these leading characters. Include the space in the character set so that the blank after the symbols is removed as well:
my_df.A.str.lstrip('+-= ')
0 I love Mercedes-Benz
1 Looks so good!
Name: A, dtype: object
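As a small illustration of why lstrip fits here: its argument is treated as a set of characters, not as a literal prefix, so any mix of them is removed until the first character outside the set. The sketch below uses plain Python strings; the sample values (including the Japanese one) and the name PREFIX_CHARS are made up for illustration.

```python
# Assumed set of characters to remove from the left; adjust as needed.
PREFIX_CHARS = "+-= "

samples = [
    "++++-= I love Mercedes-Benz",
    "=Looks so good!",
    "-= 日本語のレビュー",   # non-Latin text is left untouched
    "5-year-old",            # inner hyphens are not affected
]

# lstrip removes any leading characters that belong to PREFIX_CHARS.
cleaned = [s.lstrip(PREFIX_CHARS) for s in samples]

for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

Stripping stops at the first character that is not in the set, so it does not matter whether the text that follows is English, Japanese or Chinese.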
The function startswith accepts a tuple of prefixes:
while test.startswith(('+', '-', '=', ' ')):
    test = test[1:]
But you can't put that in a lambda, because a lambda can hold only a single expression and a while loop is a statement. But then, you don't need a lambda: just write the function and pass its name to map.
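A minimal sketch of that idea, assuming every value is a string. The function name strip_leading and its prefixes parameter are hypothetical, and the built-in map stands in for the Series' map method here.

```python
def strip_leading(text, prefixes=("+", "-", "=", " ")):
    """Drop leading characters one at a time while text starts with any prefix."""
    while text.startswith(prefixes):
        text = text[1:]
    return text


values = ["++++-= I love Mercedes-Benz", "=Looks so good!"]

# Pass the function by name; no lambda is required.
result = list(map(strip_leading, values))
print(result)
```

With the dataframe from the question, the equivalent call would presumably be my_df["A"] = my_df["A"].map(strip_leading).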
One way to achieve it could be
old = None
# repeat until a pass over the column changes nothing
while old is None or not old.equals(my_df["A"]):
    old = my_df["A"].copy()
    my_df["A"] = my_df["A"].map(lambda x: x[1:] if any(x.startswith(char) for char in "-=+ ") else x)
But I'd also like to point out the strip() method for strings:
>>> test="+++++-= I love Mercedes-Benz"
>>> test.strip("+-=")
' I love Mercedes-Benz'
So your data extraction can become simpler:
my_df["A"] = my_df["A"].str.strip("+=- ")
Just be careful, because strip will remove the characters from both sides of the string. lstrip, by contrast, does the job only on the left side.
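To make the difference concrete, here is a short sketch on a made-up string that also ends with some of the same characters:

```python
text = "=-Score: 5+-"   # hypothetical value with symbols on both ends

both = text.strip("+-= ")    # trims the left and the right end
left = text.lstrip("+-= ")   # trims the left end only

print(repr(both))
print(repr(left))
```

If trailing "+", "-" or "=" characters can be meaningful in your data, lstrip is the safer choice.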
As a lover of regex and possibly convoluted solutions, I will add this solution as well:
import re
# raw string; the hyphen is escaped so it is a literal '-', not a range
my_df["A"] = my_df["A"].map(lambda x: re.sub(r'^[+\-=\s]*', '', x))
The regex reads:
^ from the beginning
[] items in this group
\s any whitespace
* zero or more
So this will match (and replace with nothing) all the characters at the beginning of the string that are in the square brackets.
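One detail worth checking when writing such a character class: a hyphen between two characters denotes a range, so a class like [*-=] matches everything from "*" to "=" in code-point order, which includes the digits. The sketch below compares that with a class where the hyphen is escaped; the names LEADING_JUNK and clean and the sample strings are assumptions for illustration.

```python
import re

# Literal '+', '-', '=' and any whitespace, anchored at the start.
LEADING_JUNK = re.compile(r"^[+\-=\s]*")


def clean(text):
    """Remove leading '+', '-', '=' and whitespace characters."""
    return LEADING_JUNK.sub("", text)


samples = ["++++-= I love Mercedes-Benz", "=Looks so good!", "5-year-old"]
cleaned = [clean(s) for s in samples]
print(cleaned)

# For comparison: an unescaped '*-=' is a range that also covers digits,
# so the leading "5-" is stripped as well.
ranged = re.sub(r"^[*-=\s]*", "", "5-year-old")
print(ranged)
```

Placing the hyphen first or last in the class ([-+=\s]) would be an alternative to escaping it.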