
Cannot parse strings correctly to remove special characters

I have a DataFrame column of strings that I want to parse:

import pandas as pd

df = pd.DataFrame({'name': 'apple banana orange'.split(), 'size': "2'20 12:00 456".split()})

which gives

     name   size
0   apple   2'20
1  banana  12:00
2  orange    456

I want to remove all ' characters, remove any :\d\d part (a colon followed by two digits), and keep the pure integers, so that the result looks as follows:

     name size
0   apple  220
1  banana   12
2  orange  456

I tried extracting the integers before the ':' and filling the NaN values with the original data. This works for the first row (the original data is preserved) and for the second row (the ':00' part is correctly removed), but for the last row it somehow inserts the data of the first row. My code is

df['size'] = df['size'].str.extract('(\\d*):').fillna(df['size'])

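A possible explanation for the last row, offered as a sketch and not a confirmed diagnosis: with a single capture group, str.extract returns a DataFrame by default (expand=True), and DataFrame.fillna given a Series looks up fill values by column label, not row by row. The extracted column is labelled 0, so every NaN would receive the value stored at index 0 of the original column. Passing expand=False makes str.extract return a Series, and fillna then aligns on the row index. The names before_colon and cleaned are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": "apple banana orange".split(),
    "size": "2'20 12:00 456".split(),
})

# expand=False returns a Series for a single capture group,
# so the following fillna aligns on the row index.
before_colon = df["size"].str.extract(r"(\d*):", expand=False)

# Rows without a colon are NaN here; fall back to the original string,
# then strip the apostrophes as a literal (non-regex) replacement.
cleaned = before_colon.fillna(df["size"]).str.replace("'", "", regex=False)

print(cleaned.tolist())
```

With the three sample rows this should leave the strings 220, 12 and 456.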

If you only need to handle the ' and the : in the timestamp, this will do the job:

df["size"] = df["size"].str.replace("'", "").str.split(":").map(lambda x: x[0])

Output:

     name size
0   apple  220
1  banana   12
2  orange  456

Correct me if I'm wrong, but can't you just do .replace('character', '')?
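Whether that works depends on which replace is meant, and the comment does not say. Assuming it refers to calling .replace directly on the column (Series.replace), a plain string is compared against each whole cell value, so an apostrophe inside a longer string is left alone. The .str.replace accessor works on substrings instead. A small sketch of the difference, with illustrative variable names:

```python
import pandas as pd

s = pd.Series(["2'20", "12:00", "456"])

# Series.replace with a plain string only matches cells equal to "'",
# so nothing changes for these values.
whole_value = s.replace("'", "")

# The .str accessor replaces the substring inside each cell.
substring = s.str.replace("'", "", regex=False)

print(whole_value.tolist())
print(substring.tolist())
```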

Try this...

df['size'] = df['size'].str.replace(r"'", '').str.replace(r'((\d{2}):\d{2})', r'\2', regex=True)

Output:

     name size
0   apple  220
1  banana   12
2  orange  456
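The requirement as stated in the question (drop every ' and every colon followed by two digits) can also be written as one pattern with alternation. This is an alternative sketch, not the approach of either answer above. The final astype(int) step is an assumption that numeric values are wanted and that every cleaned string consists only of digits.

```python
import pandas as pd

df = pd.DataFrame({
    "name": "apple banana orange".split(),
    "size": "2'20 12:00 456".split(),
})

# One pass: remove each apostrophe, or a colon followed by two digits.
df["size"] = df["size"].str.replace(r"'|:\d\d", "", regex=True)

# Assumption: integers are wanted; this raises if any non-digit text remains.
df["size"] = df["size"].astype(int)

print(df)
```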
