Unable to correctly parse strings to remove special characters

Question

I have a DataFrame column containing strings that I wish to parse:

import pandas as pd
df = pd.DataFrame({'name':'apple banana orange'.split(), 'size':"2'20 12:00 456".split()})

which gives

     name   size
0   apple   2'20
1  banana  12:00
2  orange    456

I wish to remove all ' characters, remove :\d\d, and preserve the pure integers, so that the result looks as follows:

     name size
0   apple  220
1  banana   12
2  orange  456

I have tried extracting the integers before ':' and filling the NaN values with the original data. This works for the first row (the original data is preserved) and for the second row (the :00 part is correctly removed), but the last row somehow ends up with the data of the first row. My code is

df['size'] = df['size'].str.extract('(\\d*):').fillna(df['size'])

Answer 1

If you only need to handle the ' and the : in the timestamp, this will do the job:

df["size"] = df["size"].str.replace("'", "", regex=False).str.split(":").map(lambda x: x[0])

Output:

     name size
0   apple  220
1  banana   12
2  orange  456
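
The two removals the question asks for (every ' and a :\d\d part) can also be expressed as a single regular expression. The following is a minimal sketch and not part of the original answer. It assumes every time-like part is a colon followed by exactly two digits, and the final astype(int) step is an optional extra for when real integers are wanted rather than digit strings.

```python
import pandas as pd

df = pd.DataFrame({
    "name": "apple banana orange".split(),
    "size": "2'20 12:00 456".split(),
})

# Remove every apostrophe, or a colon followed by two digits, in one pass.
df["size"] = df["size"].str.replace(r"'|:\d\d", "", regex=True)

# Optional extra (an assumption about what "pure integers" means):
# convert the cleaned digit strings to an integer Series.
size_as_int = df["size"].astype(int)

print(df)
print(size_as_int.tolist())
```

Passing regex=True explicitly avoids depending on the default, which differs between pandas versions.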

Answer 2

Correct me if I'm wrong, but can't you just do .replace('character', '')?
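
One caveat with this suggestion, sketched below on the assumption that the column holds plain strings: on a pandas Series, .replace and .str.replace behave differently. Series.replace matches whole cell values unless regex=True is passed, whereas Series.str.replace works on substrings. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series(["2'20", "12:00", "456"])

# Series.replace compares against the whole cell value, so a lone
# apostrophe matches nothing and the data comes back unchanged.
whole_value = s.replace("'", "")

# Series.str.replace works inside each string.
substring = s.str.replace("'", "", regex=False)

# Series.replace can do substring replacement too, but only with regex=True.
via_regex = s.replace("'", "", regex=True)

print(whole_value.tolist())
print(substring.tolist())
print(via_regex.tolist())
```

So a bare .replace on the column would leave 2'20 untouched. The .str accessor (or regex=True) is what makes the removal apply inside each string.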

Answer 3

Try this:

df['size'] = df['size'].str.replace("'", '', regex=False).str.replace(r'((\d{2}):\d{2})', r'\2', regex=True)

Output:

     name size
0   apple  220
1  banana   12
2  orange  456
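
For completeness, the extract-and-fillna attempt from the question can probably be repaired rather than replaced. As far as I can tell, the odd result comes from str.extract returning a DataFrame by default (expand=True): DataFrame.fillna with a Series looks up fill values by column label, so the single column labelled 0 is filled with the value at index 0 of size, which is 2'20. The sketch below assumes that diagnosis and passes expand=False so that a Series comes back and fillna aligns row by row. The names cleaned and before_colon are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": "apple banana orange".split(),
    "size": "2'20 12:00 456".split(),
})

# Strip the apostrophes first so the fallback values are already clean.
cleaned = df["size"].str.replace("'", "", regex=False)

# expand=False returns a Series (one capture group), so the fillna
# below aligns on the row index instead of on column labels.
before_colon = cleaned.str.extract(r"(\d+):", expand=False)

# Rows without a colon produced NaN; fall back to the cleaned string.
df["size"] = before_colon.fillna(cleaned)

print(df)
```

This keeps the shape of the original approach. The answers above are shorter, but this version shows that the only changes needed are expand=False and an explicit removal of the ' character.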

Answer 1: 1 (accepted), 2021-07-18 18:58:33
Answer 2: 0, 2021-07-18 19:00:59
Answer 3: 0, 2021-07-18 19:01:54
