Cannot parse strings correctly to remove special characters
I have a DataFrame with one column of strings that I wish to parse:
df = pd.DataFrame({'name':'apple banana orange'.split(), 'size':"2'20 12:00 456".split()})
I wish to remove all ' characters, remove :\d\d and preserve the pure integers, so that the size column ends up as 220, 12 and 456.
I have tried to extract the integers before the ':' and to fill the NaN values with the original data. While this works for the first row (preserving the original data) and for the second row (correctly removing the :00 suffix), the last row somehow ends up with the data of the first row. My code is:
df['size'] = df['size'].str.extract('(\\d*):').fillna(df['size'])
If you only need to handle the ' and the : in the time stamp, this will do the job:
df["size"] = df["size"].str.replace("'", "").str.split(":").map(lambda x: x[0])
Output:
name size
0 apple 220
1 banana 12
2 orange 456
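For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The explicit regex=False and the .str[0] accessor (in place of the lambda) are my own choices, not part of the answer above; the values stay strings unless you convert them afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    "name": "apple banana orange".split(),
    "size": "2'20 12:00 456".split(),
})

# Drop every apostrophe literally (regex=False avoids any pattern interpretation),
# then keep only the part before the first ':'.
df["size"] = (
    df["size"]
    .str.replace("'", "", regex=False)
    .str.split(":")
    .str[0]
)

print(df)
```

If integers are needed rather than strings, appending .astype(int) to the chain should work for this sample data.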
Correct me if I'm wrong, but can't you just do .replace('character', '')?
Try this...
df['size'] = df['size'].str.replace(r"'", '').str.replace(r'((\d{2}):\d{2})', r'\2', regex=True)
Output:
name size
0 apple 220
1 banana 12
2 orange 456
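As for why the original attempt copies the first row into the last one: as far as I can tell, str.extract returns a DataFrame by default (expand=True), and DataFrame.fillna given a Series looks up the fill value by column label instead of row by row. The extracted column is labelled 0, so every NaN is filled with the value at index 0 of df['size']. A sketch of a fix that keeps the extract-then-fillna approach, assuming expand=False is acceptable (the names before_colon and cleaned are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'name': 'apple banana orange'.split(),
    'size': "2'20 12:00 456".split(),
})

# expand=False makes str.extract return a Series for a single capture group,
# so fillna aligns on the row index with the original column.
before_colon = df['size'].str.extract(r'(\d+):', expand=False)

# Rows without a ':' fall back to the original string; apostrophes are removed last.
cleaned = before_colon.fillna(df['size']).str.replace("'", "", regex=False)

df['size'] = cleaned
print(df)
```

The apostrophe still has to be removed in a separate step, because the extraction only deals with the ':' case.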