Filter special characters in a dataframe

I have the following dataframe called data:

    metrics    artists

0    0.21    ['Zhané']
2    0.14    ['Mose Allison']
3    0.87    ['水柳仙']
4    0.25    ['Shel Silverstein']

Some records in the "artists" column contain special characters. I want to move the records that have special characters into a separate df, so that I end up with the following output:

data:

     metrics    artists

0    0.14    ['Mose Allison']
1    0.25    ['Shel Silverstein']

data2:

     metrics    artists

0    0.21    ['Zhané']
1    0.87    ['水柳仙']

I used:

 data2=data.artists[data.artists.str.contains("[^a-zA-Z0-9]")]

but I get the original df.

I also tried with:

data2 = []
for x in data['artists']:
    if x is not "[^a-zA-Z0-9 ]":
         data2[x]=data[x]
    print(data2)

but it gives me the error:

KeyError: "['Zhané']"

and with:

if x is "[^ a-zA-Z0-9]"

it returns empty records.

use:

 data2=data.artists[data.artists.str.contains("[^a-zA-Z0-9]")]

but I get the original df,

You're missing a space in the character class "[^a-zA-Z0-9]". Without it, the space in a name like 'Mose Allison' counts as a special character, which is why you're getting the original df. Tested with Python 3 in a Jupyter notebook.
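A minimal sketch of the split, under an assumption the question does not confirm: that each value in "artists" is stored as a string such as "['Zhané']". If so, the brackets and quotes are themselves non-alphanumeric, so the sketch allows them in the character class along with the space. The names pattern, has_special and plain are made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question. Storing each artists value as the
# string "['Name']" is an assumption about how the column is held.
data = pd.DataFrame(
    {
        "metrics": [0.21, 0.14, 0.87, 0.25],
        "artists": [
            "['Zhané']",
            "['Mose Allison']",
            "['水柳仙']",
            "['Shel Silverstein']",
        ],
    }
)

# Characters treated as "not special": ASCII letters, digits, the space,
# and the list punctuation (brackets, quotes, comma) wrapping each name.
pattern = r"[^a-zA-Z0-9 \[\]',]"

# Boolean mask: True where the value contains at least one other character.
has_special = data["artists"].str.contains(pattern, regex=True)

# Index the whole DataFrame (not data.artists) to keep both columns.
data2 = data[has_special].reset_index(drop=True)
plain = data[~has_special].reset_index(drop=True)

print(data2)
print(plain)
```

Indexing data rather than data.artists keeps the metrics column in the result. The loop attempts in the question compare each value to the pattern string with "is", which checks object identity and never applies the regex, so a mask built with str.contains is probably the simpler route.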
