
Separate column values by backslash in pandas

I have a dataframe like this:

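import pandas as pd

# Raw strings (r'...') keep each backslash literal; in a plain literal,
# 'red\blue' would contain a backspace character ('\b') instead.
data = {'id': [1, 1, 1, 2, 2],
        'value': ['red', r'red\blue', 'yellow', 'oak', r'oak\wood']}
df = pd.DataFrame(data, columns=['id', 'value'])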

What I want is:

id value   count
1  red     2
1  blue    1
1  yellow  1
2  oak     2
2  wood    1

If the delimiter is something else, such as ; or /, I can do:

df1 = (df.assign(value = df['value'].str.split(';|/'))
         .explode('value')
         .groupby(['id','value'], sort=False)
         .size()
         .reset_index(name='count'))

But when the delimiter is a backslash (\), this doesn't work.

What should I do?
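One common pitfall when reproducing this: in an ordinary (non-raw) Python string literal, \b is the backspace escape, so 'red\blue' written without the r prefix holds a backspace character and no backslash at all, and no backslash split can find it. ('oak\wood' keeps its backslash only because \w is not a recognized escape; recent Python versions warn about such literals.) The short check below illustrates the difference; writing the values as raw strings (r'red\blue') or with a doubled backslash ('red\\blue') gives the intended data.

```python
# Compare an ordinary literal with a raw literal for the same characters.
plain = 'red\blue'   # '\b' is parsed as a single backspace character (\x08)
raw = r'red\blue'    # raw string: the backslash and the 'b' are both kept

print(repr(plain), len(plain))      # 'red\x08lue' 7
print(repr(raw), len(raw))          # 'red\\blue' 8
print('\\' in plain, '\\' in raw)   # False True

# Splitting on a single literal backslash only works on the raw version.
parts = raw.split('\\')
print(parts)                        # ['red', 'blue']
```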

You can replace every non-word character (anything other than a letter, digit or underscore) in value with a space, and then split on whitespace:

df1 = (df.assign(value = df['value'].replace({r'\W': ' '}, regex=True).str.split())
     .explode('value')
     .groupby(['id','value'], sort=False)
     .size()
     .reset_index(name='count'))

NOTE: This splits incorrectly if a value contains other non-word characters that should not act as separators, such as the space in "light blue" or the hyphen in "dark-red".
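A narrower alternative, sketched here under the assumption that the only intended separators are ;, / and the backslash, is to keep the question's explode pipeline and add the backslash to the split pattern. pandas treats a split pattern longer than one character as a regular expression (which is why ';|/' works in the question), so the backslash must be escaped inside it: the character class r'[;/\\]' matches ;, / or a single literal backslash. The extra value 'light blue' is hypothetical, added only to show that this version leaves spaces inside values alone, unlike the \W approach.

```python
import pandas as pd

# Sample data written with raw strings so each backslash is a literal '\'.
# 'light blue' is a hypothetical extra value that contains a space.
data = {'id': [1, 1, 1, 2, 2, 2],
        'value': ['red', r'red\blue', 'yellow', 'oak', r'oak\wood', 'light blue']}
df = pd.DataFrame(data)

df1 = (df.assign(value=df['value'].str.split(r'[;/\\]'))  # split on ';', '/' or '\'
         .explode('value')                                 # one row per piece
         .groupby(['id', 'value'], sort=False)             # keep first-seen order
         .size()
         .reset_index(name='count'))
print(df1)

# For comparison: the \W approach also breaks 'light blue' into two tokens.
tokens_w = df['value'].replace({r'\W': ' '}, regex=True).str.split().explode()
print(tokens_w.tolist())
```

Because the pattern lists the separators explicitly, adding another delimiter later only means adding one character to the class.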
