
Convert Spark dataframe column

I have a Spark dataframe with two columns: Stars (a numeric value) and categories (a string of tags, e.g. "Restaurant, Italien, High-end"). I want to recreate the dataframe so that categories is instead a count of the tags. In the example above, categories would become 3.

I've tried treating the dataframe as a pandas dataframe, but it does not seem to work. I am new to Spark, so perhaps it is because I don't really grasp the idea of RDDs.

Please paste your code so the problem is easier to understand. Based on your description, you can try something like this:

df['CategoryCount'] = df['categories'].str.split(',').str.len()

Here df is your original dataframe and CategoryCount is a new column that contains the count of tags. You can also drop the categories column if you want. Note that this is pandas syntax, so it applies only once the data is in a pandas dataframe (for example after calling toPandas() on the Spark dataframe).
