
Data cleaning: How to remove certain values from a pandas dataframe column?

I am analyzing the user-profile interests of a social network. From an export of the social network's database, I generated a dataframe with the user ID, name, and user interests. The 'interests' column was supposed to contain only keywords, but it actually contains a mix of keywords and IDs:

                    User ID    displayName                                          interests
0  5705952d0eb2063205ca1d3c     Jane Catch                                                 []
1  5705e99ac391580e00ea87c9     Heidi Kent  [{u'text': u'psychology', u'_id': {u'$oid': u'...
2  5705efb6c391580e00ea87ca  Rob Tuckinson  [{u'text': u'learning', u'_id': {u'$oid': u'57...

I would like to clean the interests column so that it keeps only the keywords.

Currently, the exported data looks like this:

User ID,displayName,interests
"570df0f2a40cc20e00c15e09,Alejandra Zara,""[{u'text': u'pretend-play', u'_id': {u'$oid': u'570e57eba40cc20e00c161ea'}}, {u'text': u'autobiographical-memory', u'_id': {u'$oid': u'570e57eba40cc20e00c161e9'}}]"""

For the first row, I would like to keep only the following:

570df0f2a40cc20e00c15e09,Alejandra Zara,"pretend-play, autobiographical-memory"

Any ideas for data cleaning techniques? In every row, I need to remove the ID information (which is different for each row), such as:

u'_id': {u'$oid': u'570e57eba40cc20e00c161ea'}}

and remove {u'text': u (which appears at the beginning of each keyword).

If I'm reading the question correctly, what you have in your interests column is the string representation of a Python list of dicts, from which you want to extract specific values. If so, you can use ast.literal_eval to parse it:

In [23]: import ast

In [24]: df
Out[24]: 
                    User ID     displayName  \
0  570df0f2a40cc20e00c15e09  Alejandra Zara   

                                           interests  
0  [{u'text': u'pretend-play', u'_id': {u'$oid': ...  

In [25]: df['interests'].map(lambda x: ','.join(i['text'] for i in ast.literal_eval(x)))
Out[25]: 
0    pretend-play,autobiographical-memory
Name: interests, dtype: object
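
For reference, here is a minimal self-contained sketch of the same idea that writes the cleaned keywords back into the column. The helper name extract_keywords is hypothetical, and the sample frame is rebuilt by hand from the rows shown in the question. It assumes each cell is either a string like the ones above (including "[]") or an already-parsed list, and it treats anything else (for example a missing value) as having no interests.

```python
import ast

import pandas as pd


def extract_keywords(raw):
    """Return the 'text' values from a stringified list of dicts, comma-joined."""
    # Assumption: cells are strings such as "[{u'text': ...}]" or already-parsed
    # lists; any other value (e.g. NaN) is treated as "no interests".
    if isinstance(raw, str):
        items = ast.literal_eval(raw)  # safely parses Python literals only
    elif isinstance(raw, list):
        items = raw
    else:
        items = []
    return ",".join(item["text"] for item in items)


# Sample data copied from the question (two of the rows shown there).
df = pd.DataFrame(
    {
        "User ID": ["5705952d0eb2063205ca1d3c", "570df0f2a40cc20e00c15e09"],
        "displayName": ["Jane Catch", "Alejandra Zara"],
        "interests": [
            "[]",
            "[{u'text': u'pretend-play', u'_id': {u'$oid': u'570e57eba40cc20e00c161ea'}}, "
            "{u'text': u'autobiographical-memory', u'_id': {u'$oid': u'570e57eba40cc20e00c161e9'}}]",
        ],
    }
)

# Overwrite the column with the cleaned, comma-separated keywords.
df["interests"] = df["interests"].map(extract_keywords)
print(df["interests"].tolist())
# ['', 'pretend-play,autobiographical-memory']
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for data that comes from an external export. If the column may contain malformed strings, the call would need a try/except around it.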
