Pandas Extract and Replace Values
I'm trying to clean a large pandas DataFrame by extracting a name from a text column and using it to replace the value in another column. I only want to replace values where the extraction was successful.
I was able to extract the name from the "text" column, but I'm struggling to replace the value in the "name" column.
Looking for some suggestions.
Example DF:
import pandas as pd

df = pd.DataFrame({'text': {0: 'John', 1: 'A girl named Susan', 2: 'A man named David'},
                   'name': {0: 'John', 1: 'girl', 2: 'man'}})
text name
0 John John
1 A girl named Susan girl
2 A man named David man
Extracted names:
print(df['text'].str.extract(r'((?<=named\s)\w+)'))
0
0 NaN
1 Susan
2 David
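The lookbehind in the pattern above is not strictly needed: str.extract returns whatever the capture group matches, so the pattern can match "named " and capture only the following word. Passing expand=False makes str.extract return a Series instead of a one-column DataFrame, which is easier to combine with another column. A minimal sketch, assuming the same example data (the variable name extracted is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'text': ['John', 'A girl named Susan', 'A man named David'],
                   'name': ['John', 'girl', 'man']})

# Capture the word after "named"; rows without a match become NaN.
# expand=False returns a Series (single capture group) rather than a DataFrame.
extracted = df['text'].str.extract(r'named\s+(\w+)', expand=False)
print(extracted)
```

Raw strings (r'...') avoid Python's invalid-escape warning for \s and \w.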
Desired output:
text name
0 John John
1 A girl named Susan Susan
2 A man named David David
Answer: I'm not sure whether this works with your real data, but one solution is to extract the names and then fill the missing values (rows where nothing was extracted) with the original values of the name column:
# expand=False returns a Series, so fillna aligns the fill values on the row index
df['name'] = df['text'].str.extract(r'((?<=named\s)\w+)', expand=False).fillna(df['name'])
print(df)
text name
0 John John
1 A girl named Susan Susan
2 A man named David David
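An alternative that makes the "only replace where the extraction succeeded" requirement explicit is to build a boolean mask with notna() and assign only those rows with .loc. This is a sketch assuming the same example data; it should produce the same result as the fillna approach, but rows where nothing was extracted are never written to, so their original name values stay untouched.

```python
import pandas as pd

df = pd.DataFrame({'text': ['John', 'A girl named Susan', 'A man named David'],
                   'name': ['John', 'girl', 'man']})

# Extract the word following "named"; NaN where the pattern does not match.
extracted = df['text'].str.extract(r'named\s+(\w+)', expand=False)

# Overwrite 'name' only in rows where a name was actually extracted.
found = extracted.notna()
df.loc[found, 'name'] = extracted[found]
print(df)
```

The mask version is slightly more verbose, but it keeps the replacement condition visible and is easy to extend, for example to also log or count the rows that were changed.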