Pandas Extract and Replace Values

I'm trying to clean a large pandas DataFrame by extracting a name from a text column and using it to replace the value in another column. I only want to replace values where the extraction was successful. I was able to extract the name from the "text" column, but I'm struggling to replace the value in the "name" column. Looking for some suggestions.

Example DF:

import pandas as pd

df = pd.DataFrame({'text': {0: 'John', 1: 'A girl named Susan', 2: 'A man named David'},
                   'name': {0: 'John', 1: 'girl', 2: 'man'}})

                 text  name
0                John  John
1  A girl named Susan  girl
2   A man named David   man

Extracted Names:

print(df['text'].str.extract(r'((?<=named\s)\w+)'))

       0
0    NaN
1  Susan
2  David

Desired Output:

                 text   name
0                John   John
1  A girl named Susan  Susan
2   A man named David  David

I'm not sure this will hold up on real data, but one solution is to replace the missing values left by the extraction with the original values of the name column:

# expand=False returns a Series, so fillna aligns row by row with df['name']
df['name'] = df['text'].str.extract(r'((?<=named\s)\w+)', expand=False).fillna(df['name'])
print(df)
                 text   name
0                John   John
1  A girl named Susan  Susan
2   A man named David  David
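
Since the question asks to replace values only where the extraction succeeded, the same idea can also be written with an explicit boolean mask. The following is a minimal sketch, not the answerer's code: the variable names `extracted` and `matched` are made up for illustration, and it assumes the example DataFrame from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['John', 'A girl named Susan', 'A man named David'],
    'name': ['John', 'girl', 'man'],
})

# expand=False makes str.extract return a Series that shares df's row index.
extracted = df['text'].str.extract(r'((?<=named\s)\w+)', expand=False)

# True only for rows where the pattern matched (the extraction is not NaN).
matched = extracted.notna()

# Overwrite 'name' in the matched rows and leave every other row untouched.
df.loc[matched, 'name'] = extracted[matched]

print(df)
```

On this example the result should match the fillna approach above. The mask version mainly makes the "only where extraction succeeded" condition visible, which can help if other columns need updating for the same rows.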
