How to update substring in pandas dataframe column of strings
I have a dataframe ('sp500news') which looks like the following:
date_publish \
79944 2007-01-29 19:08:35
181781 2007-12-14 19:39:06
213175 2008-01-22 11:17:19
93554 2008-01-22 18:52:56
...
title
79944 Microsoft Vista corporate sales go very well
181781 Williams No Anglican consensus on Episcopal Church
213175 CSX quarterly profit rises
93554 Citigroup says 30 bln capital helps exceed target
...
I am trying to replace each company name in 'title' with its corresponding ticker from the 'Symbol' column of another dataframe ('constituents'), which looks like:
Symbol Name Sector
0 MMM 3M Industrials
1 AOS A.O. Smith Industrials
2 ABT Abbott Health Care
3 ABBV AbbVie Health Care
...
116 C Citigroup Financials
...
I've already tried:
for item in sp500news['title']:
    for word in item:
        if word in constituents['Name']:
            indx = constituents['Name'].index(word)
            str.replace(word, constituents['Symbol'][indx])
Try this:
Here are dummy dataframes that represent your data:
import pandas as pd
df1 = pd.DataFrame({'Symbol': ['MV', 'AOS','ABT'],
                    'Name': ['Microsoft Vista', 'A.0.', 'Abbot']})
df1
Symbol Name
0 MV Microsoft Vista
1 AOS A.0.
2 ABT Abbot
df2 = pd.DataFrame({'title': [79944, 181781, 213175],
'comment': ['Microsoft Vista corporate sales go very well',
'Abbot consensus on Episcopal Church',
'A.O. says 30 bln captial helps exceed target']})
title comment
0 79944 Microsoft Vista corporate sales go very well
1 181781 Abbot consensus on Episcopal Church
2 213175 A.O. says 30 bln captial helps exceed target
Make a dictionary that maps each name to its symbol:
rep = dict(zip(df1.Name,df1.Symbol))
rep
{'Microsoft Vista': 'MV', 'A.0.': 'AOS', 'Abbot': 'ABT'}
Replace them using the Series.replace method
df2['comment'] = df2['comment'].replace(rep, regex = True)
df2
title comment
0 79944 MV corporate sales go very well
1 181781 ABT consensus on Episcopal Church
2 213175 A.O. says 30 bln captial helps exceed target
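In the sample above, the third row is left unchanged because the dummy name 'A.0.' is spelled with a zero while the text contains 'A.O.'. Note also that with regex=True each dictionary key is treated as a regular expression, so a '.' in a company name matches any character. The sketch below is one way to guard against that, assuming literal matching is what you want; the frame names (names, news) and the 'AXOY' row are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical data: the name is spelled 'A.O.' here, and 'AXOY' is an
# invented token that should NOT be treated as a company name.
names = pd.DataFrame({'Symbol': ['MV', 'AOS', 'ABT'],
                      'Name': ['Microsoft Vista', 'A.O.', 'Abbot']})
news = pd.DataFrame({'comment': ['Microsoft Vista corporate sales go very well',
                                 'Abbot consensus on Episcopal Church',
                                 'AXOY says 30 bln capital helps exceed target']})

# Unescaped keys: the pattern 'A.O.' also matches 'AXOY' because '.' is a wildcard.
unescaped = dict(zip(names['Name'], names['Symbol']))
loose = news['comment'].replace(unescaped, regex=True)

# Escaped keys: re.escape makes the dots literal, so only 'A.O.' itself matches.
escaped = {re.escape(name): symbol
           for name, symbol in zip(names['Name'], names['Symbol'])}
strict = news['comment'].replace(escaped, regex=True)

print(loose.tolist())
print(strict.tolist())
```

With the escaped keys, the 'AXOY' row is left alone while the first two rows are still rewritten.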
Try the following code:
df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
'Williams No Anglican consensus on Episcopal Church',
'Microsoft Vista corporate sales go very well']})
constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM'],
'name': ['3M', 'Citigroup', 'Microsoft', 'Williams']})
for name, symbol in zip(constituents['name'], constituents['symbol']):
    # regex=False treats each name as a literal substring
    df['title'] = df['title'].str.replace(name, symbol, regex=False)
Output:
title
0 C says 30 bln capital helps exceed target
1 WLM No Anglican consensus on Episcopal Church
2 MCR Vista corporate sales go very well
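The loop above scans the whole column once per company and matches names anywhere inside a word. As a possible alternative, here is a minimal sketch that does the replacement in a single pass with one combined pattern. The names name_to_symbol, pattern and replaced are illustrative, and the \b word boundaries assume that every company name starts and ends with a letter or digit.

```python
import re
import pandas as pd

df = pd.DataFrame({'title': ['Citigroup says 30 bln capital helps exceed target',
                             'Williams No Anglican consensus on Episcopal Church',
                             'Microsoft Vista corporate sales go very well']})
constituents = pd.DataFrame({'symbol': ['MMM', 'C', 'MCR', 'WLM'],
                             'name': ['3M', 'Citigroup', 'Microsoft', 'Williams']})

name_to_symbol = dict(zip(constituents['name'], constituents['symbol']))

# Longest names first, so a longer name wins over a shorter one that is its prefix.
ordered = sorted(name_to_symbol, key=len, reverse=True)

# One alternation of escaped names, anchored on word boundaries.
pattern = r'\b(?:' + '|'.join(re.escape(n) for n in ordered) + r')\b'

# The callable looks up the ticker for whichever name was matched.
replaced = df['title'].str.replace(pattern,
                                   lambda m: name_to_symbol[m.group(0)],
                                   regex=True)
print(replaced.tolist())
```

Because every title is rewritten in one pass, a ticker that has just been inserted cannot be matched again by a later company name.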
I basically just copied a few rows of your sp500news['title'] and made up some of constituents['Name'] just to demonstrate the transformation. Essentially, I am accessing the string method object (the .str accessor) of the pd.Series object for column title from sp500news, so that I can apply replace to it wherever it finds a matching company name.
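To illustrate why the .str accessor matters here, the following small sketch (with a made-up two-row Series named titles) contrasts Series.replace, which without regex=True only replaces values that equal the whole string, with Series.str.replace, which replaces substrings inside each value.

```python
import pandas as pd

# Hypothetical Series: one full headline and one value that is just the name.
titles = pd.Series(['Citigroup says 30 bln capital helps exceed target',
                    'Citigroup'])

# Series.replace without regex=True: only exact, whole-value matches change.
whole = titles.replace('Citigroup', 'C')

# Series.str.replace: the substring is replaced inside every value.
sub = titles.str.replace('Citigroup', 'C', regex=False)

print(whole.tolist())
print(sub.tolist())
```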