Fastest Way to Iterate Over Dataframe Column to Find Match in Strings
Here's a heavily truncated extract from a large dataframe:

name | age | city
---|---|---
ben | 66 | NY
rob | 45 | LON
james | 22 | LA

I also have numerous strings that each contain different words, but each one contains exactly one of the values in the name column.
For example:
For each string I want to iterate over the "name" column, find the name that appears in the string, and return that person's age.
So in the first example the desired result is 45, and in the second example the desired result is 66.
I am new to Pandas and am struggling. Can anyone point me in the right direction?
Hope this helps:
List of all strings. This can be part of another dataframe: just select the column that holds these values and convert it to a list.

    l = ['rob was born in London', 'ben once lived in New York']

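If the strings sit in a column of another dataframe, pulling them out into a list might look like the sketch below. The names `sentences_df` and `text` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical dataframe that holds the sentences in a column named 'text'
sentences_df = pd.DataFrame({'text': ['rob was born in London',
                                      'ben once lived in New York']})

# Select the column and convert it to a plain Python list
l = sentences_df['text'].tolist()
print(l)
```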
The dataframe from your example:

    import pandas as pd

    df = pd.DataFrame({'name': ['ben', 'rob', 'james'],
                       'age': [66, 45, 22],
                       'city': ['NY', 'LON', 'LA']})

The final dataset, which will hold each matching `string` and its `age`:

    age_dat = pd.DataFrame()

The first for-loop loops over the names in your original dataset (`df`). The second for-loop loops over the list of sentences (list `l`). If a name is found in a sentence from `l`, that sentence and the matching age are added to `age_dat`.
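
    frames = []
    for x in df['name']:
        for z in l:
            if x in z:  # substring test: does the sentence contain this name?
                dat = pd.DataFrame({'string': [z],
                                    'age': [df.loc[df['name'] == x, 'age'].iloc[0]]})
                frames.append(dat)
    # DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
    age_dat = pd.concat(frames)
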
    print(age_dat)
                           string  age
    0  ben once lived in New York   66
    0      rob was born in London   45

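On a large dataframe, filtering `df` inside the loop and creating one small dataframe per match can get slow. One possible variation, assuming every name appears only once in `df`, is to build a name-to-age dict once and collect the rows in a plain list. This is a sketch rather than a benchmarked solution, and the variable names `age_by_name` and `rows` are mine.

```python
import pandas as pd

df = pd.DataFrame({'name': ['ben', 'rob', 'james'],
                   'age': [66, 45, 22],
                   'city': ['NY', 'LON', 'LA']})
l = ['rob was born in London', 'ben once lived in New York']

# Build the lookup once (assumes each name appears only once in df)
age_by_name = dict(zip(df['name'], df['age']))

rows = []
for sentence in l:
    for name, age in age_by_name.items():
        if name in sentence:  # same substring test as in the loop above
            rows.append({'string': sentence, 'age': age})
            break  # the question says each sentence contains exactly one name

# Create the dataframe in one go instead of growing it row by row
age_dat = pd.DataFrame(rows)
print(age_dat)
```

Unlike the loop above, the rows here follow the order of the sentences and the index runs 0, 1.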
Data

    import pandas as pd

    s = pd.Series(['rob was born in London', 'ben once lived in New York'])
    df = pd.DataFrame({'name': ['ben', 'rob', 'james'],
                       'age': [66, 45, 22],
                       'city': ['NY', 'LON', 'LA']})

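Solution

    # Pattern is (ben)|(rob)|(james); bfill moves the one matched name into column 0
    who = s.str.extract('(' + ')|('.join(df.name) + ')').bfill(axis=1)[0]
    # Lookup table from name to age
    age_by_name = dict(zip(df.name, df.age))
    pd.DataFrame({'text': s, 'age': who.map(age_by_name)})
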
                             text  age
    0      rob was born in London   45
    1  ben once lived in New York   66

Explanation
Use `.str.extract` to pull the name out of each string, then map that name to its age using the dataframe.
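If the names could contain regex metacharacters, or if a short name could appear inside a longer word (say, "rob" inside "robert"), a slightly more defensive pattern may help. The sketch below is my own variation on the idea, not the answer's code: it escapes each name with `re.escape`, wraps a single capture group in `\b` word boundaries, and assumes the names in `df` are unique.

```python
import re
import pandas as pd

s = pd.Series(['rob was born in London', 'ben once lived in New York'])
df = pd.DataFrame({'name': ['ben', 'rob', 'james'],
                   'age': [66, 45, 22],
                   'city': ['NY', 'LON', 'LA']})

# One capture group, \b(ben|rob|james)\b, with every name regex-escaped
pattern = r'\b(' + '|'.join(map(re.escape, df['name'])) + r')\b'

# With a single group, expand=False returns a Series of the matched names
who = s.str.extract(pattern, expand=False)

# Map each name to its age through a Series indexed by name
# (assumes names are unique in df)
ages = who.map(df.set_index('name')['age'])
print(pd.DataFrame({'text': s, 'age': ages}))
```

Because the pattern has only one group, the `bfill(axis=1)` step is no longer needed.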