
Iterating through two pandas DataFrames and finding strings from df1 that appear in df2

I have two DataFrames, let's call them df1 and df2.

df1

Term   Served
term1  82321
term2  54232
term3  34323
term4  1231

df2

Full Term               clicks
this is term1           233
oh boy this is term2    122
yea that's right term1  1121
oh no not that term4    313123

I would like to go row by row and find every time that the terms in df1 appear in df2. After that I would like to sum all of the clicks for that specific term. The output would look like this:

Term   Served  Clicks
term1  82321   1354
term2  54232   122
term3  34323   0
term4  1231    313123

Here is what I have so far. I haven't gotten past grabbing all of the times that the terms in df1 appear in df2. The code below keeps looping through only the first row in df1. Maybe I am not understanding str.findall(), or I have my loops wrong.

for index, row in df1.iterrows():
    for row2 in df2.iteritems():
        full_headline = df2['Full Term'].str.findall(row[0])
        print(full_headline)

IIUC, use str.findall to extract the Term values of df1 from the Full Term column of df2, then groupby and sum the clicks for each term found in df2. After that, we only need to assign the result back to df1 using map.

df2['Full Term']=df2['Full Term'].str.findall('|'.join(df1.Term)).str[0]
s=df2.groupby('Full Term').clicks.sum()
df1['Clicks']=df1.Term.map(s).fillna(0)
df1
Out[114]: 
    Term  Served    Clicks
0  term1   82321    1354.0
1  term2   54232     122.0
2  term3   34323       0.0
3  term4    1231  313123.0
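For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch using the sample data from the question. It is a variation rather than the exact code above: it assumes the terms should be matched literally (hence re.escape), it uses str.extract instead of str.findall(...).str[0], and it leaves df2 unmodified. The names pattern, matched, clicks_per_term and result are introduced here for illustration only.

```python
import re
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Term": ["term1", "term2", "term3", "term4"],
    "Served": [82321, 54232, 34323, 1231],
})
df2 = pd.DataFrame({
    "Full Term": [
        "this is term1",
        "oh boy this is term2",
        "yea that's right term1",
        "oh no not that term4",
    ],
    "clicks": [233, 122, 1121, 313123],
})

# Assumption: terms are plain text, so escape any regex metacharacters
pattern = "(" + "|".join(re.escape(t) for t in df1["Term"]) + ")"

# First matching term per row of df2; NaN where no term occurs
matched = df2["Full Term"].str.extract(pattern, expand=False)

# Sum clicks per matched term (rows with NaN keys are dropped by groupby)
clicks_per_term = df2["clicks"].groupby(matched).sum()

# Map the sums back onto df1; terms that never matched get 0
result = df1.assign(
    Clicks=df1["Term"].map(clicks_per_term).fillna(0).astype(int)
)
print(result)
```

Note that this is still a substring match, so a term such as term1 would also match inside a longer word like term10; add word boundaries to the pattern if that matters for your data.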

Update: if that is the case (for example, a row of df2 can match more than one term), you may want to look at unnesting the lists returned by str.findall.

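import numpy as np
import pandas as pd

# Keep every matched term per row (a list), not only the first one
df2['Full Term']=df2['Full Term'].str.findall('|'.join(df1.Term))
# Drop rows where no term matched (empty list)
df2=df2[df2['Full Term'].astype(bool)].copy()

def unnesting(df, explode):
    # Repeat each index label once per element of its list
    idx=df.index.repeat(df[explode[0]].str.len())
    # Flatten each list column into one row per element
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    # Re-attach the remaining columns by index (the positional axis argument to drop was removed in pandas 2.0)
    return df1.join(df.drop(columns=explode),how='left')
s=unnesting(df2,['Full Term']).groupby('Full Term').clicks.sum()
df1['Clicks'] = df1.Term.map(s).fillna(0)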
df1
Out[137]: 
    Term  Served  Clicks
0  term1   82321    1354
1  term2   54232     355
2  term3   34323     233
3  term4    1231  313123
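On pandas 0.25 or later, the built-in DataFrame.explode can stand in for a hand-written unnesting helper. The sketch below is an alternative to the code above, not the answer's own method. The last row of df2 is made up for illustration (it is not in the question) so that one row contains two terms, and the names pattern, found, exploded, clicks_per_term and result are likewise illustrative.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    "Term": ["term1", "term2", "term3", "term4"],
    "Served": [82321, 54232, 34323, 1231],
})
df2 = pd.DataFrame({
    "Full Term": [
        "this is term1",
        "oh boy this is term2",
        "yea that's right term1",
        "oh no not that term4",
        "term2 and term3 in one row",  # hypothetical row, not from the question
    ],
    "clicks": [233, 122, 1121, 313123, 10],
})

# Assumption: terms are plain text, so escape any regex metacharacters
pattern = "|".join(re.escape(t) for t in df1["Term"])

# One list of matched terms per row, stored in a new column
found = df2.assign(Term=df2["Full Term"].str.findall(pattern))

# One row per (original row, matched term); empty lists become NaN
exploded = found.explode("Term")

# groupby drops the NaN keys, so unmatched rows do not contribute
clicks_per_term = exploded.groupby("Term")["clicks"].sum()

result = df1.assign(
    Clicks=df1["Term"].map(clicks_per_term).fillna(0).astype(int)
)
print(result)
```

One design caveat: if the same term occurs twice in a single row, findall returns it twice and that row's clicks are counted twice; deduplicating each list before exploding would avoid that.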
