Pandas create a new dataframe by returning the rows matching strings from a list checked against cells in 1 column from another dataframe

I have created a dataframe from a .csv file with just over 3.8 million rows:

import pandas as pd

file_name = 'bigfile.csv'
bigfile_df = pd.read_csv(file_name, low_memory=False)

I am then importing a second csv, which I would like to use as my list:

input_df = pd.read_csv('list.csv', delimiter=',')

I then convert this to a list:

l = input_df['Column_Name'].tolist()

When printed, it looks like:

['Text Text Text', 'Text Text Text', 'Text Text Text']

The list is also fairly large: it contains over 12,000 strings.

What I would like to do is take each entry in the list, check for matches within the cells of the 'Name' column in bigfile_df, and create a new dataframe containing the entire row for each match.

I hope this all makes sense. I have looked for similar examples to try to answer this but could not find any. Thank you in advance for any replies.

You can achieve this using the query method:

# Collect one dataframe of matching rows per list entry
output_dfs = []
for entry in l:
    output_dfs.append(bigfile_df.query('Name == @entry'))

Edit: I may have misunderstood. If you want one dataframe, you can do the following:

output_df = bigfile_df.query('Name in @l')

This will search bigfile_df for all rows whose 'Name' column value appears in your list l. Note that this is an exact match on the whole cell value, not a substring match.
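
For reference, here is a minimal sketch of the same filter written as a boolean mask with Series.isin, which should select the same rows as the query call. The toy data below is made up purely for illustration and stands in for bigfile_df and l. The second filter is an assumption about what "matches within cells" might mean: if a list entry only needs to appear somewhere inside the 'Name' cell, one option is Series.str.contains with the entries escaped and joined into a single regular expression.

```python
import re
import pandas as pd

# Toy stand-ins for bigfile_df and l (values are invented for illustration)
bigfile_df = pd.DataFrame({
    'Name': ['Alpha Co', 'Beta Ltd', 'Gamma Inc', 'Alpha Co Holdings'],
    'Value': [1, 2, 3, 4],
})
l = ['Alpha Co', 'Gamma Inc']

# Exact match: keep rows whose 'Name' equals one of the list entries
exact_df = bigfile_df[bigfile_df['Name'].isin(l)]

# Substring match (assumed interpretation): keep rows whose 'Name'
# contains any list entry; re.escape treats each entry as literal text
pattern = '|'.join(re.escape(s) for s in l)
contains_df = bigfile_df[bigfile_df['Name'].str.contains(pattern, na=False)]

print(exact_df)
print(contains_df)
```

With a list of over 12,000 entries, the combined regular expression in the substring variant may be slow on several million rows, so the exact-match form is preferable if it fits your data.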
