Is there a way to drop rows from a dataframe by comparing a column value to values in list?

I have a dataframe with three columns. I am trying to clean the data by dropping all rows that do not have country names in the third column. In other words, I want to drop every value in that third column that is not a country.

For that, I added a list of country names to my notebook. Is it possible to drop all values in that column that are not found in the list of countries?

The problem might be that the values in the column are one big string of text taken from the product text of an HTML file. I already split the strings and dropped a couple of rows based on ";" and length, but now I am not sure how to continue.

I tried:

ProductDataFrame = ProductDataFrame[~ProductDataFrame['Produkttext'].isin(CountriesList)]

which doesn't return an error but does not change anything in my dataframe...

This is what it looks like:

                                                                  Produkttext  
1                                                             Roter Kopfsalat.   
2                                                                       Italien  
3                                 Äthiopien,Marokko, Senegal, Ruanda oder Kenia  
4                                                                       Spanien  
5                                                                   Deutschland  
6                                           Deutschland, Niederlande oder Polen  
7                                                      Deutschland oder Italien  
8                                          Deutschland, Frankreich oder Italien  
9                                                                   Deutschland  
10                                                  Deutschland oder Österreich

After you split on commas and whitespace, you can use explode to flatten out all the sublists, then use isin to keep only the entries that appear in CountriesList. When you want to group them back together into lists, use .groupby(level=0) (which groups by the 0th level of the index) and .agg(list):

e = df['Produkttext'].str.split(r'[\s,]+').explode()

# Remove "oder"
e = e[e.ne('oder')]

# Keep only the items that are in CountriesList
e = e[e.isin(CountriesList)]

# Convert the remaining items back to lists
e.groupby(level=0).agg(list)
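
To tie the steps above back to the original goal of dropping rows without any country, here is a minimal end-to-end sketch. The sample rows are modelled on the question's output, and CountriesList is an abbreviated, assumed stand-in for the asker's real list; the names tokens, countries, countries_per_row, cleaned and the Countries column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's 'Produkttext' column
df = pd.DataFrame(
    {
        "Produkttext": [
            "Roter Kopfsalat.",
            "Italien  ",
            "Äthiopien,Marokko, Senegal, Ruanda oder Kenia",
            "Deutschland oder Österreich",
        ]
    },
    index=[1, 2, 3, 4],
)

# Assumed, abbreviated stand-in for the asker's CountriesList
CountriesList = [
    "Italien", "Äthiopien", "Marokko", "Senegal",
    "Ruanda", "Kenia", "Deutschland", "Österreich",
]

# Strip stray whitespace, split on commas/whitespace, then put one token
# per row; explode repeats the original index label for each token
tokens = df["Produkttext"].str.strip().str.split(r"[\s,]+").explode()

# Keep only tokens that are country names ("oder" and other words drop out)
countries = tokens[tokens.isin(CountriesList)]

# Regroup the surviving tokens into one list per original row
countries_per_row = countries.groupby(level=0).agg(list)

# Rows with no country at all are missing from countries_per_row,
# so selecting by its index drops them from the dataframe
cleaned = df.loc[countries_per_row.index].assign(Countries=countries_per_row)
print(cleaned)
```

Note that splitting on whitespace would also break up country names that consist of more than one word, so for such names a split on commas and " oder " only may be the safer choice.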
