Is there a way to drop rows from a dataframe by comparing a column's values with the values in a list?

Question

I have a dataframe with three columns. I am trying to clean the data by dropping all the rows that do not have country names in the third column. In other words, I want to drop every value in that third column that is not a country.

To do that, I added a list of country names to my notebook, and now I would like to know whether it is possible to drop all the values in that column that are not found in the list of countries.

The problem might be that the values in the column are one big string of text taken from the product text of an HTML file. I already split the strings and dropped a couple of rows based on ";" and length, but now I am not sure how to continue.

I tried:

ProductDataFrame = ProductDataFrame[~ProductDataFrame['Produkttext'].isin(CountriesList)]

which doesn't raise an error, but does not change anything in my dataframe.

This is what it looks like:

                                                                  Produkttext  
1                                                             Roter Kopfsalat.   
2                                                                       Italien  
3                                 Äthiopien,Marokko, Senegal, Ruanda oder Kenia  
4                                                                       Spanien  
5                                                                   Deutschland  
6                                           Deutschland, Niederlande oder Polen  
7                                                      Deutschland oder Italien  
8                                          Deutschland, Frankreich oder Italien  
9                                                                   Deutschland  
10                                                  Deutschland oder Österreich

Answer 1

After you split on commas and whitespace, you can use explode to flatten all the sublists into one item per row, then use isin. When you want to group the items back together into lists, use .groupby(level=0) (which groups by the 0th level of the index) and .agg(list):

e = df['Produkttext'].str.split(r'[\s,]+').explode()

# Remove "oder"
e = e[e.ne('oder')]

# Keep only the items that are in CountriesList
e = e[e.isin(CountriesList)]

# Convert the remaining items back to lists
e.groupby(level=0).agg(list)
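
As a rough end-to-end sketch of the steps above, the following applies them to a small sample and then uses the surviving index labels to drop the rows that contain no country at all. The sample rows and the shortened CountriesList are assumptions made up for illustration, and the Countries column name is hypothetical. Note that isin compares each whole cell against the list, so a cell such as "Deutschland oder Österreich" never matches until it has been split into separate tokens.

```python
import pandas as pd

# Hypothetical sample modelled on the output shown in the question
df = pd.DataFrame({
    "Produkttext": [
        "Roter Kopfsalat.",
        "Italien",
        "Äthiopien,Marokko, Senegal, Ruanda oder Kenia",
        "Deutschland oder Österreich",
    ]
})

# Hypothetical, shortened country list (the real one would be longer)
CountriesList = [
    "Italien", "Äthiopien", "Marokko", "Senegal",
    "Ruanda", "Kenia", "Deutschland", "Österreich",
]

# One token per row; each token keeps the index label of its source row
tokens = df["Produkttext"].str.split(r"[\s,]+").explode()

# Keep only tokens that are countries ("oder" and other words fall out
# here too), then regroup them into one list per original row
countries = tokens[tokens.isin(CountriesList)].groupby(level=0).agg(list)

# Rows with no country token are missing from `countries`, so selecting
# by its index drops them from the dataframe
result = df.loc[countries.index].assign(Countries=countries)
print(result)
```

Because non-country words are filtered out by isin anyway, the separate step that removes "oder" is optional in this variant.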

Answer 1 posted: 2022-02-01 22:15:19
