Is there a way to drop rows from a dataframe by comparing a column value to values in a list?
I have a dataframe with three columns. I am trying to clean the data by dropping all the rows that do not have country names in the third column. In other words, I want to drop every value in that third column that is not a country.
For that, I added a list of country names to my notebook, and now I would like to know whether it is possible to drop all the values in that column that are not found in the list of countries.
The problem might be that the values in the column are one big string of text taken from the product text of an HTML file. I already split the strings and dropped a couple of rows based on ";" and length, but now I am not sure how to continue.
I tried:
ProductDataFrame = ProductDataFrame[~ProductDataFrame['Produkttext'].isin(CountriesList)]
which doesn't raise an error but does not change anything in my dataframe.
This is what it looks like:
Produkttext
1 Roter Kopfsalat.
2 Italien
3 Äthiopien,Marokko, Senegal, Ruanda oder Kenia
4 Spanien
5 Deutschland
6 Deutschland, Niederlande oder Polen
7 Deutschland oder Italien
8 Deutschland, Frankreich oder Italien
9 Deutschland
10 Deutschland oder Österreich
After you split on commas (and whitespace), you can use explode to flatten out all the sublists, then use isin. When you want to group them back together into lists, use .groupby(level=0) (which groups by the 0th level of the index) and .agg(list):
e = df['Produkttext'].str.split(r'[\s,]+').explode()
# Remove "oder"
e = e[e.ne('oder')]
# Keep only the items that are in CountriesList
e = e[e.isin(CountriesList)]
# Convert the remaining items back to lists
e.groupby(level=0).agg(list)
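To go from those per-row lists back to the original goal of dropping rows without any country, one option is to reuse the index of the grouped result, since rows with no match simply do not appear in it. The following is a minimal, self-contained sketch of that idea; the sample rows are copied from the question, while the shortened CountriesList and the new column name Countries are assumptions made for illustration.

```python
import pandas as pd

# Sample rows modelled on the question's 'Produkttext' column.
df = pd.DataFrame({
    "Produkttext": [
        "Roter Kopfsalat.",
        "Italien",
        "Äthiopien,Marokko, Senegal, Ruanda oder Kenia",
        "Deutschland oder Österreich",
    ]
})

# Assumed, shortened stand-in for the asker's CountriesList.
CountriesList = [
    "Italien", "Äthiopien", "Marokko", "Senegal",
    "Ruanda", "Kenia", "Deutschland", "Österreich",
]

# One token per row; the original index label is repeated for each token.
tokens = df["Produkttext"].str.split(r"[\s,]+").explode()

# Keep only tokens that are country names ("oder" and other words drop out here),
# then collect the survivors back into one list per original row.
countries = tokens[tokens.isin(CountriesList)].groupby(level=0).agg(list)

# Rows with no country at all are missing from `countries`,
# so selecting by its index drops them from the dataframe.
cleaned = df.loc[countries.index].assign(Countries=countries)
print(cleaned)
```

Note that splitting on whitespace would break up country names made of several words, so a list containing such names would probably need a different separator pattern.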