
I need to drop all rows in a certain column where there is no value or is "null": Using Python and Pandas

I need to drop all rows where a certain column has no value, i.e. where it is "null". The problem is that I do not know the name of the column. I only know that it is the 5th column across, so I have tried using iloc together with methods like "notna" and "notnull" (see below). I have included a sample image of the type of data I am working with. The reason I am trying to do this is that there is a varying number of junk rows at the top of my csv file/dataframe that I am trying to get rid of. The number of junk rows is different each time, so I cannot use something that just drops a known number of header rows. That is why I am trying to drop every row that is null in that column: I know it will also get rid of all the junk rows at the top of my dataset.

These are some methods I have tried, but they didn't work:

df = df[df[df.iloc[:, 4]].notna()]

df = df[pd.notnull(df[df.iloc[:, 4])]

df = df.dropna(subset=[df.iloc[:, 5]])

So, for example, in this image I am trying to drop all rows where column 5 (the Date column) is null, but that column's name is not "Date" yet because of the junk rows at the top. I am trying to get rid of all rows that are null in column 5 so that only the populated rows remain and the junk rows at the top are eliminated:

See the table here


Your first two versions have an extra df[]. You can use either:

df = df[df.iloc[:, 4].notna()]

Or:

df = df[pd.notnull(df.iloc[:, 4])]

To break it down more explicitly, these use boolean indexing. For example, the first one uses df.iloc[:, 4].notna() to get a boolean index of the non-null rows and then filters df with it:

notna_boolean_index = df.iloc[:, 4].notna()
df = df.loc[notna_boolean_index] # can also leave out `.loc` for boolean indexes
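As a minimal sketch of how this plays out on data like the question's, the example below builds a small frame by hand. The values, the two junk rows, and the names raw, mask and cleaned are made up for illustration; only the iloc/notna filtering comes from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two junk rows, then the real header row, then the records.
# Columns carry integer labels because the real names are not known yet.
raw = pd.DataFrame([
    ["Report", np.nan, np.nan, np.nan, np.nan],
    ["Generated", np.nan, np.nan, np.nan, np.nan],
    ["ID", "Name", "Qty", "Price", "Date"],
    [1, "a", 10, 2.5, "2021-01-01"],
    [2, "b", 20, 3.5, np.nan],
    [3, "c", 30, 4.5, "2021-01-03"],
])

# Boolean Series: True where the 5th column (position 4) holds a value.
mask = raw.iloc[:, 4].notna()

# Keep only the rows where the mask is True.
cleaned = raw[mask]
print(cleaned)
```

Note that the record with a missing date is dropped along with the junk rows, and that the original header row survives as the first remaining row because its 5th cell is populated.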

You can parse your data by passing na_values and then call dropna. To handle the junk rows at the top, you can use skiprows while reading the csv, provided you know how many rows to skip. Below is sample code that might help you achieve this.

Read the csv:

df = pd.read_csv('/tmp/test.csv', na_values=['null'], keep_default_na=True, skiprows=3)

Although I believe null is treated as an NA value by default, you can pass it explicitly as above to be safe.
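To see the read step in isolation, here is a small self-contained sketch that reads from an in-memory string instead of /tmp/test.csv. The CSV contents and the count of three junk lines are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical file contents: three junk lines, then the header and the data.
csv_text = (
    "junk line 1\n"
    "junk line 2\n"
    "junk line 3\n"
    "ID,Name,Qty,Price,Date\n"
    "1,a,10,2.5,2021-01-01\n"
    "2,b,20,3.5,null\n"
    "3,c,30,4.5,\n"
)

# skiprows=3 discards the junk lines, so the fourth line becomes the header.
# "null" and the empty field are both parsed as NaN.
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["null"],
    keep_default_na=True,
    skiprows=3,
)
print(df["Date"].isna().tolist())
```

This only works when the number of junk lines is fixed, which is the limitation the question describes.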

Then you can simply drop the NA rows based on a column:

df = df.dropna(subset=[column_name])
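Since the question only knows the column by position, one way to get a label for subset is to look it up through df.columns. The sketch below assumes a made-up frame; the names fifth and result are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose 5th column contains a missing value.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["a", "b", "c"],
    "Qty": [10, 20, 30],
    "Price": [2.5, 3.5, 4.5],
    "Date": ["2021-01-01", np.nan, "2021-01-03"],
})

# subset expects column labels, not a Series, so fetch the label at position 4.
fifth = df.columns[4]

# Drop every row whose value in that column is missing.
result = df.dropna(subset=[fifth])
print(result)
```

This is also why the third attempt in the question fails: it passes the column's values to subset rather than its label, and it uses position 5, which is the sixth column.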
