Remove rows containing blank space in Python data frame
I imported a CSV file into Python (using a pandas DataFrame), and there are some missing values in the CSV file. In the data frame I have rows like the following:
> 08,63.40,86.21,63.12,72.78,,
I have tried everything to remove the rows containing elements similar to the last element in the data above. Nothing works. I do not know whether the above counts as white space, an empty string, or something else.
Here is what I have:
result = pandas.read_csv(file,sep='delimiter')
result[result!=',,']
This did not work. Then I did the following:
result.replace(' ', np.nan, inplace=True)
result.dropna(inplace=True)
This also did not work.
result = result.replace(r'\s+', np.nan, regex=True)
This also did not work. I still see the row containing the ,, element.
Also, my dataframe is 100 by 1. When I import it from the CSV file, all the columns become one. (I do not know if this helps.)
Can anyone tell me how to remove rows containing ,, elements?
> Also my dataframe is 100 by 1. When I import it from CSV file all the columns become 1
This is probably the key, and IMHO it is weird. When you import a CSV file into a pandas DataFrame, you normally want each field to go into its own column, precisely so that you can later process that column's values individually. So (still IMHO) the correct solution is to fix that.
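A plausible cause of the 100-by-1 shape, shown as a minimal sketch: pandas interprets a sep longer than one character as a regular expression, so sep='delimiter' splits only where the literal text "delimiter" appears, which never happens in this data. Each whole line therefore lands in a single column. The two-line sample below is hypothetical (the second line is made up), header=None is an assumption so both lines count as data, and engine='python' is passed only to avoid the parser-fallback warning.

```python
import io

import pandas as pd

# Hypothetical sample: the first line is from the question, the second is invented.
csv_text = "08,63.40,86.21,63.12,72.78,,\n09,61.20,80.10,60.00,70.50,1.0,2.0\n"

# sep='delimiter' is treated as a regex that never matches the data,
# so every line stays one string in one column.
one_col = pd.read_csv(io.StringIO(csv_text), sep="delimiter", header=None, engine="python")

# With the default comma separator each field gets its own column,
# and the empty trailing fields are parsed as NaN.
fixed = pd.read_csv(io.StringIO(csv_text), header=None)

print(one_col.shape)
print(fixed.shape)
```

Once the fields are in separate columns, the empty values are ordinary NaN cells and dropna() can handle them.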
Now, to directly answer your question (probably an XY problem): you do not want to remove rows containing blank or empty columns, because each of your rows contains only a single column. You want to remove rows containing consecutive commas (,,). So you should use:
df = df.drop(df[df.iloc[:, 0].str.contains(',,', na=False)].index)
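A self-contained sketch of the consecutive-commas filter on a single-column frame. It selects the first column with iloc[:, 0] (iloc[0] would select the first row instead) and keeps only rows whose string does not contain ",,". The input lines are hypothetical, and the frame is rebuilt with the question's sep='delimiter' to reproduce the one-column layout.

```python
import io

import pandas as pd

# Hypothetical input: one complete line and one with empty trailing fields.
csv_text = "09,61.20,80.10,60.00,70.50,1.0,2.0\n08,63.40,86.21,63.12,72.78,,\n"

# Reproduce the question's single-column frame (each line is one string).
df = pd.read_csv(io.StringIO(csv_text), sep="delimiter", header=None, engine="python")

# Boolean mask over the first column: True where the line contains ",,".
has_empty_field = df.iloc[:, 0].str.contains(",,", regex=False, na=False)

# Keep only the rows without consecutive commas.
df = df[~has_empty_field]
print(df)
```

A line that ends with a single empty field (one trailing comma) contains no ",," and would not be caught; reading the file with proper columns and using dropna() avoids that gap.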
I think your code should work with a minor change:
result.replace('', np.nan, inplace=True)
result.dropna(inplace=True)
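A minimal sketch of why the exact-match replace('', np.nan) helps, and where it stops. It assumes, hypothetically, that the empty cells reached the frame as strings, forced here with keep_default_na=False; with default read_csv settings empty fields already arrive as NaN. The question's r'\s+' pattern needs at least one whitespace character, so it never matches an empty string; the anchored r'^\s*$' pattern, swapped in here as an alternative, matches both empty and whitespace-only cells.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: row 1 has an empty field, row 2 a field holding one space.
csv_text = "a,b,c\n1,2,3\n4,,6\n7, ,9\n"

# keep_default_na=False keeps empty fields as '' instead of converting them to NaN.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# Exact-match replacement: only cells equal to '' become NaN, so "7, ,9" survives.
only_empty = raw.replace("", np.nan).dropna()

# Anchored regex: cells that are empty or contain only whitespace become NaN.
no_blank = raw.replace(r"^\s*$", np.nan, regex=True).dropna()

print(only_empty)
print(no_blank)
```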
If your CSV file has several rows, you can skip the extra conversion step to NaN:
result = pandas.read_csv(file)
result = result[result.notnull().all(axis = 1)]
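A self-contained check of the notnull().all(axis=1) filter, assuming the default comma separator. header=None is an assumption added here so that the sample lines are treated as data; with the default header inference the first line would become the column names. The filter keeps the same rows as dropna(), whose default is how='any'.

```python
import io

import pandas as pd

# Hypothetical two-line sample; the first line comes from the question.
csv_text = "08,63.40,86.21,63.12,72.78,,\n09,61.20,80.10,60.00,70.50,1.0,2.0\n"

# Default comma separator: the empty trailing fields are read as NaN.
result = pd.read_csv(io.StringIO(csv_text), header=None)

# Keep only rows in which every value is present.
kept = result[result.notnull().all(axis=1)]
print(kept)
```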
This will remove any row where there is an empty element.
However, your added comment explains that there is just one row in the CSV file, and the CSV reader seems to show some special behavior. Since you need to select the columns without NaN, I suggest these lines:
result = pandas.read_csv(file, header = None)
selected_columns = result.columns[result.notnull().any()]
result = result[selected_columns]
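A minimal sketch of the header=None variant on a one-line input modeled on the question's sample. Without header=None, pandas would use the single line as column names and return a frame with no data rows. The column filter keeps every column with at least one non-null value; for a single row that is the same as keeping the columns without NaN.

```python
import io

import pandas as pd

# One-line CSV modeled on the question's sample.
csv_text = "08,63.40,86.21,63.12,72.78,,\n"

# header=None makes the single line a data row instead of the header.
result = pd.read_csv(io.StringIO(csv_text), header=None)

# Keep only the columns holding at least one non-null value.
selected_columns = result.columns[result.notnull().any()]
result = result[selected_columns]
print(result)
```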
Note the option header = None passed to read_csv.
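Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.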