
How to delete rows in a CSV file based on blank columns

I have a CSV file in the following format. It has thousands of rows, so here is a short sample:

id,name,score1,score2,score3
1,,3.0,4.5,2.0
2,,,,
3,,4.5,3.2,4.1

I have tried to use .dropna(), but that is not working.

My desired output is:

id,name,score1,score2,score3
1,,3.0,4.5,2.0
3,,4.5,3.2,4.1

All I really need is to check whether score1 is empty, because if score1 is empty then the rest of the scores are empty as well.

I have also tried this, but it doesn't seem to do anything:

import pandas as pd

df = pd.read_csv('dataset.csv')

df.drop(df.index[(df["score1"] == '')], axis=0, inplace=True)

df.to_csv('new.csv')

Can anyone help with this?

import pandas as pd


df = pd.DataFrame([[1,3.0,4.5,2.0],[2],[3,4.5,3.2,4.1]], columns=["id","score1","score2","score3"])

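aux1 = df.dropna()                # default: drop rows that contain any NaN
aux2 = df.dropna(axis='columns')  # drop columns that contain any NaN
aux3 = df.dropna(axis='rows')     # same as the default (axis=0)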

print('=== original ===')
print(df)
print()
print('=== mode 1 ===')
print(aux1)
print()
print('=== mode 2 ===')
print(aux2)
print()
print('=== mode 3 ===')
print(aux3)
print()
print('=== mode 4 ===')
print('drop original')
df.dropna(axis=1,inplace=True)
print(df)

After seeing your edits, I realized that dropna doesn't work for you because every row contains a missing value (the name column is empty in all of them), so a plain dropna() drops every row. To filter on NaN values in a specific column, I would recommend using the apply function, as in the following code. (By the way, StackOverflow.csv is just a file into which I copied and pasted the data from your question.)

import pandas as pd
import math

df = pd.read_csv("StackOverflow.csv", index_col="id")

# Function that takes a number and returns True if it is not NaN
def not_nan(number):
    return not math.isnan(number)

#Filtering the dataframe with the function
df = df[df["score1"].apply(not_nan)]

This iterates through the score1 column and checks whether each value is NaN. If it is, the function returns False. We then use the resulting series of True and False values to filter the dataframe, keeping only the rows where the value is True.
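As an alternative to apply, pandas can do the same per-column check directly. The sketch below is only an illustration under a few assumptions: it loads the sample rows from the question through io.StringIO instead of a real file, and the variable names (csv_text, all_dropped, filtered, masked, result_csv) are made up for this example. It also shows why the comparison with '' in the question has no effect: read_csv parses empty fields as NaN, not as empty strings.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for the real file.
csv_text = """id,name,score1,score2,score3
1,,3.0,4.5,2.0
2,,,,
3,,4.5,3.2,4.1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Plain dropna() removes every row here, because "name" is NaN in all of them.
all_dropped = df.dropna()

# Restrict the NaN check to score1 only.
filtered = df.dropna(subset=["score1"])

# Equivalent boolean-mask form of the same filter.
masked = df[df["score1"].notna()]

# index=False keeps the pandas row index out of the written CSV.
result_csv = filtered.to_csv(index=False)
print(result_csv)
```

With a real file, passing a path such as to_csv('new.csv', index=False) should write the filtered rows without an extra index column.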
