Unable to remove rows from a dataframe based on a condition

Question

So I have a dataframe, df:

        Rank                                              Name Platform  ...  JP_Sales Other_Sales Global_Sales     
0          1                                        Wii Sports      Wii  ...      3.77        8.46        82.74     
1          2                                 Super Mario Bros.      NES  ...      6.81        0.77        40.24     
2          3                                    Mario Kart Wii      Wii  ...      3.79        3.31        35.82     
3          4                                 Wii Sports Resort      Wii  ...      3.28        2.96        33.00     
4          5                          Pokemon Red/Pokemon Blue       GB  ...     10.22        1.00        31.37     
...      ...                                               ...      ...  ...       ...         ...          ...     
16593  16596                Woody Woodpecker in Crazy Castle 5      GBA  ...      0.00        0.00         0.01     
16594  16597                     Men in Black II: Alien Escape       GC  ...      0.00        0.00         0.01     
16595  16598  SCORE International Baja 1000: The Official Game      PS2  ...      0.00        0.00         0.01     
16596  16599                                        Know How 2       DS  ...      0.00        0.00         0.01     
16597  16600                                  Spirits & Spells      GBA  ...      0.00        0.00         0.01

I used df.describe() and it shows that the count for Year is lower than the counts for the other columns.

So I thought that some values in Year are empty. I tried df.dropna(), but that didn't work.

I then tried printing the values of the Year column that are not numbers, along with their type(), using this code (probably not the best code, but it works):

with open("vgsales.csv", "r") as csv_file:
    rows = csv_file.read().split("\n")
    row_components = [row.split(",") for row in rows if len(row) > 0]

    data_dict = {header:[] for header in row_components[0]}

    for header_index, header in enumerate(row_components[0]):
        print("header_index: ", header_index)
        for row_index, row in enumerate(row_components[1:]):
            data_dict[header].append(row[header_index])
    
    for i in data_dict["Year"]:
        if not i.isdigit():
            print(i, type(i))

The output (the same line repeated many times):

N/A <class 'str'>

So then I tried the answers I found in this stackoverflow question: df = df[df.Year != "N/A"], and it didn't work either.

I also tried df = df.drop(df[(df.Year == "N/A")].index), and it didn't work.

So then I thought, why don't I open it in Excel and see what values are there when it is not a year? Indeed, it was N/A.

Any ideas what I can do? I want to clean the data so that all the columns have the same count for a machine learning project.

Answer 1

First off, it's important to know why you're missing data, and to see whether you can impute the missing values rather than just drop the rows.
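
As a rough sketch of what checking and imputing could look like: the snippet below counts missing values per column and fills missing years with the median year. The sample rows are made up for illustration, and the median is only one possible imputation choice (an assumption on my part, not something the data dictates).

```python
import io
import pandas as pd

# Tiny stand-in for vgsales.csv; these rows are made up for illustration.
csv_text = """Rank,Name,Year,Global_Sales
1,Game A,2006,82.74
2,Game B,N/A,40.24
3,Game C,2008,35.82
4,Game D,2010,33.00
"""

df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column to see where the gaps are.
missing_per_column = df.isna().sum()
print(missing_per_column)

# One simple imputation choice (an assumption, not a recommendation):
# replace missing years with the median of the known years.
median_year = df["Year"].median()
df_imputed = df.assign(Year=df["Year"].fillna(median_year))
print(df_imputed)
```

Whether a median year is meaningful depends on the project; for some models, dropping the rows may be the safer option.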

If you still want to drop, you can use df = df.dropna(how='any').
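
One possible reason the plain df.dropna() call appeared to do nothing (a guess, since the question does not show how it was called) is that dropna returns a new dataframe and leaves the original untouched unless the result is assigned back. A minimal sketch with made-up rows:

```python
import io
import pandas as pd

# Made-up sample rows standing in for vgsales.csv.
csv_text = """Rank,Name,Year,Global_Sales
1,Game A,2006,82.74
2,Game B,N/A,40.24
3,Game C,2008,35.82
"""

df = pd.read_csv(io.StringIO(csv_text))

# dropna() returns a new dataframe; without assignment, df is unchanged.
df.dropna()
rows_without_assignment = len(df)

# Assigning the result keeps only the rows with no missing values.
df_clean = df.dropna(how="any")

# Alternatively, drop rows only when Year specifically is missing.
df_year_only = df.dropna(subset=["Year"])

# After cleaning, every column reports the same count.
print(df_clean.count())
```

The subset form is useful when other columns may legitimately contain gaps that you do not want to treat as a reason to discard the row.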

The "N/A" you see in Excel is just a marker for missing data. It doesn't mean that the value in the dataframe is the string N/A, which would be a string containing an N, a slash, and an A; that is why comparing against "N/A" matches nothing. Instead, you can try df = df[~df['Year'].isnull()] as an alternative method for selecting non-null values.
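
To make this concrete: with its default settings, pandas.read_csv treats the text N/A as a missing value and stores it as NaN, so the string never reaches the dataframe. The sketch below uses made-up rows and assumes the file was loaded with read_csv defaults, which the question does not show.

```python
import io
import pandas as pd

# Made-up sample rows standing in for vgsales.csv.
csv_text = """Rank,Name,Year,Global_Sales
1,Game A,2006,82.74
2,Game B,N/A,40.24
3,Game C,2008,35.82
"""

# Default parsing: the text "N/A" becomes NaN.
df = pd.read_csv(io.StringIO(csv_text))
string_matches = int((df["Year"] == "N/A").sum())   # the string is not there
missing_count = int(df["Year"].isnull().sum())      # but the value is missing

# Keep only the rows where Year is present.
df_filtered = df[~df["Year"].isnull()]

# For comparison: with keep_default_na=False the raw text is preserved,
# and only then does a comparison against "N/A" find the row.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
raw_string_matches = int((raw["Year"] == "N/A").sum())

print(string_matches, missing_count, raw_string_matches)
```

df['Year'].notna() is an equivalent way to write the same mask as ~df['Year'].isnull().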

Solution 1
2 Accepted 2021-05-11 22:42:05
