
Python Pandas 'Unnamed' column keeps appearing

I am running into an issue where each time I run my program (which reads the dataframe from a .csv file), a new column called 'Unnamed' shows up.

Sample output columns after running it 3 times:

  Unnamed: 0  Unnamed: 0.1            Subreddit  Appearances

Here is my code. With each run, the 'Unnamed' columns simply increase by one.

df = pd.read_csv(Location)
while counter < 50:
    #gets just the subreddit name
    e = str(elem[counter].get_attribute("href"))
    e = e.replace("https://www.reddit.com/r/", "")
    e = e[:-1]
    if e in df['Subreddit'].values:
        #adds 1 to Appearances if the subreddit is already in the DF
        df.loc[df['Subreddit'] == e, 'Appearances'] += 1
    else:
        #adds new row with the subreddit name and sets the amount of appearances to 1.
        df = df.append({'Subreddit': e, 'Appearances': 1}, ignore_index=True)
    df.reset_index(inplace=True, drop=True)
    print(e)
    counter = counter + 2
#(doesn't work) df.drop(df.columns[df.columns.str.contains('Unnamed', case=False)], axis=1)

The first time I run it, with a clean .csv file, it works perfectly, but each time after that, another 'Unnamed' column shows up. I just want the 'Subreddit' and 'Appearances' columns to show each time.

Another solution is to read the csv with index_col=0, so the index column is not read back in as data:

df = pd.read_csv(Location, index_col=0)
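To see why that helps, here is a minimal, self-contained round trip (the file name 'reddit.csv' and the sample data are placeholders, not taken from the original code):

import pandas as pd

df = pd.DataFrame({'Subreddit': ['python', 'learnpython'], 'Appearances': [2, 1]})
df.to_csv('reddit.csv')  # default index=True writes the index as a first column with an empty header

print(pd.read_csv('reddit.csv').columns.tolist())
# ['Unnamed: 0', 'Subreddit', 'Appearances'] -- the old index comes back as a data column

df = pd.read_csv('reddit.csv', index_col=0)
print(df.columns.tolist())
# ['Subreddit', 'Appearances'] -- the first column is used as the index instead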

each time I run my program (...) a new column shows up called 'Unnamed'.

I suppose that's due to reset_index, or maybe you have a to_csv somewhere in your code, as @jpp suggested. To fix the to_csv, be sure to use index=False:

df.to_csv(path, index=False)
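As a quick sketch of the effect (again with a placeholder file name), writing with index=False means no index column is saved, so nothing 'Unnamed' can reappear on the next read:

import pandas as pd

df = pd.DataFrame({'Subreddit': ['python'], 'Appearances': [1]})
df.to_csv('reddit.csv', index=False)   # the index is not written to the file at all

print(pd.read_csv('reddit.csv').columns.tolist())
# ['Subreddit', 'Appearances'] -- no 'Unnamed: 0' column on subsequent runs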

In general, here's how I would approach your task. What this does is count all appearances first (keyed by e), and from these counts create a new dataframe to merge with the one you already have (how='outer' adds rows that don't exist yet). This avoids resetting the index for each element, which should avoid the problem and is also more performant.

Here's the code with these thoughts included:

from collections import Counter

base_df = pd.read_csv(Location)
appearances = Counter()
counter = 0
while counter < 50:
    # gets just the subreddit name
    e = str(elem[counter].get_attribute("href"))
    e = e.replace("https://www.reddit.com/r/", "")
    e = e[:-1]
    appearances[e] += 1
    counter = counter + 2
# build a dataframe from the counts, using the same column names as the csv,
# then merge it with the existing dataframe
appearances_df = pd.DataFrame({'Subreddit': e, 'Appearances': c}
                              for e, c in appearances.items())
df = base_df.merge(appearances_df, how='outer', on='Subreddit')
