
Pandas append() and remove duplicates messes up the index

I am scraping some data from a basketball site, and the plan is to update it automatically when new data is added.

I get the data:

import pandas as pd

# read_html returns a list of DataFrames, one per table found on the page
stats = pd.read_html('URL', header=[0, 1])
player_stats = stats[4]

player_stats.to_csv('stats.csv')

Append it:

with open('stats.csv', 'a') as f:
    player_stats.to_csv(f, header=False)

Remove duplicates (method 1):

old_data = pd.read_csv('stats.csv')
data = old_data.drop_duplicates(subset='Unnamed: 1_level_0', keep='last')
data.to_csv('stats.csv')

Remove duplicates (method 2):

old_data = pd.read_csv('stats.csv')
# True for every row that has a later duplicate in this column
bool_series = old_data["Unnamed: 1_level_0"].duplicated(keep='last')
data = old_data[~bool_series]
data.to_csv('stats.csv')

The problem I face is that once the new data has been appended to the original data, the remove-duplicates step messes up the structure of the file. Duplicates are no longer recognized as such, so any future appending and removing of duplicates becomes impossible.

Why is the new index added, and how do I fix that?

Instead of appending data directly to the file, use pandas' concat() function. Be aware of the axis argument.
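A minimal sketch of that suggestion, under some assumptions: the sample frames, the flat "Player" and "PTS" columns, and the helper name merge_stats are hypothetical and stand in for the scraped table, which in the question has a two-row header. Rows are stacked with concat() along axis=0, the last row per key is kept, and the CSV is written without the index.

```python
import io

import pandas as pd


def merge_stats(old: pd.DataFrame, new: pd.DataFrame, key: str) -> pd.DataFrame:
    """Stack new rows under the old ones and keep the latest row per key."""
    # axis=0 stacks rows; ignore_index=True gives the result a fresh 0..n-1 index
    combined = pd.concat([old, new], axis=0, ignore_index=True)
    deduped = combined.drop_duplicates(subset=key, keep="last")
    # Dropping rows leaves gaps in the index, so renumber it
    return deduped.reset_index(drop=True)


# Hypothetical sample data standing in for the scraped table
old = pd.DataFrame({"Player": ["A", "B"], "PTS": [10, 20]})
new = pd.DataFrame({"Player": ["B", "C"], "PTS": [25, 5]})

merged = merge_stats(old, new, key="Player")

# Round trip through CSV (an in-memory buffer here) without writing the index
buffer = io.StringIO()
merged.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
```

As for why the extra column shows up: to_csv() writes the row index as an unnamed first column by default, and read_csv() reads it back as ordinary data (typically named "Unnamed: 0"), so every read/write cycle adds one more column. Passing index=False when writing, or index_col=0 when reading, avoids this.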


 