How to fix a badly formatted DataFrame
Suppose we have the following dataset, Ram Price.
I read this dataset using the following commands:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
data = pd.read_csv('https://raw.githubusercontent.com/amueller/introduction_to_ml_with_python/master/data/ram_price.csv')
But when I display the first few rows with
print(data.head())
it shows me the following result:
   Unnamed: 0    date        price
0           0  1957.0  411041792.0
1           1  1959.0   67947725.0
2           2  1960.0    5242880.0
3           3  1965.0    2642412.0
4           4  1970.0     734003.0
How can I fix this? When I try to drop Unnamed, it says there is no Unnamed column.
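It looks like an index column. You can tell read_csv to use that first column as the index by passing its integer position to index_col, as follows: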
df = pd.read_csv(
    'https://raw.githubusercontent.com/amueller/introduction_to_ml_with_python/master/data/ram_price.csv',
    index_col=[0])
print(df.head(5))
     date        price
0  1957.0  411041792.0
1  1959.0   67947725.0
2  1960.0    5242880.0
3  1965.0    2642412.0
4  1970.0     734003.0
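A likely explanation, offered as an assumption rather than something the original post states, is that the CSV was saved with its index included, which leaves the first header cell empty. The sketch below reproduces that with a small in-memory stand-in for the file. The three rows are copied from the output above, and the names `original`, `csv_text`, `with_extra` and `as_index` are illustrative only.

```python
import io

import pandas as pd

# Small stand-in for the first rows of ram_price.csv (values from the output above).
original = pd.DataFrame({
    "date": [1957.0, 1959.0, 1960.0],
    "price": [411041792.0, 67947725.0, 5242880.0],
})

# to_csv writes the index by default, with an empty header cell for that column.
csv_text = original.to_csv()

# Reading the text back without index_col yields the extra 'Unnamed: 0' column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells read_csv to treat the first column as the index instead.
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(as_index.columns))
```

If you control how the file is written, calling to_csv with index=False avoids creating the extra column in the first place.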
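You need to drop the column using its full name. The column is called 'Unnamed: 0', not 'Unnamed', which is why dropping 'Unnamed' fails.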
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
data = pd.read_csv('https://raw.githubusercontent.com/amueller/introduction_to_ml_with_python/master/data/ram_price.csv')
print(data.columns)  # print all the column names in the DataFrame
# Index(['Unnamed: 0', 'date', 'price'], dtype='object')
data = data.drop(['Unnamed: 0'], axis=1)  # axis=1 drops a column rather than a row
print(data.head())
#      date        price
# 0  1957.0  411041792.0
# 1  1959.0   67947725.0
# 2  1960.0    5242880.0
# 3  1965.0    2642412.0
# 4  1970.0     734003.0
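If you would rather not type the exact label, one possible alternative (not from the original answers) is to filter out every column whose name starts with 'Unnamed'. The sketch below uses a two-row in-memory sample in place of the real file. The names `csv_text`, `partial_name_worked` and `cleaned` are illustrative.

```python
import io

import pandas as pd

# Two-row stand-in for ram_price.csv, including the empty first header cell.
csv_text = ",date,price\n0,1957.0,411041792.0\n1,1959.0,67947725.0\n"
data = pd.read_csv(io.StringIO(csv_text))

# Dropping by a partial name fails, because drop requires an exact label match.
try:
    data.drop(["Unnamed"], axis=1)
    partial_name_worked = True
except KeyError:
    partial_name_worked = False

# Keep only the columns whose name does not start with 'Unnamed'.
cleaned = data.loc[:, ~data.columns.str.startswith("Unnamed")]

print(partial_name_worked)
print(list(cleaned.columns))
```

This also handles files that contain several such columns ('Unnamed: 0', 'Unnamed: 1', and so on), but it would remove any legitimate column whose name happens to start with 'Unnamed'.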