Convert a list of data from a URL to CSV in Python
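I am trying to convert the Breast Cancer Wisconsin dataset from a list into a DataFrame with columns.
Here is the dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data
These are the column names: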
 #  Attribute                     Domain
 -- -----------------------------------------
 1. Sample code number            id number
 2. Clump Thickness               1 - 10
 3. Uniformity of Cell Size       1 - 10
 4. Uniformity of Cell Shape      1 - 10
 5. Marginal Adhesion             1 - 10
 6. Single Epithelial Cell Size   1 - 10
 7. Bare Nuclei                   1 - 10
 8. Bland Chromatin               1 - 10
 9. Normal Nucleoli               1 - 10
10. Mitoses                       1 - 10
11. Class:                        (2 for benign, 4 for malignant)
I imported the dataset into Python like this:
import requests
link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
f = requests.get(link)
print (f.text)
and I see the data as a comma-separated list:
1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2
I need to split the comma-separated values into columns and add names to the columns.
I tried this, but it didn't work:
import requests
import pandas as pd
import io
urlData = requests.get(f.text).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
This will do the job:
import requests

# Column names for the header row
headers = ['sample', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
r = requests.get("http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data").text
# The with block closes the file even if a write fails
with open('c:\\users\\user\\desktop\\data.csv', 'w') as csvFile:
    csvFile.write(','.join(headers) + "\n")
    csvFile.write(r)
The following worked for me:
import pandas as pd
import requests
link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
f = requests.get(link)
# separate each line
newf = f.text.splitlines()
# create pandas dataframe
df = pd.DataFrame([x.split(",") for x in newf])
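The DataFrame built this way has integer column labels (0 to 10) and every value is a string. A minimal sketch of one way to attach the column names and convert the values to numbers follows. The helper name `rows_to_frame` and the inline sample text are assumptions for illustration (the sample stands in for `f.text`), and `errors="coerce"` turns any non-numeric entry into NaN, which matters if, as the dataset's documentation describes, missing values are marked `?`.

```python
import pandas as pd

NAMES = [
    "Sample code number", "Clump Thickness", "Uniformity of Cell Size",
    "Uniformity of Cell Shape", "Marginal Adhesion",
    "Single Epithelial Cell Size", "Bare Nuclei", "Bland Chromatin",
    "Normal Nucleoli", "Mitoses", "Class",
]

def rows_to_frame(text, names=NAMES):
    """Split comma-separated text into a named, numeric DataFrame."""
    # Skip blank lines so a trailing newline does not create an empty row
    rows = [line.split(",") for line in text.splitlines() if line.strip()]
    df = pd.DataFrame(rows, columns=names)
    # Convert each column to numbers; unparseable entries become NaN
    return df.apply(pd.to_numeric, errors="coerce")

# Stand-in for f.text: two rows, the second with a "?" placeholder
sample = "1000025,5,1,1,1,2,1,3,1,1,2\n1057013,8,4,5,1,2,?,7,3,1,4\n"
df = rows_to_frame(sample)
print(df.dtypes)
```

With the real download, `rows_to_frame(f.text)` would be the equivalent call.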
import requests
import pandas as pd
import io
names = ['Sample code number',
'Clump Thickness',
'Uniformity of Cell Size',
'Uniformity of Cell Shape',
'Marginal Adhesion',
'Single Epithelial Cell Size',
'Bare Nuclei',
'Bland Chromatin',
'Normal Nucleoli',
'Mitoses',
'Class']
link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
csv_text = requests.get(link).text
# if you don't care about column names omit names=names and do header=None instead
df = pd.read_csv(io.StringIO(csv_text), names=names)
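The attempt in the question most likely fails because `requests.get(f.text)` passes the downloaded body, rather than the URL, to `requests.get`; handing the response text to `pd.read_csv` through `io.StringIO` avoids that. As a follow-on sketch, the snippet below parses such text and produces CSV output with a header row. The helper name `text_to_csv` and the inline sample are hypothetical, and `na_values="?"` assumes that missing values are marked `?`, as the dataset's documentation describes.

```python
import io
import pandas as pd

NAMES = [
    "Sample code number", "Clump Thickness", "Uniformity of Cell Size",
    "Uniformity of Cell Shape", "Marginal Adhesion",
    "Single Epithelial Cell Size", "Bare Nuclei", "Bland Chromatin",
    "Normal Nucleoli", "Mitoses", "Class",
]

def text_to_csv(csv_text, names=NAMES):
    """Parse headerless CSV text; return the frame and CSV text with a header."""
    # Because names= is given, the first line is treated as data, not a header.
    # na_values="?" maps the "?" placeholder to NaN.
    df = pd.read_csv(io.StringIO(csv_text), names=names, na_values="?")
    # With no path argument, to_csv returns the CSV as a string;
    # index=False leaves out the row index.
    return df, df.to_csv(index=False)

# Stand-in for requests.get(link).text
sample = "1000025,5,1,1,1,2,1,3,1,1,2\n1057013,8,4,5,1,2,?,7,3,1,4\n"
df, out = text_to_csv(sample)
print(out)
```

To write a file instead of a string, pass a path, for example `df.to_csv("breast_cancer.csv", index=False)` (the file name is an arbitrary choice).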
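I'm sure there is a better way, but I sent the output to a CSV file with a static header row. Since the data is already delimited by ",", I figured this was the simplest approach.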
import requests

def main():
    outputFile = 'someName.csv'
    link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
    f = requests.get(link)
    headerLine = ("Sample code number(id number),Clump Thickness(1 - 10),Uniformity of Cell Size(1 - 10),Uniformity of Cell Shape(1 - 10),Marginal Adhesion(1 - 10),Single Epithelial Cell Size(1 - 10),Bare Nuclei(1 - 10),Bland Chromatin(1 - 10),Normal Nucleoli(1 - 10),Mitoses(1 - 10),Class:(2 for benign - 4 for malignant)")
    data = f.text
    try:
        # Write the static header first, then the raw comma-delimited data
        with open(outputFile, "w+") as ofile:
            ofile.write(headerLine + '\n')
            ofile.write(data)
            print("Success")
    except Exception as e:
        raise e

if __name__ == '__main__':
    main()
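The header line in this answer swaps the comma in "2 for benign, 4 for malignant" for a dash, presumably because an unquoted comma would split that name into two columns. As a sketch of an alternative, the standard library's `csv.writer` quotes such fields automatically. The function name `write_with_header`, the shortened three-column header, and the in-memory buffer are illustrative assumptions, not part of the original answer.

```python
import csv
import io

def write_with_header(data_text, header, out):
    """Write a header row, then the comma-separated rows in data_text."""
    writer = csv.writer(out, lineterminator="\n")
    # csv.writer quotes any field that contains a comma
    writer.writerow(header)
    for row in csv.reader(io.StringIO(data_text)):
        if row:  # skip blank lines
            writer.writerow(row)

header = ["Sample code number", "Clump Thickness",
          "Class: (2 for benign, 4 for malignant)"]
# Stand-in for f.text, shortened to three columns for illustration
sample = "1000025,5,2\n1002945,5,2\n"

buf = io.StringIO()
write_with_header(sample, header, buf)
print(buf.getvalue())

# Reading it back recovers the comma-containing name as a single field
rows = list(csv.reader(io.StringIO(buf.getvalue())))
```

When writing to a real file rather than a buffer, the `csv` module documentation recommends opening it with `newline=""`.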