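When getting started with machine learning, some of the test data sets you will use are CSV files hosted on the web. This post summarizes two ways to load such a CSV file: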
1 Loading with numpy and urllib2
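import numpy as np
import urllib2  # Python 2 only; on Python 3 use urllib.request instead

# URL of the Pima Indians diabetes data set (CSV, no header row)
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# Open the URL as a file-like object
raw_data = urllib2.urlopen(url)
# Parse the comma-separated values into a 2-D array
dataset = np.loadtxt(raw_data, delimiter=",")
# Columns 0-7 are the eight features, column 8 is the class label
X = dataset[:, 0:8]
y = dataset[:, 8]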
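On Python 3 the urllib2 module no longer exists; urlopen lives in urllib.request instead. Below is a minimal sketch of the same steps, assuming Python 3. The helper names parse_csv and load_csv_from_url are hypothetical, and the two sample rows are made-up values in the same 9-column layout, used only so the parsing step can be tried without network access:

```python
import io
import urllib.request

import numpy as np


def parse_csv(source):
    """Parse comma-separated numeric rows and split features from labels."""
    dataset = np.loadtxt(source, delimiter=",")
    X = dataset[:, :-1]  # every column except the last one
    y = dataset[:, -1]   # the last column holds the class label
    return X, y


def load_csv_from_url(url):
    """Download a CSV file and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        # urlopen returns bytes; decode to text before parsing
        text = response.read().decode("utf-8")
    return parse_csv(io.StringIO(text))


# Offline check with two illustrative rows (not real data)
sample = io.StringIO("1,2,3,4,5,6,7,8,0\n9,8,7,6,5,4,3,2,1\n")
X, y = parse_csv(sample)
print(X.shape, y.shape)
```

Slicing with :-1 and -1 avoids hard-coding the number of feature columns, which may be convenient if the same helper is reused for other data sets whose label is in the last column.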
2 Loading with pandas
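import pandas as pd

url = "http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
# The file has no header row, so pass header=None
dataFrame = pd.read_csv(url, header=None)
# Convert the DataFrame to a numpy.ndarray
dataset = dataFrame.values
# Columns 0-7 are the eight features, column 8 is the class label
X = dataset[:, 0:8]
y = dataset[:, 8]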
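pd.read_csv also accepts file-like objects, so the same call can be tried offline. The following is a sketch only, assuming pandas 0.24 or later (for to_numpy); the sample rows are made-up values in the same 9-column layout, and splitting with iloc before converting is one possible variation, not the only way:

```python
import io

import pandas as pd

# Two illustrative rows in the same 9-column layout (not real data)
sample = io.StringIO("1,2,3,4,5,6,7,8,0\n9,8,7,6,5,4,3,2,1\n")

# header=None: there is no header row, so the columns are numbered 0..8
data_frame = pd.read_csv(sample, header=None)

# Split by position while still a DataFrame, then convert to ndarrays
X = data_frame.iloc[:, :-1].to_numpy()  # all columns except the last
y = data_frame.iloc[:, -1].to_numpy()   # the last column is the label

print(X.shape, y.shape)
```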
3 Summary
np.loadtxt returns a numpy.ndarray.
pd.read_csv returns a pandas.core.frame.DataFrame.
DataFrame.values is a numpy.ndarray.
So, in essence, both methods end up with the same kind of object: a numpy.ndarray.
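These type claims can be checked directly. The sketch below assumes Python 3 and uses a small in-memory CSV string with made-up values instead of the remote file. It also shows one detail worth keeping in mind: the container types match, but the element dtypes may not, because np.loadtxt defaults to float while pd.read_csv infers a dtype per column:

```python
import io

import numpy as np
import pandas as pd

# Illustrative CSV content (not real data)
csv_text = "1,2,3,4,5,6,7,8,0\n9,8,7,6,5,4,3,2,1\n"

from_numpy = np.loadtxt(io.StringIO(csv_text), delimiter=",")
data_frame = pd.read_csv(io.StringIO(csv_text), header=None)
from_pandas = data_frame.values

print(type(from_numpy))   # numpy.ndarray
print(type(data_frame))   # pandas DataFrame
print(type(from_pandas))  # numpy.ndarray

# loadtxt yields floats; read_csv infers an integer dtype for this sample
print(from_numpy.dtype, from_pandas.dtype)

# The values themselves are equal element by element
print(np.array_equal(from_numpy, from_pandas))
```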
Reposted from: https://www.cnblogs.com/zc9527/p/6286621.html