Python logistic regression functions, Python regression

How to implement gradient descent for logistic regression in Python

import sys


# Training data set
# each element in x represents (x0, x1, x2), where x0 = 1 is the bias term
x = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]

# y[i] is the output of y = theta0 * x[0] + theta1 * x[1] + theta2 * x[2]
# note: this model is linear and is fitted with a squared-error cost
y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]

epsilon = 0.0001   # stop when the cost changes by less than this
alpha = 0.01       # learning rate
max_itor = 1000    # upper bound on the number of passes over the data
error1 = 0
error0 = 0
cnt = 0
m = len(x)

# init the parameters to zero
theta0 = 0
theta1 = 0
theta2 = 0

while cnt < max_itor:
    cnt = cnt + 1

    # update the parameters one training example at a time
    for i in range(m):
        diff = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])
        theta0 = theta0 + alpha * diff * x[i][0]
        theta1 = theta1 + alpha * diff * x[i][1]
        theta2 = theta2 + alpha * diff * x[i][2]

    # calculate the cost function over the whole training set
    error1 = 0
    for lp in range(m):
        error1 += (y[lp] - (theta0 + theta1 * x[lp][1] + theta2 * x[lp][2])) ** 2 / 2

    if abs(error1 - error0) < epsilon:
        break
    else:
        error0 = error1

    print(' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1))

print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))
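The listing above fits a linear model with a squared-error cost, whereas the heading asks about logistic regression. As a rough sketch of the logistic case, the version below passes the linear combination through a sigmoid and uses the gradient of the mean cross-entropy loss. The function name fit_logistic, the toy data, and the learning rate and iteration count are all assumptions made for illustration; none of them come from the original.

```python
import math

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, alpha=0.1, max_iter=5000):
    """Batch gradient descent on the mean cross-entropy loss.

    xs: feature tuples whose first entry is the constant 1 (bias term).
    ys: labels, each 0 or 1.
    """
    m = len(xs)
    n = len(xs[0])
    theta = [0.0] * n
    for _ in range(max_iter):
        grad = [0.0] * n
        for xi, yi in zip(xs, ys):
            h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
            for j in range(n):
                grad[j] += (h - yi) * xi[j]
        # step against the averaged gradient
        theta = [t - alpha * g / m for t, g in zip(theta, grad)]
    return theta

# Hypothetical toy data: the label is 1 when the second feature is large.
toy_x = [(1, 0.5), (1, 1.0), (1, 1.5), (1, 3.0), (1, 3.5), (1, 4.0)]
toy_y = [0, 0, 0, 1, 1, 1]

theta = fit_logistic(toy_x, toy_y)
preds = [1 if sigmoid(sum(t * v for t, v in zip(theta, xi))) >= 0.5 else 0
         for xi in toy_x]
print(theta, preds)
```

The only structural differences from the linear listing are the sigmoid around the prediction and the use of all examples per update; the update rule itself keeps the same "error times feature" form.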

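How to split imported data into x and y for logistic regression in Python

1. Introduction

This example trains on a set of logistic-regression input/output pairs so that the network's weights and bias reach a good fit, and then uses the trained model for prediction. The parameters are updated with gradient descent (GD), the loss function is the cross entropy, and training runs for 10000 steps.

2. Python code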

#!/usr/bin/python
import numpy
import theano
import theano.tensor as T

rng = numpy.random
N = 400        # number of training examples
feats = 784    # number of input features

# D[0]: N x feats matrix of standard-normal random inputs
# D[1]: N random integer targets, each 0 or 1
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# declare symbolic variables
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(rng.randn(feats), name='w')  # w is shared for every input
b = theano.shared(0., name='b')  # b is shared too.
print('Initial model:')
print(w.get_value())
print(b.get_value())

# construct theano expressions, symbolic
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))  # sigmoid function, probability of target being 1
prediction = p_1 > 0.5  # threshold the probability at 0.5
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)  # cross entropy
cost = xent.mean() + 0.01 * (w ** 2).sum()  # cost function: mean cross entropy plus L2 penalty
gw, gb = T.grad(cost, [w, b])  # gradients of the cost with respect to w and b

# compile
train = theano.function(inputs=[x, y], outputs=[prediction, xent],
                        updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)

# train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

print('Final model:')
print(w.get_value())
print(b.get_value())
print('target values for D:')
print(D[1])
print('prediction on D:')
print(predict(D[0]))

print('newly generated data for test:')
test_input = rng.randn(30, feats)
print('result:')
print(predict(test_input))

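3. Program walkthrough

As shown above, the program first imports the required libraries; theano is a library for scientific computing. It then generates a random 400 x 784 input matrix and a random output vector of length 400 whose entries are binary (0 or 1). Because the target is binary, the task is a logistic regression.

Next, the weights and the bias are initialized. Both are shared variables: the weights start as random values drawn from a standard normal distribution, the bias starts at 0, and both are printed.

The network here has a single layer, with the logistic sigmoid function as its activation. Multiplying the input by the weights and adding the bias gives the activation value, which lies in (0, 1) and is binarized with a threshold of 0.5. The program then computes the cross entropy and the cost function; the cross entropy measures how far the activation is from the true target value. After that, the partial derivatives of cost with respect to w and b are taken. Everything up to this point is symbolic expression manipulation.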
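To make the sigmoid, the 0.5 threshold, and the cross-entropy cost described above concrete, here is a small numeric sketch in plain Python rather than Theano. The helper names predict_proba, cross_entropy and cost_value, and all of the input values, are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """P(y = 1 | x) for a single feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cross_entropy(y, p):
    """Per-example cross entropy between label y (0 or 1) and probability p."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def cost_value(xs, ys, w, b, l2=0.01):
    """Mean cross entropy plus an L2 penalty on the weights."""
    losses = [cross_entropy(y, predict_proba(x, w, b)) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses) + l2 * sum(wi ** 2 for wi in w)

# Hypothetical inputs
xs = [[1.0, 2.0], [-1.0, 0.5]]
ys = [1, 0]

# With all-zero parameters every probability is 0.5, so the cost is log(2).
c0 = cost_value(xs, ys, w=[0.0, 0.0], b=0.0)

# With w = [1, -1], b = 0 and x = [2, 1], the activation is sigmoid(1).
p = predict_proba([2.0, 1.0], [1.0, -1.0], 0.0)
label = p > 0.5
print(c0, p, label)
```

The cost at zero weights is a convenient sanity check: any trained model should end up below log(2) on its training data, apart from the penalty term.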

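Next, theano.function compiles and optimizes these expressions to make the computation efficient. This yields the train and predict functions, used for training and prediction respectively.

Finally, the data is trained on for 10000 steps, with the parameters updated by gradient descent at every step. The result is the fitted model, and a newly generated set of inputs can then be passed to it for prediction.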
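The example above generates D directly as an (inputs, targets) pair instead of loading it from a file. For imported data, one common layout (assumed here, not stated in the original) keeps the label in the last column, so splitting into x and y is a matter of slicing each row. The function name split_features_labels and the sample text are hypothetical.

```python
def split_features_labels(rows):
    """Split rows so the last column is the label and the rest are features."""
    x = [row[:-1] for row in rows]
    y = [row[-1] for row in rows]
    return x, y

# Hypothetical whitespace-separated data, one example per line,
# with the label assumed to be in the last column.
raw = """2.0 1.0 0
3.5 0.5 1
1.0 4.0 0"""

rows = [[float(v) for v in line.split()] for line in raw.splitlines()]
x, y = split_features_labels(rows)
print(x)
print(y)
```

With a NumPy array the same split is usually written with column slices, for example data[:, :-1] for the features and data[:, -1] for the labels.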

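In logistic regression, how do you compute the odds ratio in Python?

Running a logistic regression is actually quite simple: first specify the column to be predicted, then specify the columns the model uses to make the prediction, and the library does the rest.

In this example the column to be predicted is admit, and the predictors are gre, gpa and the dummy variables prestige_2, prestige_3 and prestige_4. prestige_1 serves as the baseline and is therefore left out, which avoids multicollinearity and the dummy variable trap that arises when every dummy variable of a categorical variable is included.

The program indentation is as shown in the figure.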
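The odds ratio for a predictor is the exponential of its fitted logistic-regression coefficient, since the coefficients live on the log-odds scale. A minimal sketch of that step follows; the coefficient values are made up for illustration and are not results from the original example.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale.
# These numbers are placeholders, not output from the original example.
coefs = {
    "gre": 0.002,
    "gpa": 0.8,
    "prestige_2": -0.7,
    "prestige_3": -1.3,
    "prestige_4": -1.6,
}

# Odds ratio = exp(coefficient)
odds_ratios = {name: math.exp(beta) for name, beta in coefs.items()}

for name, ratio in odds_ratios.items():
    print("%s: %.3f" % (name, ratio))
```

An odds ratio below 1 for a prestige dummy means lower odds of admission than the prestige_1 baseline, holding the other predictors fixed. If the model was fitted with statsmodels, the same calculation is typically applied to the fitted result's params attribute.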

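How to interpret the output of logistic regression in Python

The Python code is given below. Because the training set is fairly small, batch gradient descent is used here rather than incremental (stochastic) gradient descent.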

## author: lijiayan
## date: 2016/10/27
## name: logReg.py
from numpy import *
import matplotlib.pyplot as plt

def loadData(filename):
    data = loadtxt(filename)
    m, n = data.shape
    print('the number of examples:', m)
    print('the number of features:', n - 1)
    x = data[:, 0:n-1]   # all columns except the last are features
    y = data[:, n-1:n]   # the last column is the label
    return x, y

# the sigmoid function
def sigmoid(z):
    return 1.0 / (1 + exp(-z))

# the cost function (log-likelihood, to be maximized)
def costfunction(y, h):
    y = array(y)
    h = array(h)
    J = sum(y*log(h)) + sum((1-y)*log(1-h))
    return J

# the batch gradient descent algorithm
def gradescent(x, y):
    m, n = shape(x)       # m: number of training examples; n: number of features
    x = c_[ones(m), x]    # add x0
    x = mat(x)            # to matrix
    y = mat(y)
    a = 0.0000025         # learning rate
    maxcycle = 4000
    theta = zeros((n+1, 1))  # initial theta
    J = []
    for i in range(maxcycle):
        h = sigmoid(x*theta)
        theta = theta + a * (x.T)*(y-h)
        cost = costfunction(y, h)
        J.append(cost)
    plt.plot(J)
    plt.show()
    return theta, cost

# the stochastic gradient descent (m should be large if you want a good result)
def stocGraddescent(x, y):
    m, n = shape(x)       # m: number of training examples; n: number of features
    x = c_[ones(m), x]    # add x0
    x = mat(x)            # to matrix
    y = mat(y)
    a = 0.01              # learning rate
    theta = ones((n+1, 1))   # initial theta
    J = []
    for i in range(m):
        h = sigmoid(x[i]*theta)
        theta = theta + a * x[i].transpose()*(y[i]-h)
        cost = costfunction(y, h)
        J.append(cost)
    plt.plot(J)
    plt.show()
    return theta, cost

# plot the decision boundary
def plotbestfit(x, y, theta):
    plt.plot(x[:, 0:1][where(y == 1)], x[:, 1:2][where(y == 1)], 'ro')
    plt.plot(x[:, 0:1][where(y != 1)], x[:, 1:2][where(y != 1)], 'bx')
    x1 = arange(-4, 4, 0.1)
    x2 = (-float(theta[0]) - float(theta[1])*x1) / float(theta[2])
    plt.plot(x1, x2)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()

def classifyVector(inX, theta):
    prob = sigmoid((inX*theta).sum(1))
    return where(prob >= 0.5, 1, 0)

def accuracy(x, y, theta):
    m = shape(y)[0]
    x = c_[ones(m), x]
    y_p = classifyVector(x, theta)
    accuracy = sum(y_p == y) / float(m)
    return accuracy

Calling the code above:

from logReg import *

x, y = loadData("horseColicTraining.txt")
theta, cost = gradescent(x, y)
print('J:', cost)

ac_train = accuracy(x, y, theta)
print('accuracy of the training examples:', ac_train)

x_test, y_test = loadData('horseColicTest.txt')
ac_test = accuracy(x_test, y_test, theta)
print('accuracy of the test examples:', ac_test)

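Results with learning rate = 0.0000025 and 4000 iterations:

Trend of the likelihood function (J = sum(y*log(h))+sum((1-y)*log(1-h))). The likelihood function is being maximized, and the result is generally considered best only once the curve has levelled off.

The figure below shows the computed results: the accuracy is 73% on the training set and 78% on the test set.

At this point I looked at the dataset and found that the features differ in order of magnitude, so I decided to normalize them.

Normalization is done by modifying the loadData(filename) function: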

def loadData(filename):
    data = loadtxt(filename)
    m, n = data.shape
    print('the number of examples:', m)
    print('the number of features:', n - 1)
    x = data[:, 0:n-1]
    x_max = x.max(0)   # per-feature maximum
    x_min = x.min(0)   # per-feature minimum
    x = (x - x_min) / ((x_max - x_min) * 1.0)  # scaling to [0, 1]
    y = data[:, n-1:n]
    return x, y

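Without normalization I used a learning rate of 0.0000025. Anything larger made the cost oscillate, because some features take very large values and even a slightly larger learning rate causes big swings. With such a small learning rate, 4000 iterations were still not enough for the cost to fully stabilize. Once the features are normalized (all values between 0 and 1), the learning rate can be raised and the number of iterations cut sharply. The results below are for learning rate = 0.005 and 500 iterations:

The accuracy is now 72% on the training set and 73% on the test set.

This example shows why normalizing the features matters: it allowed a much larger learning rate and far fewer iterations.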
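The min-max scaling used in the modified loadData can be checked on a tiny example. The sketch below is a plain-Python version of the same formula, (x - min) / (max - min), applied per column. The function name min_max_scale and the sample rows are hypothetical, and mapping a constant column to 0.0 is an added assumption to avoid division by zero.

```python
def min_max_scale(rows):
    """Rescale each column of rows to [0, 1] using (x - min) / (max - min)."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    scaled = []
    for row in rows:
        scaled.append([
            # Assumption: a constant column (max == min) is mapped to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled

# Hypothetical data: two features on very different scales.
rows = [[1.0, 200.0], [2.0, 400.0], [3.0, 1000.0]]
scaled = min_max_scale(rows)
print(scaled)
```

After scaling, both columns span the same range, so a single learning rate suits every parameter.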
