A Fake Review Detection and Visualization System Built with Python
The main code is based on: https://github.com/SoulDGXu/NLPVisualizationSystem/tree/master/frontend
That project implements word clouds, summary generation and similar features. Since what I am building is a fake review detection system, I did not reuse those features; I only borrowed its overall approach and reused its front-end interface.
The front end is built with the Bootstrap framework; the back end uses Flask and TensorFlow, and TensorFlow hosts the core of the algorithm. The algorithm is BERT-whitening followed by logistic regression (LR), which reaches acceptable accuracy; it is invoked through LR_xitong().
The main features are: login/registration, single-text detection, batch text detection, and crawling reviews from a web page.
There are still shortcomings, for example the crawler only fetches one page of reviews.
1.app.py
This file is where the whole Flask application logic lives: route rules map each URL to a page, GET requests render the page templates, and POST requests receive the form input and pass the detection results back to the front-end pages.
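Before the full listing, this is the bare GET/POST pattern that app.py repeats for every feature (a minimal sketch only: the /detect route, the detect.html template and the placeholder classifier line are illustrative, not the actual names used below):

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/detect", methods=['GET'])
def detect_home():
    # GET: just render the input page
    return render_template("detect.html")

@app.route("/detect", methods=['POST'])
def detect_submit():
    # POST: read the form field filled in on the page, run the model, render the result
    text = request.form.get("inputtext")
    result = "real" if text else "fake"  # placeholder for the real BERT-whitening + LR call
    return render_template("detect.html", result=result)

if __name__ == "__main__":
    app.run()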
# -*- coding: utf-8 -*-
"""
Services:
- Automatic word cloud generation:
    1. From a URL given by the user: collect the text at that address and process it.
    2. From a text string typed in by the user.
    3. From a local file: the user puts the file into the text folder and gives its name for processing.
- Key information extraction from text
- Text sentiment analysis
- User review analysis
- User profiling
Back-end design:
    1. Service interface design
        1.1 Page request design
        1.2 Data request design
    2. Exception request design
"""
import os
from src import config
from src.exe import LR_xitong
from src.exe import file
from src.exe import yelp_claw
from flask import Flask, render_template, request, redirect, url_for, send_from_directory
from werkzeug.utils import secure_filename
import requests
import json
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import and_
# from src.exe import exe_02
# from src.exe import exe_03
# from src.exe import exe_05
# from src.exe import exe_06
# from src.exe import exe_01, exe_02, exe_03, exe_05, exe_06
## =================================== Route configuration ===================================
##############################################################################################
print(LR_xitong.predict_review())
## Part 1 ++++++++++++++++++++++++++++++++++++++++++++++++++++
#==================================================================
# Login: connect to the database
app = Flask(__name__, template_folder=config.template_dir,static_folder=config.static_dir)
HOSTNAME = "127.0.0.1"
PORT = 3306
USERNAME = "root"
PASSWORD = "root"
DATABASE = "database_learn"
app.config['SQLALCHEMY_DATABASE_URI'] = \
    f"mysql+pymysql://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}/{DATABASE}?charset=utf8mb4"
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = True
db = SQLAlchemy(app)
@app.route("/")
def index():
return render_template("register.html")
class User(db.Model):
    __tablename__ = 'user_list1'  # table name
    id = db.Column(db.Integer, primary_key=True)  # primary key
    username = db.Column(db.String(255), unique=True)
    password = db.Column(db.String(255))  # not unique, otherwise two users could never share a password
    # printable representation of the object (like Java's toString)
    def __repr__(self):
        return '<User username:%r password:%r>' % (self.username, self.password)
# Database operations
# Create
def add_object(user):
    db.session.add(user)
    db.session.commit()
    print("Added %r" % user)
with app.app_context():
    user = User()
    user = db.session.merge(user)  # merge the unbound instance into the session
    # user.username = 'li三'
    # user.password = '123456'
    # add_object(user)
# Read (when using and_, import it first: from sqlalchemy import and_)
# def query_object(user, query_condition_u, query_condition_p):
#     result = user.query.filter(and_(user.username == query_condition_u, user.password == query_condition_p))
#     print("Query of %r finished" % user.__repr__)
#     return result
# Delete
# def delete_object(user):
#     result = user.query.filter(user.username == '11111').all()
#     db.session.delete(result)
#     db.session.commit()
# # Update
# def update_object(user):
#     result = user.query.filter(user.username == '111111').all()
#     result.title = 'success2018'
@app.route("/login",methods=['POST'])
def login():
username1=request.form.get("username")
password1 = request.form.get("password")
if user.query.filter_by(username =username1,password =password1).all()!=[]:
# print(user.username,username1,user.password,password1)
print("登錄成功")
return render_template("text_classification1.html")
else:
print("失敗")
print(username1,password1)
return render_template("register.html")
#===========================================================
#注冊(cè):
@app.route("/register",methods=['POST'])
def register():
username1=request.form.get("username")
password1 = request.form.get("password")
#判斷是否在表中,如果不在,則增加,如果在,則返回已經(jīng)存在的錯(cuò)誤提示
if user.query.filter_by(username=username1, password=password1).all() == []:
user.username = username1
user.password = password1
add_object(user)
return render_template("login.html")
else:
print("已經(jīng)注冊(cè)過(guò)了")
message="已經(jīng)注冊(cè)過(guò)了"
return render_template("register.html",message=message)
## Part 2 General helper functions (file read/write, URL check) ++++++++++++++++++++++++++++++++++++++++++++++++++++
def read_file(filepath):
"""
Read the local file and transform to text.
Parameters
----------
filepath : TYPE-str
DESCRIPTION: the text file path.
Returns
-------
content : TYPE-str
DESCRIPTION:The preprocessed news text.
"""
f = open(filepath,'r',encoding='utf-8')
content = f.read()
f.close()
return content
def save_to_file(filepath, content):
f = open(filepath, 'w', encoding='utf-8')
f.write(content)
f.close()
def check_url(url):
"""
Check if the URL can be accessed normally.
Open a simulated browser and visit.
If the access succeeds, a success message is printed; otherwise the error is printed.
Parameters
----------
url : TYPE-str
DESCRIPTION: the URL.
Returns
-------
result : TYPE-bool
DESCRIPTION: True if the URL can be opened normally, False otherwise.
"""
import urllib.request
import urllib.error
import time
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/49.0.2')] #Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0
url = url.replace('\n','').strip()
try:
opener.open(url)
print(url + ' successfully accessed.')
return True
except urllib.error.HTTPError:
print(url + ' = Error when accessing the page.')
time.sleep(2)
except urllib.error.URLError:
print(url + " = Error when accessing the page.")
time.sleep(2)
time.sleep(0.1)
return False
##############################################################################################
##############################################################################################
## Part 3 Text preprocessing
## Part 3.2 Key information extraction -- multi-document analysis -- topic analysis
##############################################################################################
## Part 4 Text classification
# /classification_1 handles a single text
# English input (non-English input is translated first)
@app.route("/classification_1",methods=['GET'])
def review_classification_home():
return render_template("text_classification1.html")
@app.route("/classification_1",methods=['POST'])
def review_classification_input():
text=request.form.get('inputtext')
text1=text # keep a copy of the original input text in text1
if not text.isascii(): # if the text is not English, translate it to English first
url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
data = {
'i': text,
'from': 'AUTO',
'to': 'AUTO',
'smartresult': 'dict',
'client': 'fanyideskweb',
'salt': '16071715461327',
'sign': 'f5d5d5c129878e8e36558fb321b16f85',
'ts': '1607171546132',
'bv': 'd943a2cf8cbe86fb2d1ff7fcd59a6a8c',
'doctype': 'json',
'version': '2.1',
'keyfrom': 'fanyi.web',
'action': 'FY_BY_REALTlME',
'typoResult': 'false'
}
# Send the POST request and get the response data
response = requests.post(url, data=data)
result = json.loads(response.text)
# Parse the translation result and print it
translate_result = result['translateResult'][0][0]['tgt']
print("Translation result:", translate_result)
text = translate_result
try:
if text!=None:
save_to_file(config.classificaion_input_text_path,text) # the English text fed to the model
save_to_file(config.classificaion_input_text1_path,text1) # the original input text as typed
print(text)
return redirect('/download_classification')
except:
return render_template("text_classification1.html")
####################################################################################
#####################################################################################
# Single-text classification result
@app.route('/download_classification', methods=['GET'])
def review_classification():
    cur = LR_xitong.predict_review()
    print("Returning the classification result")
    return render_template("classification.html", curinput=cur)
# Single-text classification result: download the output file
@app.route('/download_classification', methods=['POST'])
def download_review_classification():
    file_dir, filename = os.path.split(config.download_classification_input_text_save_path)
    print("Saving the output")
    return send_from_directory(file_dir, filename, as_attachment=True)
######################################################################################
# Batch text processing
@app.route("/classification_2",methods=['GET'])
def pilialng():
    return render_template("text_classification2.html")
@app.route('/classification_2', methods=['POST'])
def get_import_file():
    userfile = request.files.get('loadfile')
    if userfile:
        filename = secure_filename(userfile.filename)
        types = ['xlsx', 'csv', 'xls']
        if filename.split('.')[-1] in types:
            uploadpath = os.path.join(config.save_dir, filename)
            userfile.save(uploadpath)
            save_to_file(config.wc_input_file_save_path, uploadpath)  # remember the upload path for the next step
            print('File uploaded successfully')
            return redirect('/download_classification_2')
    # no file selected or unsupported extension: show the upload page again
    return render_template("text_classification2.html")
#=============================
# Batch results download
@app.route('/download_classification_2', methods=['GET'])
def rt_keyinfo_import_file():
    filepath=read_file(config.wc_input_file_save_path)  # path of the uploaded file
    cur = file.predict(filepath)  # predict() returns the list of review/label results
    return render_template("classification2.html", curinput=cur)
# 03 tab3: download the batch detection results
@app.route('/download_classification_2', methods=['POST'])
def download_keyinfo_3():
    # file.save() should write the batch results to a CSV and return its path
    # (it is still commented out in the batch-processing module)
    outfile = file.save()
    file_dir, filename = os.path.split(outfile)
    return send_from_directory(file_dir, filename, as_attachment=True)
##############################################################################################
# URL input (crawl and classify the reviews of a Yelp business page)
@app.route("/classification_3", methods=['GET'])
def keyinfo_home_1():
    return render_template("text_classification3.html")
# 01 tab1: get the URL entered on the front end
@app.route('/classification_3', methods=['POST'])
def get_keyinfo_url():
    # strip the leading "https://www.yelp.com/biz/" (25 characters) to keep only the business name
    url = request.form.get('texturl')[25:]
    try:
        save_to_file(config.keyinfo_input_url_path, url)
        # if check_url(url):
        #     save_to_file(config.keyinfo_input_url_path, url)
        #     print('add URL: ' + url)
        return redirect('/download_classification_3')
    except:
        return render_template("text_classification3.html")
# 01 tab1: crawl the reviews for the saved business name and classify them
@app.route('/download_classification_3', methods=['GET'])
def rt_keyinfo_url():
    res_name=read_file(config.keyinfo_input_url_path)  # the saved business name
    # crawl the reviews into yelp_reviews.csv, then run batch detection on them
    yelp_claw.claw(res_name)
    cur = file.predict('yelp_reviews.csv')
    return render_template("classification3.html", curinput=cur)
# 01 tab1: download the output file
@app.route('/download_classification_3', methods=['POST'])
def download_keyinfo_1():
    file_dir, filename = os.path.split(config.download_keyinfo_input_url_save_path)
    return send_from_directory(file_dir, filename, as_attachment=True)
##############################################################################################
# ############################# Exception handling ###########################
# 403 error
@app.errorhandler(403)
def miss(e):
    return render_template('error-403.html'), 403
# 404 error
@app.errorhandler(404)
def error404(e):
    return render_template('error-404.html'), 404
# 405 error
@app.errorhandler(405)
def error405(e):
    return render_template('error-405.html'), 405
# 500 error
@app.errorhandler(500)
def error500(e):
    return render_template('error-500.html'), 500
# Main entry point
if __name__ == "__main__":
    app.run()
2.LR_xitong.py
This part implements single-text detection: the training set is used to fit the LR model and store its parameters, and the trained LR then classifies the sentence vector obtained for the new input.
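One caveat: as written below, the LR model is actually refit from the CSV every time the module is imported rather than saved to disk. If you do want to persist the trained parameters as described, a minimal sketch with joblib could look like this (the file name and max_iter value are arbitrary choices, not part of the project):

import joblib
from sklearn.linear_model import LogisticRegression

def train_and_save(features, labels, model_path='lr_fake_review.joblib'):
    # fit the logistic regression once and persist its parameters to disk
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    joblib.dump(clf, model_path)
    return clf

def load_lr(model_path='lr_fake_review.joblib'):
    # reload the saved model instead of retraining at every startup
    return joblib.load(model_path)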
## Basic numeric library
import numpy as np
## Import the logistic regression model
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sklearn import linear_model
from src.exe import Singlesentence
from Singlesentence import *
import tensorflow as tf
from tensorflow import keras
## Demo: LogisticRegression classification
## Build the training set
train_data_features=pd.read_csv(r'D:\BaiduNetdiskDownload\yelp\new\BHAN+W\res.csv') # training feature matrix (one row per review); the CSV needs a header row
file_name = r'D:\BaiduNetdiskDownload\yelp\yelp_rzj\label.csv' # training label file name
label_name = 'label1' # name of the label column
# extract the review labels
def getLabel():
df_data=pd.read_csv(file_name, encoding='utf-8')
data = list(df_data[label_name])
return data
label = getLabel()
x_fearures = train_data_features
y_label = label
## Instantiate the logistic regression model
lr_clf = LogisticRegression()
## Fit the model on the training vectors and labels
lr_clf = lr_clf.fit(x_fearures, y_label)
def predict_review():
    x_fearures_new1=[vec()]  # vectorize the input text with BERT-whitening (see singleSentence.py)
    ## Predict on the new point with the trained model
    y_label_new1_predict=lr_clf.predict(x_fearures_new1)
    if y_label_new1_predict[0] == 1:
        a='Real'
    else:
        a='Fake'
    print('The new point predicted class:\n',a)
    ## Logistic regression is a probabilistic model (p = p(y=1|x, theta)), so predict_proba gives the class probabilities
    y_label_new1_predict_proba=lr_clf.predict_proba(x_fearures_new1)
    print('The new point predicted probability of each class:\n',y_label_new1_predict_proba)
    a1=read_file(config.classificaion_input_text_path)  # the (possibly translated) English text
    b=read_file(config.classificaion_input_text1_path)  # the original input text
    if a1==b:
        inputtext=a1
    else:
        inputtext=b
    curinput={'inputtext':inputtext,'a':a,'proba':y_label_new1_predict_proba}
    return curinput
3.singleSentence.py
This part vectorizes the text with the BERT-whitening model.
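The whitening step itself relies on compute_kernel_bias and transform_and_normalize, which come in through `from utils import *` below. For reference, this is a sketch of the standard public BERT-whitening implementation of those two helpers; the project's own utils.py is assumed, not verified, to match it:

import numpy as np

def compute_kernel_bias(vecs, n_components=256):
    """Compute the whitening kernel W and bias -mu so that (x + bias) @ W has identity covariance."""
    vecs = np.concatenate(vecs, axis=0)
    mu = vecs.mean(axis=0, keepdims=True)
    cov = np.cov(vecs.T)
    u, s, vh = np.linalg.svd(cov)
    W = np.dot(u, np.diag(1 / np.sqrt(s)))
    return W[:, :n_components], -mu

def transform_and_normalize(vecs, kernel=None, bias=None):
    """Apply the whitening transform and L2-normalize each vector."""
    if kernel is not None and bias is not None:
        vecs = (vecs + bias).dot(kernel)
    return vecs / (vecs ** 2).sum(axis=1, keepdims=True) ** 0.5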
#! -*- coding: utf-8 -*-
# A simple linear transformation (whitening) is enough to match or even beat BERT-flow.
from utils import *
import os, sys
import numpy as np
import xlsxwriter
import re
from src import config
import pandas as pd
import tensorflow as tf
from tensorflow import keras
def save_to_file(filepath, content):
"""
Write the text to the local file.
Parameters
----------
filepath : TYPE-str
DESCRIPTION: the file save path.
Returns
-------
content : TYPE-str
DESCRIPTION: the text.
"""
f = open(filepath, 'w', encoding='utf-8')
f.write(content)
f.close()
def read_file(filepath):
"""
Read the local file and transform to text.
Parameters
----------
filepath : TYPE-str
DESCRIPTION: the text file path.
Returns
-------
content : TYPE-str
DESCRIPTION:The preprocessed news text.
"""
f = open(filepath,'r',encoding='utf-8')
content = f.read()
f.close()
return content
def load_mnli_train_data1(filename):
df = pd.read_csv(filename, encoding='gbk')
# split into data and label
data = df['comment_text']
D = []
with open(filename, encoding='gbk') as f:
for i, l in enumerate(f):
if i > 0:
l = l.strip().split(',')
pattern = r'\.|\?|\~|!|。|、|;|‘|'|【|】|·|!|…|(|)'
result_list = re.split(pattern, data[i-1])
for text in result_list:
D.append((text, l[-1]))
return D
def convert_to_ids1(data, tokenizer, maxlen=64):
"""轉(zhuǎn)換文本數(shù)據(jù)為id形式
"""
a_token_ids= []
for d in tqdm(data):
token_ids = tokenizer.encode(d, maxlen=maxlen)[0]
a_token_ids.append(token_ids)
a_token_ids = sequence_padding(a_token_ids)
return a_token_ids
def convert_to_vecs1(data, tokenizer, encoder, maxlen=64):
"""轉(zhuǎn)換文本數(shù)據(jù)為向量形式
"""
a_token_ids = convert_to_ids1(data, tokenizer, maxlen)
with session.as_default():
with session.graph.as_default():
a_vecs = encoder.predict([a_token_ids,
np.zeros_like(a_token_ids)],
verbose=True)
return a_vecs
config1 = tf.ConfigProto(
device_count={'CPU': 1},
intra_op_parallelism_threads=1,
allow_soft_placement=True
)
session = tf.Session(config=config1)
keras.backend.set_session(session)
# BERT configuration
config_path = r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\bert_config.json'
checkpoint_path =r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\bert_model.ckpt'
dict_path = r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\vocab.txt'
# build the tokenizer
tokenizer = get_tokenizer(dict_path)
# build the encoder model
encoder = get_encoder(config_path, checkpoint_path)
# load the pre-trained whitening weights
encoder.load_weights(r'D:\downloads\BERT-whitening-main\BERT-whitening-main\eng\weights\_res200.weights')
def vec():
    data=read_file(config.classificaion_input_text_path)
    print("Text inside vec():", data)
    # pattern = r'\.|\?|\~|!|。|、|;|‘|'|【|】|·|!|…|(|)'
    # result_list = re.split(pattern, data)
    # D1=[]
    # for text in result_list:
    #     D1.append(text)
    # nli_data = D1
    nli_data = data
    # TODO: should sentences that make no sense be filtered here, or is stop-word removal better?
    nli_a_vecs= convert_to_vecs1(
        nli_data, tokenizer, encoder
    )
    # nli_a_vecs=nli_a_vecs.reshape((2,384))
    # whiten the sentence vectors
    kernel, bias = compute_kernel_bias([nli_a_vecs],n_components=200)
    # np.save('weights/hotel.kernel.bias' , [kernel, bias])
    kernel = kernel[:, :768]
    a_vecs = transform_and_normalize(nli_a_vecs, kernel, bias)  # shape=[num_sentences, 200]
    # average the per-sentence vectors down to a single vector
    a=[0]*200  # 200 is the final vector dimension
    for i in a_vecs:
        a=a+i
    output = a/len(a_vecs)
    return output
4. Batch text processing
This code is very similar to the single-text version above; the difference is that predict() adds a file-reading step, so the text vectorization is applied to every review in the uploaded file instead of to a single text.
#! -*- coding: utf-8 -*-
# A simple linear transformation (whitening) is enough to match or even beat BERT-flow.
from utils import *
import os, sys
import numpy as np
import xlsxwriter
import re
from src import config
import pandas as pd
import tensorflow as tf
from tensorflow import keras
def save_to_file(filepath, content):
"""
Write the text to the local file.
Parameters
----------
filepath : TYPE-str
DESCRIPTION: the file save path.
Returns
-------
content : TYPE-str
DESCRIPTION: the text.
"""
f = open(filepath, 'w', encoding='utf-8')
f.write(content)
f.close()
def read_file(filepath):
"""
Read the local file and transform to text.
Parameters
----------
filepath : TYPE-str
DESCRIPTION: the text file path.
Returns
-------
content : TYPE-str
DESCRIPTION:The preprocessed news text.
"""
f = open(filepath,'r',encoding='utf-8')
content = f.read()
f.close()
return content
def load_mnli_train_data2(filename):
# df = pd.read_csv(filename, encoding='gbk')
# split into data and label
# data = df['comment_text']
D = []
with open(filename, encoding='gbk') as f:
for i, l in enumerate(f):
if i > 0:
D.append(l)
return D
def load_mnli_train_data3(filename):
df = pd.read_csv(filename, encoding='gbk')
data = df['comment_text']
D = []
for d in data:
D.append(d)
return D
def convert_to_ids1(data, tokenizer, maxlen=64):
"""轉(zhuǎn)換文本數(shù)據(jù)為id形式
"""
a_token_ids= []
for d in tqdm(data):
token_ids = tokenizer.encode(d, maxlen=maxlen)[0]
a_token_ids.append(token_ids)
a_token_ids = sequence_padding(a_token_ids)
return a_token_ids
def convert_to_vecs1(data, tokenizer, encoder, maxlen=64):
"""轉(zhuǎn)換文本數(shù)據(jù)為向量形式
"""
a_token_ids = convert_to_ids1(data, tokenizer, maxlen)
with session.as_default():
with session.graph.as_default():
a_vecs = encoder.predict([a_token_ids,
np.zeros_like(a_token_ids)],
verbose=True)
return a_vecs
config1 = tf.ConfigProto(
device_count={'CPU': 1},
intra_op_parallelism_threads=1,
allow_soft_placement=True
)
session = tf.Session(config=config1)
keras.backend.set_session(session)
# BERT configuration
config_path = r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\bert_config.json'
checkpoint_path =r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\bert_model.ckpt'
dict_path = r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\vocab.txt'
# build the tokenizer
tokenizer = get_tokenizer(dict_path)
# build the encoder model
encoder = get_encoder(config_path, checkpoint_path)
# load the pre-trained whitening weights
encoder.load_weights(r'D:\downloads\BERT-whitening-main\BERT-whitening-main\eng\weights\_res200.weights')
# compute the document vector
def vec1(nli_data):
    # TODO: should sentences that make no sense be filtered here, or is stop-word removal better?
    # nli_data = preProcess(nli_data)  # strip the web markup first
    nli_a_vecs = convert_to_vecs1(
        nli_data, tokenizer, encoder
    )
    # whiten the sentence vectors
    kernel, bias = compute_kernel_bias([nli_a_vecs], n_components=200)
    # np.save('weights/hotel.kernel.bias' , [kernel, bias])
    kernel = kernel[:, :768]
    a_vecs = transform_and_normalize(nli_a_vecs, kernel, bias)  # shape=[num_sentences, 200]
    # average the per-sentence vectors down to a single vector
    a = [0] * 200  # 200 is the final vector dimension
    for i in a_vecs:
        a = a + i
    output = a / len(a_vecs)
    return output
## Import the logistic regression model
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sklearn import linear_model
from src.exe import Singlesentence
from Singlesentence import *
import tensorflow as tf
from tensorflow import keras
## Demo: LogisticRegression classification
## Build the training set
train_data_features=pd.read_csv(r'D:\BaiduNetdiskDownload\yelp\new\BHAN+W\res.csv') # training feature matrix (one row per review); the CSV needs a header row
file_name = r'D:\BaiduNetdiskDownload\yelp\yelp_rzj\label.csv' # training label file name
label_name = 'label1' # name of the label column
# extract the review labels
def getLabel():
df_data=pd.read_csv(file_name, encoding='utf-8')
data = list(df_data[label_name])
return data
label = getLabel()
x_fearures = train_data_features
y_label = label
## Instantiate the logistic regression model
lr_clf = LogisticRegression()
## Fit the model on the training vectors and labels
lr_clf = lr_clf.fit(x_fearures, y_label)
def predict(filepath):
    Data = []
    # run prediction on every review in the file
    data = load_mnli_train_data3(filepath)
    for input_text in data:
        # light preprocessing: drop quotes, <br> tags and other non-alphanumeric characters
        input_text = re.sub(r"'", "", input_text)
        input_text = re.sub(r"[^a-zA-Z0-9\s]", "", input_text)
        predict=lr_clf.predict([vec1(input_text)])
        if predict[0] == 1:
            a = 'Real'
            Data.append([input_text,a])
        else:
            b = 'Fake'
            Data.append([input_text,b])
    curinput={'Data':Data,'filename':filepath,'url':read_file(config.keyinfo_input_url_path) }
    print(Data)
    return curinput
# predict()
# def save():
#     # write the Data rows into a CSV file
#     dd=pd.DataFrame(predict().Data,columns=['comment','label'])
#     file='D:\downloads\predict_file.csv'
#     dd.to_csv(file)
#     return file
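The commented-out save() above is what the batch download route (download_keyinfo_3 in app.py) needs to call. A minimal sketch of how it could be completed, reusing this module's read_file/config/predict and the hard-coded output path from the commented code (that config.wc_input_file_save_path still holds the last uploaded file path is an assumption):

def save():
    """Run batch prediction on the last uploaded file and write the results to a CSV, returning its path."""
    filepath = read_file(config.wc_input_file_save_path)   # path saved by the /classification_2 upload route
    results = predict(filepath)                            # {'Data': [[text, label], ...], 'filename': ..., 'url': ...}
    dd = pd.DataFrame(results['Data'], columns=['comment', 'label'])
    outfile = r'D:\downloads\predict_file.csv'             # same hard-coded output path as the commented code above
    dd.to_csv(outfile, index=False, encoding='utf-8')
    return outfile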
5. Web crawler code (yelp_claw.py)
import requests
import csv
# Set the official API access key and endpoint URL (left unused; the public review_feed endpoint below is used instead)
# API_KEY = 'GET https://api.yelp.com/v3/businesses/north-india-restaurant-san-francisco/reviews'
# API_HOST = 'https://api.yelp.com/v3'
# REVIEWS_PATH = '/businesses/{}/reviews'
#
# # Set the business ID and request headers
# business_id = 'NORTH-INDIA-RESTAURANT-SAN-FRANCISCO'
# headers = {'Authorization': 'Bearer %s' % API_KEY}
#
# # Send the request to fetch the restaurant's reviews
# url = API_HOST + REVIEWS_PATH.format(business_id)
# The shop's review endpoint was found by inspecting the page's network requests; the crawler fetches it and parses the returned JSON object for the desired fields and features
def claw(res_name):
    # businessid=res_name
    i=0  # offset of the first review (only the first page is fetched)
    print("Business name: " + res_name)
    response = requests.get('https://www.yelp.com/biz/{}/review_feed?start={}'.format(res_name,i))
    reviews = response.json()['reviews']
    # write the review data into a CSV file
    with open('yelp_reviews.csv', mode='w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['User Name', 'User_URL', 'Review Data', 'Rating', 'comment_text', 'Review Count'])
        for review in reviews:
            user_name = review['user']['altText']  # user name
            user_link = review['user']['link'][21:]  # user profile path
            review_count = review['user']['reviewCount']  # number of reviews by this user
            rating = review['rating']  # review rating
            text = review['comment']['text']  # review text
            data = review['localizedDate']  # review date
            writer.writerow([user_name, user_link, data, rating, text, review_count])
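As admitted at the beginning, claw() only fetches the first page of reviews (start=0). If the review_feed endpoint keeps accepting larger start offsets, a paginated version could look roughly like the sketch below; the page size of 10 and the empty-page stop condition are assumptions, not verified against Yelp:

def claw_all(res_name, max_pages=10, page_size=10):
    """Fetch up to max_pages pages of reviews by stepping the start offset (assumed page size)."""
    all_reviews = []
    for page in range(max_pages):
        url = 'https://www.yelp.com/biz/{}/review_feed?start={}'.format(res_name, page * page_size)
        response = requests.get(url)
        reviews = response.json().get('reviews', [])
        if not reviews:  # stop when a page comes back empty
            break
        all_reviews.extend(reviews)
    return all_reviews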
That is about all of the main code. The remainder of the article is screenshots of the visualization interface.
This concludes the article on building a fake review detection and visualization system with Python. For more on Python fake review detection, search 腳本之家's earlier articles, and we hope you will keep supporting 腳本之家!