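  • Source journal indexed in the Engineering Index (EI)
  • Chinese Core Journal
  • Statistical source journal for Chinese scientific and technical papers
  • Source journal of the Chinese Science Citation Database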


A topic detection method for network long text

ZHENG Heng-yi, LIAO Cheng-lin, LI Tian-zhu

Citation: ZHENG Heng-yi, LIAO Cheng-lin, LI Tian-zhu. A topic detection method for network long text[J]. Chinese Journal of Engineering, 2019, 41(9): 1208-1214. doi: 10.13374/j.issn2095-9389.2019.09.013


doi: 10.13374/j.issn2095-9389.2019.09.013
Details
    Corresponding author: LIAO Cheng-lin, E-mail: liaochenglin1127@gmail.com

  • CLC number: TP391.4

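  • Abstract: This paper proposes a topic detection method for long web texts. To address the high dimensionality and sparsity of conventional text representations and their neglect of latent semantics, a Word2vec&LDA (latent Dirichlet allocation) text representation is proposed. The latent topics of the feature words extracted by LDA are fused, by weighting, with the feature word vectors mapped by Word2vec, which both reduces dimensionality and represents the text information more completely. To address the sensitivity of traditional topic discovery methods to the input order of long texts, a clustering-based topic discovery method, Single-Pass&HAC (hierarchical agglomerative clustering), is proposed. By introducing a time window and agglomerative hierarchical clustering, the method becomes more robust to the input order of the texts and improves both the precision and the efficiency of clustering. To evaluate the proposed method, a real-world multi-source dataset was collected from a university social platform, and extensive experiments were run on it. The results show that the proposed method outperforms existing methods such as VSM (vector space model) and Single-Pass, improving topic detection precision by 10% to 20%.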
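The abstract does not give the fusion formula, so the following is only a minimal sketch of one plausible reading of "weighted fusion". It assumes that each document already has an LDA topic distribution and a list of Word2vec vectors for its feature words, and that fusion means a weighted concatenation of the two normalized parts. The names `fuse` and `alpha`, and the toy vectors, are hypothetical and not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_vector(word_vectors):
    """Average a non-empty list of equal-length word vectors."""
    dim = len(word_vectors[0])
    count = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / count for i in range(dim)]


def fuse(topic_dist, word_vectors, alpha=0.5):
    """Assumed fusion: weighted concatenation of the two views of a document.

    topic_dist   -- document-topic distribution (e.g. from LDA)
    word_vectors -- embeddings of the document's feature words (e.g. from Word2vec)
    alpha        -- hypothetical weight of the topic part; (1 - alpha) weights the embedding part
    """
    topic_part = [alpha * x for x in l2_normalize(topic_dist)]
    embed_part = [(1.0 - alpha) * x for x in l2_normalize(mean_vector(word_vectors))]
    return topic_part + embed_part


# Toy input: 3 topics and two 2-dimensional word vectors (made-up numbers).
doc_topics = [0.7, 0.2, 0.1]
doc_words = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse(doc_topics, doc_words, alpha=0.6)
print(len(fused))  # 3 topic dimensions + 2 embedding dimensions
```

The fused vector has as many dimensions as there are topics plus embedding dimensions, which is far fewer than a vocabulary-sized VSM vector. That is the dimensionality reduction the abstract refers to.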

     

  • Figure  1.  Single-Pass&HAC algorithm flow

    Figure  2.  LDA modeling process diagram

    Figure  3.  Skip-Gram model structure

    Figure  4.  Single-Pass clustering process based on time window

    Figure  5.  HAC topic merge process

    Figure  6.  Change in F value for different dimensions

    Figure  7.  Change in F value for different thresholds T
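The flows named in Figures 1, 4 and 5 are not reproduced on this page, so the sketch below only illustrates the general idea of combining Single-Pass clustering with a time window and an agglomerative merge step. It makes three assumptions that may differ from the paper: documents are bucketed into fixed-length time windows, similarity is the cosine between a document and a cluster centroid, and the merge step uses centroid linkage with a similarity threshold. All function names, thresholds and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def single_pass(vectors, threshold):
    """Assign each vector, in arrival order, to the most similar cluster or open a new one."""
    clusters = []
    for vec in vectors:
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, centroid(cluster))
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best.append(vec)
        else:
            clusters.append([vec])
    return clusters


def hac_merge(clusters, threshold):
    """Merge the most similar pair of clusters until no pair reaches the threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best_pair, best_sim = None, -1.0
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if sim > best_sim:
                    best_pair, best_sim = (i, j), sim
        if best_sim < threshold:
            break
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def detect_topics(timed_docs, window, t_single, t_merge):
    """Run Single-Pass inside each time window, then merge all candidate topics with HAC."""
    buckets = {}
    for ts, vec in sorted(timed_docs, key=lambda d: d[0]):
        buckets.setdefault(int(ts // window), []).append(vec)
    candidates = []
    for key in sorted(buckets):
        candidates.extend(single_pass(buckets[key], t_single))
    return hac_merge(candidates, t_merge)


# Toy stream of (timestamp, vector) pairs covering two windows of length 10.
stream = [(0, [1.0, 0.0]), (1, [0.9, 0.1]), (2, [0.0, 1.0]),
          (11, [1.0, 0.05]), (12, [0.1, 0.9])]
topics = detect_topics(stream, window=10, t_single=0.8, t_merge=0.8)
print(sorted(len(t) for t in topics))
```

In this toy run, Single-Pass produces two candidate topics per window and the merge step joins the matching candidates across windows. Because the merge compares whole clusters rather than single documents in arrival order, the final grouping depends less on the order in which texts were fed in.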

    Table  1.   Performance of various clustering algorithms

    Clustering algorithm            Precision  Recall  F
    VSM+K-Means                     0.705      0.703   0.704
    LDA                             0.728      0.742   0.735
    LDA+Single-Pass&HAC             0.778      0.799   0.789
    Word2vec+Single-Pass&HAC        0.794      0.801   0.797
    LDA&Word2vec+Single-Pass&HAC    0.833      0.845   0.839
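This page does not define the F column of Table 1. Assuming it is the usual F1 score, that is, the harmonic mean of precision and recall, the reported values can be checked with a few lines of Python. The function name `f_measure` is illustrative; the numbers are copied from Table 1.

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall (assumed definition of F)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0


# (precision, recall, reported F) for each row of Table 1.
table1 = {
    "VSM+K-Means": (0.705, 0.703, 0.704),
    "LDA": (0.728, 0.742, 0.735),
    "LDA+Single-Pass&HAC": (0.778, 0.799, 0.789),
    "Word2vec+Single-Pass&HAC": (0.794, 0.801, 0.797),
    "LDA&Word2vec+Single-Pass&HAC": (0.833, 0.845, 0.839),
}

# Largest difference between the recomputed and the reported F value.
max_gap = max(abs(f_measure(p, r) - f) for p, r, f in table1.values())

for name, (p, r, f) in table1.items():
    print(f"{name}: recomputed F = {f_measure(p, r):.3f}, reported F = {f:.3f}")
```

Under this assumption the recomputed values agree with the reported ones to within rounding of the three-decimal precision and recall figures.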
Publication history
  • Received:  2019-01-03
  • Published:  2019-09-01
