
it2022-05-05  128


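\quad Earlier runs produced clusters that were not clean enough and contained noise. To address this, the entity words of each tweet ("LOC", "PER", "ORG") and its remaining words ("other") are extracted separately, and each group is compared with the corresponding words of the existing clusters. The weight of each group depends on which kinds of words the tweet actually contains. If the tweet has no "other" words, the entity types that are present share a total weight of 1 equally. If it has "other" words but no entity words, "other" gets weight 1. Otherwise "other" gets 0.7 and the entity types that are present share the remaining 0.3 equally. The code is as follows: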

    def tweet_cluster_similarity_with_four_vector(self, tweet_data, cluster):
        # count[i] is 1 if the tweet has LOC, PER, ORG or other words, respectively.
        count = [0, 0, 0, 0]
        if tweet_data["LOC"]:
            count[0] = 1
        if tweet_data["PER"]:
            count[1] = 1
        if tweet_data["ORG"]:
            count[2] = 1
        if tweet_data["other"]:
            count[3] = 1

        sim_coefficient = {'PER': 0, 'LOC': 0, 'ORG': 0, 'other': 0}
        if count[3] == 0:
            # No "other" words: the entity types present share a weight of 1.
            if sum(count[:3]) == 0:
                return 0.0
            sim_coefficient['other'] = 0
            weight = 1.0 / sum(count[:3])
            if count[0] == 1:
                sim_coefficient['LOC'] = weight
            if count[1] == 1:
                sim_coefficient['PER'] = weight
            if count[2] == 1:
                sim_coefficient['ORG'] = weight
        else:
            if sum(count[:3]) == 0:
                # Only "other" words: they carry the whole weight.
                sim_coefficient['other'] = 1
            else:
                # "other" gets 0.7; the entity types present share 0.3.
                sim_coefficient['other'] = 0.7
                weight = 0.3 / sum(count[:3])
                if count[0] == 1:
                    sim_coefficient['LOC'] = weight
                if count[1] == 1:
                    sim_coefficient['PER'] = weight
                if count[2] == 1:
                    sim_coefficient['ORG'] = weight

        # Per-category cosine similarity between the cluster and the tweet.
        per_sim_value = self.cosine_only_similarity(cluster.get_entity_word_list('PER'), tweet_data['PER'])
        org_sim_value = self.cosine_only_similarity(cluster.get_entity_word_list('ORG'), tweet_data['ORG'])
        loc_sim_value = self.cosine_only_similarity(cluster.get_entity_word_list('LOC'), tweet_data['LOC'])
        other_sim_value = self.cosine_only_similarity(cluster.get_entity_word_list('other'), tweet_data['other'])

        # Final similarity is the weighted sum of the four per-category values.
        res = sim_coefficient['PER'] * per_sim_value + sim_coefficient['ORG'] * org_sim_value + \
            sim_coefficient['LOC'] * loc_sim_value + sim_coefficient['other'] * other_sim_value
        return res
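The method above depends on `self.cosine_only_similarity` and `cluster.get_entity_word_list`, which are not shown here. The following is a minimal standalone sketch of the same weighting rule, under stated assumptions: word groups are plain lists of tokens, and the cosine similarity is a simple bag-of-words version. The names `category_weights`, `bag_of_words_cosine` and `weighted_similarity` are hypothetical and are not the original implementation.

```python
from collections import Counter
from math import sqrt

ENTITY_TYPES = ("LOC", "PER", "ORG")


def category_weights(tweet_words):
    """Weights per word group, following the rule of the method above."""
    present = [t for t in ENTITY_TYPES if tweet_words.get(t)]
    has_other = bool(tweet_words.get("other"))
    weights = {"LOC": 0.0, "PER": 0.0, "ORG": 0.0, "other": 0.0}
    if has_other and not present:
        weights["other"] = 1.0            # only "other" words
    elif has_other:
        weights["other"] = 0.7            # "other" plus some entity words
        for t in present:
            weights[t] = 0.3 / len(present)
    else:
        for t in present:                 # entity words only (or nothing)
            weights[t] = 1.0 / len(present)
    return weights


def bag_of_words_cosine(words_a, words_b):
    """Assumed stand-in for cosine_only_similarity: cosine of token counts."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def weighted_similarity(tweet_words, cluster_words):
    """Weighted sum of per-group cosine similarities."""
    weights = category_weights(tweet_words)
    return sum(
        (w * bag_of_words_cosine(cluster_words.get(t, []), tweet_words.get(t, []))
         for t, w in weights.items() if w),
        0.0,
    )


tweet = {"LOC": ["china"], "PER": [], "ORG": ["huawei"], "other": ["trade", "talks"]}
cluster = {"LOC": ["china", "us"], "PER": [], "ORG": [], "other": ["trade", "deal"]}
print(category_weights(tweet))
print(weighted_similarity(tweet, cluster))
```

A tweet with no words in any group gets similarity 0.0, which matches the early return in the method above.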


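\quad The first 33 clusters were inspected: before the improvement, 12 of them contained noise; after the improvement, 7 did. In the flags below, 1 marks a clean cluster and 0 marks a cluster that contains noise.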

old: 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1
total: 33, 1: 21, 0: 12

new: 1 1 1 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1
total: 33, 1: 26, 0: 7
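The totals above can be re-checked with a few lines of Python. This is only a sketch: it assumes the per-cluster flags are stored as space-separated strings, and `noise_summary` is a hypothetical helper name.

```python
OLD_FLAGS = "0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1"
NEW_FLAGS = "1 1 1 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1"


def noise_summary(flags):
    """Return (noisy clusters, total clusters, noisy share) for a flag string."""
    values = [int(token) for token in flags.split()]
    noisy = values.count(0)  # 0 marks a cluster that contains noise
    return noisy, len(values), noisy / len(values)


for label, flags in (("old", OLD_FLAGS), ("new", NEW_FLAGS)):
    noisy, total, share = noise_summary(flags)
    print(f"{label}: {noisy}/{total} clusters contain noise ({share:.1%})")
# old: 12/33 clusters contain noise (36.4%)
# new: 7/33 clusters contain noise (21.2%)
```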

\quad For example, one cluster produced by the original version mixes several kinds of information, as shown below:

cluster_id:55 (original version)
$tweet_num_without_dup: 47
$all_tweet_num: 55
cluster_entities_dict: {u'LOC': [[u'pakistan', 1], [u'usa', 1], [u'west', 1], [u'qinghai', 1], [u'hk', 2], [u'china', 82], [u'uk', 2], [u'japan', 2], [u'india', 1], [u'spain', 1]], u'PER': [[u'mercedes', 1], [u'hole', 1]], u'ORG': [[u'commerce', 1], [u'fineproxy', 1], [u'university', 1], [u'philippine', 1], [u'huawei', 1], [u'magazine', 1], [u'tsinghua', 1]]}
2019-07-04 16:03:19 - #Live: China's commerce ministry holds regular press conference. #trade https://t.co/SYiCm3J5Iy
2019-07-04 16:03:24 - China commerce ministry says that trade teams from both US and China are in touch https://t.co/7aKorTmNvM
2019-07-04 16:04:21 - "All the tariffs should be lifted if China and the U.S. want to reach an agreement. China's attitude is specific and consistent," said Gao Feng, spokesperson for China's Ministry of Commerce. https://t.co/aprA5rId8q
2019-07-04 16:06:36 - #OldYellowMen in China are afraid of losing control of China.Those bastards. https://t.co/XVdK74Axj2
2019-07-04 16:06:46 - @LKEBRJTI @Jeremy_Hunt US and UK are purposely abusing the hk system just to make China to make a move but China doesn't so China is not abusing the system especially if the west can mind their own business then everything is back to normal.
2019-07-04 16:09:11 - China commerce ministry says that trade teams from both US and China are in touch https://t.co/6eFreQnrZx #forex #news #forextrading #investing
2019-07-04 16:09:38 - Censorship is closing China's young minds... #China https://t.co/wPAi7n8adM
2019-07-04 16:10:59 - China, U.S. teams are in close contact for trade talks #ChinaUS https://t.co/ik67A2ZBkM https://t.co/m2Ju2sKe1W
2019-07-04 16:11:58 - @Wrightgal80 Think that is being saved for China
2019-07-04 16:13:44 - @17cShyteposter Neoliberalism is good. And it has no need whatsoever to import people.Free trade with China is good; it benefits the people of both countries. Likewise with Mexico.But mass immigration is not necessary to neoliberalism. That's an evil imposed by our elites for no good reason.
2019-07-04 16:13:45 - @realDonaldTrump start trade war with china →American stock market getting down →make peace with china during G20 →stock market getting crazy →absolutely doing nothing
2019-07-04 16:13:52 - Interesting and informative article from @AbacusNews about one of the three top players in China for streaming and therefore one of our main partners for delivering live coverage of clients events. #fotwlive #liveevents #eventbroadcast
2019-07-04 16:14:59 - Companies prepare to move production out of China https://t.co/HHZgz7nUqy by @ElectronicsNews #Engineering #OEM #China #Production
2019-07-04 16:15:40 - Wow! 260 Taiwanese sent to China — many were “disappeared”. It is outrageous that both China and Spain are complicit in this. #shame https://t.co/C3vdq01Ava
2019-07-04 16:15:42 - @zlj517 West dont remember their history, China fought for its freedom with blood & head now China is Global Economic & Defense Leader this is very hard for west to digest because time is near that China will be SUPER POWER. Long Live China. https://t.co/UN7sdermFy
2019-07-04 16:16:28 - Great Hole of China.. it becomes even greater in part 2 of Daydream✔️?https://t.co/zNYr39X2GI https://t.co/kKlPtjkcff
2019-07-04 16:17:07 - The liquidity, depth and breadth of China's listed companies is now second only to the U.S. Read more. https://t.co/dTz4Zp2om5
2019-07-04 16:17:49 - That's a very good read, indeed it gives a very useful framework to understand China's quick global powergrab.
2019-07-04 16:18:23 - Woww!! China has clearly stepped into a whole new level!
2019-07-04 16:18:34 - My partner in China has an amazing Mercedes MAYBACH 2019 ? >Billionaires in China ?? are really amazing !!!! https://t.co/HoJgBr38O9
2019-07-04 16:18:34 - @SkyNews There is a recession on the way anyway unless this trade war between the US and China is sorted out...just stating what could be obvious
2019-07-04 16:20:37 - US doesn’t really need China’s rare minerals https://t.co/2rEMqK8C7D via @YouTube
2019-07-04 16:21:25 - Food To China Magazine has been released now! This issue cover an in-depth analysis of the snack industry in China, global food trend and other information related to China food market.Click to read →… https://t.co/KWa22PrpED
2019-07-04 16:22:05 - From @NewzooHQ - "China to lose top spot to US in 2019 gaming market"https://t.co/HBLC46yIbl
2019-07-04 16:22:45 - @cnni China is a highly free and safe place, trust me, HK is one part of China.
2019-07-04 16:24:04 - Lessons from Qinghai.? Learn more about China's solar capacity: https://t.co/DBUqHHxjpa #amnc19 https://t.co/YKqR18Btn2
2019-07-04 16:24:17 - You DON'T need to be a part of China to trade with China, you DON'T need to be a part of Japan to trade with Japan.... YOU DON'T NEED TO BE A PART OF THE EU TO TRADE WITH THE EU..... AND WE WON'T. IREXIT is coming!
2019-07-04 16:24:21 - China’s Vanishing Muslims: Undercover In The Most Dystopian Place In The World https://t.co/q3uXyj2ABF
2019-07-04 16:24:45 - U.S and China trade talks resumes Huawei and 5G remains blacklisted in the U.S https://t.co/MW9bwNBkLj https://t.co/gmxpLLzUEo
2019-07-04 16:25:55 - Discover the word with a China proxy https://t.co/5fKWZQerIJ #china #proxy #server #fineproxy #ip
2019-07-04 16:26:28 - https://t.co/UM3K98X6ys China green supersonic civil airplane by 2035
2019-07-04 16:26:29 - From @JhessetEnanoINQ: A less blatant but equally potent way that China is infringing on Philippine territory: https://t.co/IcUmrnr4qr
2019-07-04 16:26:31 - China Everbright Water clinches 3 water-related projects in China https://t.co/LLaWuG4pTm
2019-07-04 16:28:50 - China won the 2016 Sino-US All-Star Waterskiing Competition on Oct.2 in Liuzhou, southern China https://t.co/L0iZqij3pZ
2019-07-04 16:29:03 - Making China a U.S. enemy is counterproductive https://t.co/HOuxLgbV8Y
2019-07-04 16:30:21 - Excellent thread from one of the sharpest China watchers https://t.co/AXA2zNsZJA
2019-07-04 16:30:25 - n 2019, the USA will remain an important global economy, but the trade war tension with China is likely to have a negative effect on its growth figures. #TheKFTrade #KaravanFreeTrade #USA #10ThingsToKnow #Americas #Economy #Partners #TradeKnow more: https://t.co/MjvpN4crP8
2019-07-04 16:30:54 - China's domestic and international tourism industry continues to remain robust and stable:https://t.co/ld2WZQPi0E
2019-07-04 16:31:06 - @simongerman600 And...China is shaped like a chicken ? https://t.co/cYxsEGwd9Y
2019-07-04 16:31:35 - @Echinanews China has 1 Million Muslims in internment slave factories.China literally is the enemy of multi-party democracyChina destroys thousands of churches, China is against religious libertyChina IS the enemy!
2019-07-04 16:31:49 - "For the international community, this is China so nobody really cares". China's most famous dissident, the artist @aiww , says Hong Kong is right to fear greater control from Beijing and says the international community should care more about what is happening in the city https://t.co/SZO236oU2J
2019-07-04 16:32:28 - Commerce Ministry's Gao: China, US Trade Teams Are In Communications#forex
2019-07-04 16:32:29 - The latest The Dutchies in #China Daily! https://t.co/cfbmooPq3O
2019-07-04 16:34:09 - @NewscastGlobal @BanShankari @tfoale @Anisvohra27 @tehseenp India shd take benefit of trade war of US and China leading to increasing sanctions on China by https://t.co/wGjxJLkV20 reach 5 trillion economy ,a dream by PM modi,it is a golden opportunity.
2019-07-04 16:35:08 - @derekjames150 HK is now China’s territory and all the UK can do is try hold China to the terms of the Handover Agreement. However, given China’s near-superpower status and the UK’s diminished international standing, there is practically nothing the UK can do against China for any breaches.
2019-07-04 16:35:10 - 1. Led session on Getting Taxation Right at @wef. Contrasted Pakistan's situation with China. Tax to GDP in China is north of 20%. With other fees, north of 25%. Govt. in Pakistan needs to build momentum to increase its revenue base, to spend more on citizens :: Taimur @Jhagra https://t.co/nCKuWU83gn
2019-07-04 16:36:23 - China’s famous Tsinghua University has invited me to address in September a gathering of scholars to speak on “China’s Economic Development: A Review Of Last 70 years.” Since Namo is not interested in knowing my views I might as well go to China

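\quad This cluster roughly covers China-US tariffs, trade negotiations, and moving factories out of China, plus miscellaneous content such as advertisements and personal opinions, none of which were separated. After the improvement, the result is as follows: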

New version

# China-US tariffs
2019-07-04 16:05:22 - China says existing U.S. tariffs must be removed for a trade deal: https://t.co/aLQY0LU8KQ
2019-07-04 16:06:08 - If China and the US could eventually reach a trade deal, the newly imposed additional tariffs must be eliminated, once and for all, according to China's Ministry of Commerce on Thursday. #trade https://t.co/Dzidx5f7s1
2019-07-04 16:09:13 - China says existing US tariffs must be removed for a trade deal – NEWPAPER24 https://t.co/z0NlP1cwZ4
2019-07-04 16:09:49 - If China and the United States could eventually reach a trade deal, the newly imposed additional tariffs must be eliminated, once and for all, according to China's Ministry of Commerce on Thursday. #ChinaUStrade https://t.co/N64d79RTwL
2019-07-04 16:13:04 - financial careers at https://t.co/VCmmZrK7YO | China says existing U.S. tariffs must be removed for a trade deal | CBC News | careers at https://t.co/VCmmZrK7YO https://t.co/BKKdNFBp5f
2019-07-04 16:14:48 - New Article: China says existing U.S. tariffs must be removed for a trade deal https://t.co/sYk1Co2O4o #businessNews July 4, 2019
2019-07-04 16:15:00 - Existing #US #tariffs will have to be removed if there is to be a #trade deal between Beijing and Washington, #China's commerce ministry said on Thursday.https://t.co/jrpZP2rJRX
2019-07-04 16:19:46 - #China MOFCOM: Existing #tariffs will have to be removed if #US and China are to reach a deal.: US have big room for agricultural cooperation.Hope can reach an agreement on this on basis of equality.
2019-07-04 16:26:02 - Existing US tariffs will have to be removed if there is to be a trade deal between Beijing and Washington, China’s commerce ministry said on Thursday.https://t.co/NHwBu7nhek
2019-07-04 16:33:36 - Reuters: China says existing U.S. tariffs must be removed for a trade deal https://t.co/dWJ04JdxQs https://t.co/gyrmZKBbeh

# China-US trade teams in frequent contact
2019-07-04 16:03:19 - #Live: China's commerce ministry holds regular press conference. #trade https://t.co/SYiCm3J5Iy
2019-07-04 16:03:24 - China commerce ministry says that trade teams from both US and China are in touch https://t.co/7aKorTmNvM
2019-07-04 16:09:11 - China commerce ministry says that trade teams from both US and China are in touch https://t.co/6eFreQnrZx #forex #news #forextrading #investing
2019-07-04 16:10:59 - China, U.S. teams are in close contact for trade talks #ChinaUS https://t.co/ik67A2ZBkM https://t.co/m2Ju2sKe1W
2019-07-04 16:32:28 - Commerce Ministry's Gao: China, US Trade Teams Are In Communications#forex

# Relocating factories
2019-07-04 16:06:14 - Row with US: Global technology companies planning to shift production out of China @zlj517 @CathayPak @Huawei #Pakistan #JazzSuper4G https://t.co/PlrJBLuD9x
2019-07-04 16:31:05 - Major U.S. companies planning to shift substantial production out of China https://t.co/lDuaC9QYaN #Business https://t.co/eQQT8822Zi #359



cluster_id:115
$tweet_num_without_dup: 7
$all_tweet_num: 8
cluster_entities_dict: {u'LOC': [[u'tornado', 3], [u'kaiyuan', 2], [u'china', 6], [u'city', 2]], u'PER': [[u'tornado', 2]], u'ORG': [[u'tornado', 5], [u'state', 1]]}
2019-07-04 16:04:37 - Tornado in northeast China kills 6 people, injures 190 https://t.co/1BIFDcf6uH
2019-07-04 16:05:07 - INTERNATIONAL: A tornado in northeast China has reportedly killed 6 people and injured at least 190. https://t.co/KXls4hHaQj https://t.co/Ee9gpKSd0O https://t.co/7VIV2zYvrK
2019-07-04 16:06:03 - Dramatic footage of a violent #tornado has been a least 6 people were killed and more than 190 others were injured to after a twister barrelled through the city in #Kaiyuan, north-east #China’s #Liaoning province today. https://t.co/AcF7ffMhae
2019-07-04 16:08:08 - A tornado in northeast China has reportedly killed six people and injured at least 190. https://t.co/IWsCJ57j49 https://t.co/F6HovboTwH
2019-07-04 16:13:08 - Violent #tornado have been least six people killed and 120 others injured as tornado hit China's Kaiyuan City leaves a trail of destruction today. https://t.co/SkY2KIIiv4
2019-07-04 16:31:51 - A tornado has left six people dead and nearly 200 injured after ripping through a northeastern Chinese city, local authorities said. https://t.co/SAUR2Qq4AJ
2019-07-04 16:35:38 - State media says a tornado in northeast China kills six people and injured another 190. https://t.co/46pEfJspGp

cluster_id:174
$tweet_num_without_dup: 3
$all_tweet_num: 5
cluster_entities_dict: {u'LOC': [[u'tornado', 3], [u'china', 5]], u'PER': [], u'ORG': [[u'tornado', 2]]}
2019-07-04 16:05:40 - Tornado in NE China kills six, injures 120 https://t.co/8Vnc3WoZiH https://t.co/pBDZcqQzYz
2019-07-04 16:06:05 - Tornado in China kills 6, injures nearly 200 https://t.co/xfgbE2cs1g
2019-07-04 16:11:52 - jacarandafm: WATCH: Tornado kills 6, injures nearly 200 in China https://t.co/QiDgpfbU3K. https://t.co/vFFeSROQjy

cluster_id:1012
$tweet_num_without_dup: 14
$all_tweet_num: 34
cluster_entities_dict: {u'LOC': [], u'PER': [], u'ORG': []}
2019-07-04 16:04:09 - "Tornado Hits Northeast China as More 'Extreme' Weather Strikes" by Reuters via NYT https://t.co/2uATKRb8fa https://t.co/tUSFW0D6JX
2019-07-04 16:05:55 - 6 dead, 190 injured after tornado hits northeast China https://t.co/rgHtA0r7th
2019-07-04 16:04:37 - Tornado in northeast China kills 6 people, injures 190 https://t.co/1BIFDcf6uH
2019-07-04 16:05:07 - INTERNATIONAL: A tornado in northeast China has reportedly killed 6 people and injured at least 190. https://t.co/KXls4hHaQj https://t.co/Ee9gpKSd0O https://t.co/7VIV2zYvrK
2019-07-04 16:06:03 - Dramatic footage of a violent #tornado has been a least 6 people were killed and more than 190 others were injured to after a twister barrelled through the city in #Kaiyuan, north-east #China’s #Liaoning province today. https://t.co/AcF7ffMhae
2019-07-04 16:08:08 - A tornado in northeast China has reportedly killed six people and injured at least 190. https://t.co/IWsCJ57j49 https://t.co/F6HovboTwH
2019-07-04 16:13:08 - Violent #tornado have been least six people killed and 120 others injured as tornado hit China's Kaiyuan City leaves a trail of destruction today. https://t.co/SkY2KIIiv4
2019-07-04 16:31:51 - A tornado has left six people dead and nearly 200 injured after ripping through a northeastern Chinese city, local authorities said. https://t.co/SAUR2Qq4AJ
2019-07-04 16:35:38 - State media says a tornado in northeast China kills six people and injured another 190. https://t.co/46pEfJspGp
2019-07-04 16:05:40 - Tornado in NE China kills six, injures 120 https://t.co/8Vnc3WoZiH https://t.co/pBDZcqQzYz
2019-07-04 16:06:05 - Tornado in China kills 6, injures nearly 200 https://t.co/xfgbE2cs1g
2019-07-04 16:11:52 - jacarandafm: WATCH: Tornado kills 6, injures nearly 200 in China https://t.co/QiDgpfbU3K. https://t.co/vFFeSROQjy
2019-07-04 16:06:51 - 6 people died after a strong tornado hit a city in Northeast China's Liaoning province. https://t.co/iXRN67gq0t
2019-07-04 16:12:35 - A deadly tornado tears through Kaiyuan in Liaoning province in China Wednesday afternoon, killing at least 6 people. Video like this is absolutely terrifying, but also a reminder of the importance of putting as many walls between you & the storm possiblehttps://t.co/ucQPS4494Z


cluster_id:100
$tweet_num_without_dup: 3
$all_tweet_num: 6
cluster_entities_dict: {u'LOC': [[u'hong', 6], [u'kong', 6]], u'PER': [], u'ORG': [[u'reuters', 4]]}
2019-07-04 16:04:18 - Chinese state media says 'Western ideologues' to blame for Hong Kong unrest - ReutersChinese state media says 'Western ideologues' to blame for Hong Kong unrest ReutersHong Kong protests: China tells UK not to interfere in 'domestic affairs' BBC Ne… https://t.co/kE2q047xo8
2019-07-04 16:05:18 - Chinese state media says 'Western ideologues' to blame for Hong Kong unrest https://t.co/eH6w2GK6gz https://t.co/yW38xOIh3l
2019-07-04 16:15:11 - Chinese state media says 'Western ideologues' to blame for Hong Kong unrest - Reuters https://t.co/GDmmi3h0HI https://t.co/i8zYaLFfKG

cluster_id:111
$tweet_num_without_dup: 2
$all_tweet_num: 5
cluster_entities_dict: {u'LOC': [[u'korea', 5], [u'south', 5]], u'PER': [], u'ORG': [[u'petiti', 2]]}
2019-07-04 16:04:33 - Amend the constitution for the betterment of all animals. End the dog meat trade in South Korea! - Sign the Petition! https://t.co/aOOiCYjlfq via @UKChange
2019-07-04 16:23:50 - Amend the constitution for the betterment of all animals. End the dog meat trade in South Korea! - Firma la petizione! https://t.co/wb1BoXpBdB di @ChangeItalia

cluster_id:207
$tweet_num_without_dup: 2
$all_tweet_num: 2
cluster_entities_dict: {u'LOC': [[u'nigeria', 4], [u'africa', 2]], u'PER': [], u'ORG': []}
2019-07-04 16:06:28 - Nigeria raises hope on Africa free trade deal, commits to signing pact.Nigeria, the largest economy on the continent, was one of the last countries that had not committed to signing the deal and its decision.. See more here >> https://t.co/dWcmGOZkrg https://t.co/leeT5hC12b
2019-07-04 16:10:54 - Nigeria raises hope on Africa free trade deal, commits to signing pact.Nigeria, the largest economy on the continent, was one of the last countries that had not committed to signing the deal and its decision.. See more here >> https://t.co/dWcmGOZkrg https://t.co/leeT5hC12b
