How to change Topic list (from gensim lda get_document_topics()) to a DataFrame format
I have performed some topic modelling using gensim.models.ldamodel.LdaModel() and I want to label my data in order to visualize my findings.
This is what I have so far:
My current dataframe has the following columns:
['text']['date']['gender']['tokens']['topics']['main_topic']
text is just the raw text data, date has the form yyyy-mm-dd, gender is binary with female encoded as 1, tokens is the text after preprocessing, and topics is derived from:
df['topics'] = LDA_model.get_document_topics(corpus)
and main_topic is a slight adaptation of the second answer in this post, and is populated like this:
df['main_topic'] = [int(str(sorted(LDA_model[i],reverse=True,key=lambda x: x[1])[0][0]).zfill(3)) for i in corpus]
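The line above picks, for each document, the topic id with the highest probability. A minimal sketch of the same selection using max(), without the sort and the str/zfill round trip; doc_topics is a hypothetical sample in the shape that get_document_topics() returns, and main_topic() is an illustrative helper, not part of gensim:

```python
# Hypothetical sample in the shape returned by get_document_topics():
# one list of (topic_id, probability) pairs per document.
doc_topics = [
    [(0, 0.051341455), (1, 0.21204428), (29, 0.25616828)],
    [(0, 0.05355798), (4, 0.32699445), (29, 0.273105)],
]


def main_topic(topics):
    """Return the id of the topic with the highest probability."""
    return max(topics, key=lambda pair: pair[1])[0]


main_topics = [main_topic(t) for t in doc_topics]
print(main_topics)  # [29, 4]
```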
Finally, the first 10 rows of topics and main_topic look like this (note that num_topics=30):
topics main_topic
[(0, 0.051341455), (1, 0.21204428), (2, 0.1145254), (4, 0.055585753), (11, 0.20260869), (29, 0.25616828)] 29
[(0, 0.052005265), (1, 0.21128647), (2, 0.08015486), (3, 0.11465485), (29, 0.4478401)] 29
[(0, 0.05355798), (1, 0.1394092), (2, 0.10734849), (4, 0.32699445), (29, 0.273105)] 4
[(0, 0.053568278), (1, 0.22299954), (2, 0.22616898), (11, 0.0959242), (29, 0.2897638)] 29
[(0, 0.05404401), (1, 0.4482777), (4, 0.141311), (29, 0.24849494)] 1
[(0, 0.054245334), (1, 0.18933308), (2, 0.14567153), (4, 0.11169399), (23, 0.05768766), (29, 0.35825193)] 29
[(0, 0.05449035), (2, 0.114870586), (4, 0.13284092), (11, 0.075592585), (23, 0.13247918), (24, 0.06598773), (29, 0.32016253)] 29
[(0, 0.055871632), (1, 0.23100668), (4, 0.06832383), (29, 0.4730603)] 29
[(0, 0.057746172), (1, 0.057121024), (2, 0.07247137), (3, 0.26388222), (13, 0.07291462), (29, 0.34331965)] 29
[(0, 0.057841185), (1, 0.19891246), (2, 0.09586754), (29, 0.5344914)] 29
Now what I want is:
I want 30 new columns: "topic 0, topic 1, topic 2, ..., topic 29". For the first row, I want to take the values from df['topics'] and store them in the new columns, so that:
topic 0 in row 1 = 0.0513414, topic 1 in row 1 = 0.21204, topic 2 in row 1 = 0.11452, topic 3 in row 1 = 0, and so on.
But I don't know how. Can someone help?
I figured it out. If someone is looking to achieve the same thing:
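import pandas as pd

# LDA_model is the trained gensim.models.ldamodel.LdaModel from the question
df['topics'] = LDA_model.get_document_topics(corpus)
sf = pd.DataFrame(data=df['topics'])
# Empty frame with one column per topic id ('0' ... '29')
af = pd.DataFrame()
for i in range(30):
    af[str(i)] = []
frames = [sf, af]
# Rows come from sf; the topic columns start out as NaN and are filled with 0
af = pd.concat(frames).fillna(0)
for i in range(6301):
    for j in range(len(df['topics'][i])):
        # Use .loc[row, column] so the value is written to af itself,
        # not to a temporary copy (chained assignment)
        af.loc[i, str(df['topics'][i][j][0])] = df['topics'][i][j][1]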
(Note that 30 is my num_topics and 6301 is the number of rows in df['topics'].)
Now the dataframe af looks like this [restricted to 5 rows and 5 columns]:
topics 0 1 2 3
0 [(1, 0.055395175), (5, 0.0647138), (7, 0.13507782), (9, 0.055264555), (13, 0.19258575), (21, 0.05181323), (27, 0.07139948)] 0.0 0.05539517477154732 0.0 0.0
1 [(0, 0.052290276), (6, 0.064590134), (13, 0.24019116), (16, 0.07827738), (27, 0.0994899)] 0.05229027569293976 0.0 0.0 0.0
2 [(6, 0.054943837), (7, 0.07324204), (10, 0.052613333), (12, 0.12482096), (27, 0.19818054), (29, 0.06280263)] 0.0 0.0 0.0 0.0
3 [(4, 0.12759669), (8, 0.06937062), (10, 0.2261674), (16, 0.066699274), (24, 0.06150386), (27, 0.096883684)] 0.0 0.0 0.0 0.0
4 [(2, 0.09043305), (8, 0.15643781), (10, 0.13145259), (16, 0.064689845), (17, 0.05019963), (24, 0.09253424), (28, 0.10176642)] 0.0 0.0 0.09043305367231369 0.0
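The nested loop above fills the frame one cell at a time. A possible alternative, sketched here under assumptions rather than taken from the answer, converts each document's list of pairs into a dict and lets pandas build all rows at once. The sample data, NUM_TOPICS, and the names wide and result are illustrative; the "topic N" column labels follow the question's wording.

```python
import pandas as pd

NUM_TOPICS = 30  # num_topics from the question

# Hypothetical sample standing in for df['topics']
topics = [
    [(0, 0.051341455), (1, 0.21204428), (2, 0.1145254), (29, 0.25616828)],
    [(0, 0.05355798), (4, 0.32699445), (29, 0.273105)],
]
df = pd.DataFrame({"topics": topics})

# Each list of (topic_id, probability) pairs becomes a dict, and each dict
# becomes one row; topics missing from a document show up as NaN.
wide = pd.DataFrame([dict(t) for t in df["topics"]], index=df.index)

# Make sure every topic id 0..NUM_TOPICS-1 has a column, in order,
# then replace the remaining NaN values with 0.
wide = wide.reindex(columns=list(range(NUM_TOPICS)), fill_value=0.0).fillna(0.0)
wide.columns = [f"topic {i}" for i in wide.columns]

# Attach the new columns to the original frame
result = df.join(wide)
print(result[["topic 0", "topic 1", "topic 2", "topic 3"]])
```

This avoids hard-coding the number of rows, since the row count comes from df itself.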