How to plot frequencies of words vs. time, with time variable grouped into month/year and year in R
I have a document term matrix, with frequencies of >600 words, and a corresponding date (mm/dd/yyyy) for each frequency value:
> head(mydata3,3)
Claim.Number Note.Date LOSSDATE DATEREPORTED
1 106810 7/10/1998 12/9/1997 12/29/1997
2 106810 7/21/1998 12/9/1997 12/29/1997
3 106810 10/21/1999 12/9/1997 12/29/1997
DATEENTERED Row Topic absenc abus academ access
1 1/5/1998 3 4 0 0 0 0
2 1/5/1998 4 2 0 0 0 0
3 1/5/1998 8 11 0 0 0 0
accid accommod account accus act action activ add
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
addit addl adequ adjust administr admiss advanc
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
advers advic african age agenc agreement aid ambul
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
amount analysi ankl answer anticip appeal appel
1 0 0 0 0 0 0 0
2 0 0 0 0 0 2 0
3 0 0 0 0 0 1 0
appli applic appoint appropri approv approxim arbitr
1 0 0 0 1 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
argu argument aris arm arrang arriv asap assault
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 1 0 0 0 0 0
assert assess assist athlet attach attent audit auto
1 0 0 0 0 0 2 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
avoid await award background balanc ball bar basi
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
benefit big bill black board breach break. brief
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
broken broker budget build bus busi call campus cap
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 2 0 0
3 0 0 0 0 0 0 0 0 0
car care carrier center cgl chair chang charg child
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
children circuit cite citi civil clean client clinic
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
close closur cmc coach code collect commit committe
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
communic compani compar compel compens complain
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
complet conclud condit conduct conf confer confid
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
conflict connect construct consult contact contend
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
contract contractor contribut control convers
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
convinc cooper coordin copi correct cost counter
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0
counti cours court cover coverag creat credibl
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
credit crimin cross cut damag danger deadlin deal
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
dean death decis declin deduct defam defect defend
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
degre delay demand deni denial depart depos deposit
1 0 0 0 0 0 0 0 0
2 1 0 0 1 0 0 0 0
3 1 0 0 0 0 0 0 0
dept despit develop diari difficult director disabl
1 0 1 0 1 0 0 0
2 1 0 0 0 0 0 0
3 0 0 0 0 0 0 0
discharg disciplin disciplinari discoveri discrimin
1 0 0 0 0 1
2 0 0 0 0 1
3 0 0 0 0 0
discuss dismiss disput distress district doc docket
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
doctor document done door dorm doubt draft drive
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 0
3 0 0 0 0 0 0 0 0
driver drop due earlier earn educ eeoc effort ell
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
els email emot employ employe encourag end endors
1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 0
3 0 0 0 1 2 0 1 0
enrol entitl environ estim evalu event evid exam
1 0 0 0 0 0 0 0 2
2 0 0 0 0 0 0 0 2
3 0 0 0 0 0 0 0 2
examin exceed excess exchang exclus execut expens
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
experi expert expir exposur extend extens extent
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
extrem eye face facil faculti fail failur fall fals
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 1 2 1 0 0
3 0 0 0 0 0 3 0 0 0
fault favor fax feder fee fell femal field fight
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
final financi finish fire firm floor focus foot forc
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
form formal former forward fractur free fund futur
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
game gender gone grade graduat grant grievanc ground
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 1 0 0
3 0 0 0 1 1 0 0 0
group hand happi harass head health hear held higher
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
hire histori hit hold home hospit hostil hous human
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
ice identifi immedi immun impact import impress
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
improv inappropri inclin incur indemn individu injur
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
injuri inquir inquiri inspect instruct intent
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
interest intern invoic job joint judg judgment juri
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 1 0 0 0 0 0 0
jurisdict key knee knowledg lacer lack larg latest
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
law lawyer layer learn leav leg legal letter level
1 0 0 0 0 0 0 0 1 0
2 0 1 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
liabil lien life limit litig live lmtcb local lose
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
loss lost low mail mainten major male manag materi
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
mcad med mediat medic medicar meet memo merit messag
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 2 0 0 0
million minor mom money monitor motion msj mtd
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
nation near neck neglig negoti news noth notic
1 1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
notifi numer nurs object oblig ocr offer offici ongo
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 2 0 0 0
open oper opinion opportun oppos opposit oral order
1 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
origin outlin outstand owe paid pain park parti
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
partner pass pay payment pend perman personnel petit
1 0 1 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0
3 0 2 0 0 1 0 0 0
phone photo physic physician pictur plan player
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
plead poa polic polici poor postpon potenti practic
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
preliminari premis prepar pres presid press pressur
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
prevail prevent primari privat proceed product
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
profession professor progress project promis promot
1 0 0 0 0 0 0
2 0 1 0 0 0 0
3 0 2 0 0 0 0
proper properti propos protect provis provost pull
1 0 0 0 0 0 0 0
2 0 0 0 0 0 1 0
3 0 0 0 0 0 0 0
punit pursu push qualifi quick quiet quit race rais
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
rang rate reach recal receipt recov recoveri rediari
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
reduc reimburs reinsur reject relationship releas
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
relief remain remedi remov renew reopen rep repair
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 1 0 0 0 0 0
repeat. replac repli repres represent research
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
reserv resid resign resolut resolv respect respond
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
rest retain retali retent retir return reveal review
1 0 0 0 0 0 0 0 2
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 1
revis risk role ror rts rule run safeti salari
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
schedul search section secur select semest separ
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
serious serv servic settl settlement sex sexual
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
shoulder side sidewalk sign signific sir sit site
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
situat slip small snow speak spent split staff stage
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
stair standard statement status statut step stop
1 0 0 0 0 0 0 0
2 0 0 0 2 0 0 0
3 0 0 0 0 0 0 0
stori strategi street strike struck studi subject
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
substanti success sue suffer suffici suggest summari
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
supervis supervisor supplement supv surgeri suspect
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
suspend sustain system tabl tcw teach teacher team
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
telephon tender tenur term termin test testifi
1 0 0 0 0 0 0 0
2 0 0 0 0 0 1 0
3 0 0 0 0 0 0 0
testimoni theori threaten titl top total tpa track
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
train transcript transfer transport travel treat
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
treatment trial trip troubl tuition unabl unclear
1 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
unfortun upcom updat vacat valu vehicl verdict video
1 0 0 1 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
violat visitor voicemail wage wait walk warn watch
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
water weak white win withdraw worker write written
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 1 0
wrote xbocx xdolx ximex xmsjx xnpcx xoopx xprosex
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0
xsolx
1 0
2 0
3 0
I am trying to group the frequency values by month/year and by year. For example, for the word "appeal", instead of having 2 occurrences on 1/5/1998 and another occurrence on 1/5/1998, I would like to have 3 occurrences for 1/1998, and then also 3 occurrences for 1998 (assuming there aren't any more hits for the rest of the year). Then I would like to plot the frequency per month/year vs. month/year, and the frequency per year vs. year.
I tried using the following code to group by month/year:
df %>%
  mutate(month_year = format(date, "%Y/%m")) %>%
  group_by(month_year) %>%
  summarise(total = sum(vocabfreq))
where vocabfreq stands for all of the columns holding the word frequencies in the original data set. Another problem is that my data set is quite large, and I am having difficulty plotting multiple series on one graph in a way that shows their distinctive features.
The xts method:
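library(xts)
# Toy data: one row per note, one column per word
dat <- data.frame(date = c('7/10/2014', '7/10/2014', '7/11/2014', '8/05/2015', '9/21/2015'),
                  word1 = c(1, 2, 1, 4, 3), word2 = c(3, 10, 1, 2, 4))
# Parse the mm/dd/yyyy strings and use them as the time index
dates <- as.POSIXct(dat$date, format = '%m/%d/%Y')
dat.xts <- xts(subset(dat, select = -date), order.by = dates)
# Sum every word column within each day, then within each month
apply.daily(dat.xts, colSums)
apply.monthly(dat.xts, colSums)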
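The question also asks for yearly totals and a plot. A minimal sketch of how the same approach might be extended, assuming the toy data above (word1 and word2 are stand-ins for the real word columns) and using a Date index instead of POSIXct to sidestep time zone effects:

```r
library(xts)  # also attaches zoo, which provides plot.zoo

# Same toy data as above; word1/word2 stand in for the real word columns
dat <- data.frame(
  date  = c("7/10/2014", "7/10/2014", "7/11/2014", "8/05/2015", "9/21/2015"),
  word1 = c(1, 2, 1, 4, 3),
  word2 = c(3, 10, 1, 2, 4)
)

# Build the xts object on a Date index
dates   <- as.Date(dat$date, format = "%m/%d/%Y")
dat.xts <- xts(as.matrix(dat[, c("word1", "word2")]), order.by = dates)

# Column sums within each month and within each year
monthly <- apply.monthly(dat.xts, colSums)
yearly  <- apply.yearly(dat.xts, colSums)

# One panel, one line per word
cols <- seq_len(ncol(dat.xts))
plot(as.zoo(monthly), plot.type = "single", type = "b", col = cols,
     xlab = "Month", ylab = "Frequency")
legend("topright", legend = colnames(dat.xts), col = cols, lty = 1)
```

Each aggregated row is stamped with the last observed date in its month or year, so the x axis shows those dates rather than month labels.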
You should use summarise_each instead of summarise. By the way, I'm using @DunderChief's code to generate the data. Thank you for that.
dat <- data.frame(date = c('7/10/2014', '7/10/2014', '7/11/2014', '8/05/2015', '9/21/2015'),
                  word1 = c(1, 2, 1, 4, 3), word2 = c(3, 10, 1, 2, 4))
library(dplyr)
# Parse the dates, then sum every word column within each date
dat %>%
  mutate(date = as.Date(date, format = '%m/%d/%Y')) %>%
  group_by(date) %>%
  summarise_each(funs(sum(.)))
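The code above sums per date, while the question asks for month/year and year. summarise_each() and funs() are also deprecated in newer dplyr releases in favour of across(). A possible sketch under those assumptions (dplyr >= 1.0.0, plus tidyr and ggplot2 for the plot; the object names by_month, by_year, top_words and long are illustrative, and the cut-off of 10 words is arbitrary):

```r
library(dplyr)
library(tidyr)
library(ggplot2)

dat <- data.frame(
  date  = c("7/10/2014", "7/10/2014", "7/11/2014", "8/05/2015", "9/21/2015"),
  word1 = c(1, 2, 1, 4, 3),
  word2 = c(3, 10, 1, 2, 4)
) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))

# Sum every word column per month/year.
# With the data in the question the range would presumably be absenc:xsolx.
by_month <- dat %>%
  mutate(month_year = format(date, "%Y/%m")) %>%
  group_by(month_year) %>%
  summarise(across(word1:word2, sum), .groups = "drop")

# Same idea per year
by_year <- dat %>%
  mutate(year = format(date, "%Y")) %>%
  group_by(year) %>%
  summarise(across(word1:word2, sum), .groups = "drop")

# With hundreds of words a single plot gets unreadable, so keep only the
# most frequent ones (up to 10 here) before reshaping to long format.
totals    <- colSums(select(by_month, word1:word2))
top_words <- names(sort(totals, decreasing = TRUE))[seq_len(min(10, length(totals)))]

long <- by_month %>%
  select(month_year, all_of(top_words)) %>%
  pivot_longer(-month_year, names_to = "word", values_to = "freq")

# One line per word, month/year on the x axis
p <- ggplot(long, aes(x = month_year, y = freq, colour = word, group = word)) +
  geom_line() +
  geom_point() +
  labs(x = "Month/Year", y = "Frequency")
print(p)
```

The yearly plot follows the same pattern with by_year and year in place of by_month and month_year. Months with no notes do not appear as rows, so the lines connect only the months that are present.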