
Creating Frequency Matrix (counting number of items in array) in J

text
┌───────────┬──────────┬───────────┬──────────┬──────────┬─────────┬──────────┬─────────────┬─────────────┬──────────┬───────────────┬──────────┬──────────┬────────────┬─────────────────┬──────────┬──────────┬──────────────┬─────────────┬─────────────┬────...
│speak      │conceal   │terribl    │option    │write     │book     │come      │tuesdai      │matter       │act       │conceal        │catastroph│integr    │depart      │justic           │put       │wai       │choic         │realli       │bad          │opti...
├───────────┼──────────┼───────────┼──────────┼──────────┼─────────┼──────────┼─────────────┼─────────────┼──────────┼───────────────┼──────────┼──────────┼────────────┼─────────────────┼──────────┼──────────┼──────────────┼─────────────┼─────────────┼────...
│trump      │logu      │talk       │entir     │time      │talk     │entir     │time         │discov       │someth    │frequent       │doe       │logu      │thi         │direct           │logu      │direct    │logu          │differ       │direct       │cons...
├───────────┼──────────┼───────────┼──────────┼──────────┼─────────┼──────────┼─────────────┼─────────────┼──────────┼───────────────┼──────────┼──────────┼────────────┼─────────────────┼──────────┼──────────┼──────────────┼─────────────┼─────────────┼────...
│cohen      │lawyer    │object     │taint     │team      │anoth    │unusu     │move         │lawyer       │trump     │file           │emerg     │motion    │court       │sundai           │night     │sai       │presid        │object       │extraordinari│meas...
├───────────┼──────────┼───────────┼──────────┼──────────┼─────────┼──────────┼─────────────┼─────────────┼──────────┼───────────────┼──────────┼──────────┼────────────┼─────────────────┼──────────┼──────────┼──────────────┼─────────────┼─────────────┼────...
│photo      │presid    │trump      │fire      │jame      │comei    │director  │mai          │did          │mean      │end            │comei     │time      │public      │memoir           │higher    │loyalti   │releas        │comei        │featur       │wide...
├───────────┼──────────┼───────────┼──────────┼──────────┼─────────┼──────────┼─────────────┼─────────────┼──────────┼───────────────┼──────────┼──────────┼────────────┼─────────────────┼──────────┼──────────┼──────────────┼─────────────┼─────────────┼────...
│british    │deleg     │organ      │wrote     │twitter   │russia   │syria     │allow        │access       │douma     │unfett         │access    │essenti   │russia      │syria            │cooper    │western   │diplomat      │confirm      │syria        │russ...
├───────────┼──────────┼───────────┼──────────┼──────────┼─────────┼

cleaned_text

    ┌─────┬───────┬───────┬──────┬─────┬────┬────┬───────┬──────┬───┬───────┬──────────┬──────┬──────┬──────┬───┬───┬─────┬──────┬───┬──────┬──────────┬──────┬────┬────┬────┬────────┬─────┬─────┬───────┬───────┬───────┬───────┬───┬─────┬───────┬────┬───────┬──...
    │speak│conceal│terribl│option│write│book│come│tuesdai│matter│act│conceal│catastroph│integr│depart│justic│put│wai│choic│realli│bad│option│catastroph│option│hard│call│tell│congress│thing│chang│clinton│fervent│support│disagre│sai│least│philipp│rein│longtim│tr...
    └─────┴───────┴───────┴──────┴─────┴────┴────┴───────┴──────┴───┴───────┴──────────┴──────┴──────┴──────┴───┴───┴─────┴──────┴───┴──────┴──────────┴──────┴────┴────┴────┴────────┴─────┴─────┴───────┴───────┴───────┴───────┴───┴─────┴───────┴────┴───────┴──...

Each row of "text" is a news article. I am trying to count how many times each vocabulary word from cleaned_text occurs in each article, so that I can create a frequency matrix like this:

    art1 art2 art3 ...
mai 4    5    4 
sai 1    0    0
...

I have been looking at the e. and E. verbs to count the occurrences of each word in each article, but I am having a hard time using them in this case.

Can anyone help me with this? Thank you!

I would use a slightly different approach. To keep things simple, I will use this example table p:

   p
┌─────┬─────┬─────┬─────┬─────┐
│pants│shirt│shirt│hat  │pants│
├─────┼─────┼─────┼─────┼─────┤
│shoes│shoes│socks│pants│shirt│
├─────┼─────┼─────┼─────┼─────┤
│shirt│hat  │pants│shoes│shoes│
├─────┼─────┼─────┼─────┼─────┤
│socks│pants│shirt│shirt│hat  │
├─────┼─────┼─────┼─────┼─────┤
│pants│shoes│shoes│socks│pants│
├─────┼─────┼─────┼─────┼─────┤
│shirt│shirt│hat  │pants│shoes│
└─────┴─────┴─────┴─────┴─────┘

To count each article of clothing, I need to compare each row with the whole vocabulary. I get the whole vocabulary by ravelling ( , ) p and taking the nub ( ~. ), which keeps only the first occurrence of each item. This ensures that every word that occurs anywhere in p is accounted for.

   ~.@:,p
┌─────┬─────┬───┬─────┬─────┐
│pants│shirt│hat│shoes│socks│
└─────┴─────┴───┴─────┴─────┘
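One thing to watch for with real articles: they rarely have the same number of words, and when a list of boxed articles of different lengths is opened ( > ) into a table, J pads the short rows with empty boxes ( a: ). The nub of such a table then contains an empty box as if it were a word. The sketch below assumes the question's text table was built that way; the two-article table q is made up for illustration. Removing the empty box with less ( -. ) gives a clean vocabulary:

```j
NB. Hypothetical two-article table: the second article has only one word,
NB. so opening (>) pads its row with empty boxes (a:)
q =: > (;: 'mai sai mai') ; < ;: 'sai'

vocab =: ~. , q          NB. mai sai plus an empty box from the padding
words =: vocab -. a:     NB. drop the empty box: mai sai
```

Comparing against words rather than the raw nub keeps the padding from showing up as an extra column of counts.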

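Now I transpose ( |: ) p and compare it with the nub using the table form of equals ( =/ ), which checks every word of p against every word of the nub. I finish off by totalling with +/@: . Since +/ sums along the first axis, and the transpose has put the columns of p first, the sum runs across the words within each row, leaving one row of counts per row of p.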

   +/@:(|: =/ ~.@,)p
2 2 1 0 0
1 1 0 2 1
1 1 1 2 0
1 2 1 0 1
2 0 0 2 1
1 2 1 1 0
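To see what each piece of +/@:(|: =/ ~.@,) does, it can help to split the phrase into steps and look at the shapes. The sketch below rebuilds p from the example; the names nub, t and counts are only for illustration. It also shows that the transpose can be avoided by comparing p directly and summing at rank 2 instead:

```j
NB. Rebuild the 6x5 example table p (rows separated by double spaces)
p =: 6 5 $ ;: 'pants shirt shirt hat pants  shoes shoes socks pants shirt  shirt hat pants shoes shoes  socks pants shirt shirt hat  pants shoes shoes socks pants  shirt shirt hat pants shoes'

nub =: ~. , p              NB. pants shirt hat shoes socks
t =: (|: p) =/ nub         NB. 1 where a word of p matches a nub word
$ t                        NB. 5 6 5 : column of p, row of p, nub word
counts =: +/ t             NB. sum over the first axis (the words of each row)
$ counts                   NB. 6 5 : one row per row of p, one column per nub word

NB. Same result without the transpose: p =/ nub has shape 6 5 5,
NB. and +/"2 sums each 5x5 plane along its first axis
counts -: +/"2 p =/ nub    NB. 1
```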

Reading these numbers against the nub, the first row has 2 pants, 2 shirts, 1 hat, 0 shoes and 0 socks, which is correct by inspection. The second row has 1 pants, 1 shirt, 0 hats, 2 shoes and 1 sock, and so on.
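The question asks for the transpose of this layout (one row per vocabulary word, one column per article), counted against the vocabulary in cleaned_text rather than the nub of the table. Note that e. on its own only reports whether a word occurs (1 or 0), not how many times, which is why the comparison uses = and then sums. A minimal sketch, assuming text is a boxed table with one article per row and cleaned_text is a boxed list of words; the verb name freq is made up for this example:

```j
NB. freq (hypothetical name): x = list of boxed vocabulary words,
NB. y = boxed table with one article per row.
NB. Result: one row per vocabulary word, one column per article.
freq =: 4 : '|: +/ (|: y) =/ x'

p =: 6 5 $ ;: 'pants shirt shirt hat pants  shoes shoes socks pants shirt  shirt hat pants shoes shoes  socks pants shirt shirt hat  pants shoes shoes socks pants  shirt shirt hat pants shoes'

vocab =: ;: 'hat socks'
m =: vocab freq p        NB. 2x6 matrix: hat and socks counted in each row of p
(> vocab) ; m            NB. display the words beside their rows of counts
```

For the question's data the call would be (~. , cleaned_text) freq text; the ravel makes the vocabulary a list whether cleaned_text is a list or a one-row table. A vocabulary word that never appears in an article simply gets a 0 in that column, and words outside the vocabulary are ignored.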

Hope this helps.

Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.