
Divide column 2 by a particular value that is in the heading

Hello everyone. First of all, I am new to coding and still learning, so please excuse my question!

My data is as follows:

TOPIC:  1 87187.0

Mr 2288.0
's 1633.0
@card@ 1132.0
party 731.0
say 710.0

TOPIC:  2 97854.0

say 2170.0
@card@ 1872.0
people 1078.0
police 562.0

and so on, up to Topic 100, all in the same format.

Here the first line of each block is the topic number and its weight. The lines that follow are the words in that topic and their weights in that topic.
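
To make the layout concrete, here is a minimal Python sketch that reads this format into a dictionary. The function name `parse_topics` and the shape of the returned dictionary are illustrative assumptions, not part of the original data or of any library. It assumes every header line starts with `TOPIC:` and every word line has exactly two whitespace-separated fields.

```python
def parse_topics(lines):
    """Return {topic_number: (topic_weight, [(word, weight), ...])}.

    Hypothetical helper: assumes header lines look like 'TOPIC:  1 87187.0'
    and word lines look like 'Mr 2288.0'.
    """
    topics = {}
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "TOPIC:":
            # Header line: topic number, then topic weight.
            current = int(fields[1])
            topics[current] = (float(fields[2]), [])
        elif len(fields) == 2 and current is not None:
            # Word line: word, then its weight in the current topic.
            topics[current][1].append((fields[0], float(fields[1])))
    return topics


# A shortened copy of the sample data, kept in memory for the demonstration.
sample = """TOPIC:  1 87187.0

Mr 2288.0
's 1633.0

TOPIC:  2 97854.0

say 2170.0
"""
topics = parse_topics(sample.splitlines())
print(topics[1])
```

Keeping the topic weight next to its word list means the division described below only needs the data of one topic at a time.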

I want to find the probability of each word, that is, divide each word's weight by its respective topic weight. For example:

In topic 1, the weight of the word Mr is 2288.0 and the topic weight is 87187.0. So the probability of the word Mr in topic 1 is 2288.0/87187.0. Likewise, I would like to know the probability of all the words.
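
As a quick check of that arithmetic, here is a small Python sketch. The variable names are illustrative assumptions; the numbers are taken from the sample data for topic 1.

```python
# Values copied from the sample data for topic 1.
topic_weight = 87187.0
word_weights = {"Mr": 2288.0, "'s": 1633.0, "@card@": 1132.0}

# Probability of a word within a topic = word weight / topic weight.
probabilities = {
    word: weight / topic_weight for word, weight in word_weights.items()
}

for word, prob in probabilities.items():
    # Round to five decimal places for display only.
    print(word, round(prob, 5))
```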

My output should be like:

TOPIC:  1 87187.0

Mr 0.02624 
's 0.01872
@card@ 0.0129

and so on, where each value is the word's weight divided by the topic weight.

If this were a normal column division, I would have used a col2/col1 technique, but this is quite challenging. So please guide me. Thanks in advance!

You don't say anything at all about what you want your output format to look like, or even give an example of it, but this should at least point you in the right direction...

Suggested Python starting point. It produces what your edit seems to indicate is your desired output, aside from floating-point rounding concerns:

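divisor = 1.0  # weight of the current topic, updated at each TOPIC: line
with open("input.txt") as fd:
    for line in fd:
        fields = line.split()
        if fields:
            if fields[0] == 'TOPIC:':
                # Header line: the last field is the topic weight.
                divisor = float(fields[-1])
            elif len(fields) == 2:
                # Word line: replace the weight with weight / topic weight.
                # '.12g' keeps 12 significant digits.
                fields[-1] = format(float(fields[-1]) / divisor, '.12g')
        # Blank input lines have no fields and are printed as blank lines.
        print(' '.join(fields))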

With your sample input above, this code produces:

TOPIC: 1 87187.0

Mr 0.0262424444011
's 0.0187298565153
@card@ 0.0129835870026
party 0.00838427747256
say 0.00814341587622

TOPIC: 2 97854.0

say 0.0221758947003
@card@ 0.0191305414188
people 0.0110164122059
police 0.00574325014818
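
If the shorter values from the question (for example `Mr 0.02624`) are preferred, the same loop can round each result. The sketch below is one possible variant rather than the answer's own code: the function name `normalise_lines` and the `digits` parameter are assumptions made for illustration, and it works on an in-memory list of lines instead of `input.txt`.

```python
def normalise_lines(lines, digits=5):
    """Yield each line with word weights divided by the current topic weight.

    Hypothetical helper: `digits` is the number of decimal places to keep.
    """
    divisor = 1.0
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "TOPIC:":
            # Header line: remember the topic weight, leave the line as is.
            divisor = float(fields[-1])
        elif len(fields) == 2:
            # Word line: divide and format with a fixed number of decimals.
            fields[-1] = f"{float(fields[-1]) / divisor:.{digits}f}"
        yield " ".join(fields)


sample = ["TOPIC:  1 87187.0", "", "Mr 2288.0", "'s 1633.0", "@card@ 1132.0"]
result = list(normalise_lines(sample))
for out in result:
    print(out)
```

Because this rounds rather than truncates, `'s` comes out as 0.01873, where the question shows 0.01872.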
