Divide column 2 by a particular value that is in the heading

Hello everyone. First of all, I am new to coding and still learning, so please excuse me if this is a basic question.

My data is as follows:

TOPIC:  1 87187.0

Mr 2288.0
's 1633.0
@card@ 1132.0
party 731.0
say 710.0

TOPIC:  2 97854.0

say 2170.0
@card@ 1872.0
people 1078.0
police 562.0

and so on, up to topic 100, in the same format.

Here the first line of each block is the topic number and its weight. The lines that follow are the words in that topic and their weights in that topic.

I want to find the probability of each word, that is, divide each word's weight by its respective topic weight. For example:

In topic 1, the weight of the word Mr is 2288.0 and the topic weight is 87187.0. So the probability of the word Mr in topic 1 is 2288.0/87187.0. Likewise, I would like to know the probability of all the words.

My output should be like:

TOPIC:  1 87187.0

Mr 0.02624 
's 0.01872
@card@ 0.0129

and so on, where each value is the word's weight divided by the topic weight.

If this were an ordinary column division, I would have used the col2/col1 technique, but this is quite challenging. So please guide me. Thanks in advance!

You don't say anything at all about what you want your output format to look like, or even give an example of such, but this should at least point you in the right direction...

Here is a suggested Python starting point. It produces what your edit seems to indicate is your desired output, aside from floating-point rounding concerns:

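divisor = 1.0  # weight of the most recent TOPIC: line
with open("input.txt") as fd:
    for line in fd:
        fields = line.strip().split()
        if len(fields) > 0:
            if fields[0] == 'TOPIC:':
                # Header line: remember the topic weight (last field)
                divisor = float(fields[-1])
            if len(fields) == 2:
                # Word line: replace the weight with weight / topic weight
                # ('.12g' keeps 12 significant digits, as in the output below)
                fields[-1] = format(float(fields[-1]) / divisor, '.12g')
        print(' '.join(fields))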

With your above sample input, this code produces:

TOPIC: 1 87187.0

Mr 0.0262424444011
's 0.0187298565153
@card@ 0.0129835870026
party 0.00838427747256
say 0.00814341587622

TOPIC: 2 97854.0

say 0.0221758947003
@card@ 0.0191305414188
people 0.0110164122059
police 0.00574325014818
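
If you want the shorter values from your example output, the same idea can be wrapped in a function that rounds each result. This is only a sketch: the name normalize_topic_weights and the digits parameter are my own, not anything from your data or from a library, and it rounds rather than truncates, so 's comes out as 0.01873 instead of the 0.01872 in your example.

```python
def normalize_topic_weights(lines, digits=5):
    """Yield each line with word weights divided by the current topic weight.

    Hypothetical helper for illustration; `digits` is the number of
    decimal places kept in each probability.
    """
    topic_weight = None
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "TOPIC:":
            # Header line: the last field is the topic weight
            topic_weight = float(fields[-1])
        elif len(fields) == 2 and topic_weight:
            # Word line: weight / topic weight, rounded to `digits` places
            fields[1] = format(float(fields[1]) / topic_weight, f".{digits}f")
        yield " ".join(fields)


sample = """TOPIC:  1 87187.0

Mr 2288.0
's 1633.0

TOPIC:  2 97854.0

say 2170.0
"""

for out_line in normalize_topic_weights(sample.splitlines()):
    print(out_line)
```

To run it on a file instead of the inline sample, pass the open file object in place of sample.splitlines(), since the function only needs something it can iterate over line by line.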
