Python pandas, multiplication of specific series of data

Imagine that I open two .csv files to build two tables containing different kinds of data about different kinds of objects.

One is a list of objects:

object_type  measurement  name    serialNumber
cat          6.3          bill    1
cat          7.1          kitty   1
whale        25678        none    1
dog          11.1         none    1
dolphin      200.8        none    1
cat          6.1          bill    2
cat          7            kitty   2
whale        25121        none    2
dog          12.1         none    2
dolphin      200          none    2

The other one tells me the percentage of water in the body of several animals:

object-type  H2O_percent
dog          66
cat          66
whale        75
dolphin      75
jellyfish    98

My function should multiply measurement by H2O_percent according to the object type.

Let's have this code first:

import pandas as pd

object_list = pd.read_csv('animals.csv', names=['object_type', 'measurement', 'name', 'serialNumber'])
percentages = pd.read_csv('H2O_percentage.csv', names=['object-type', 'H2O_percent'])
# If the files start with a header row, also pass header=0 so it is not read as data.

What is the preferred syntax to select the objects according to their type?

In other words, how do I translate this pseudocode:

for all cats, do measurement * H2O_percent as stated in file/list 'H2O_percentage.csv'

EDIT:

2nd question: serialNumber is there to tell me "1 means the first measurement, 2 the second measurement, etc."

How can I compute all the individual measurements separately (imagine there are hundreds of 'em...)?

Thanks

Try this:

res = pd.merge(object_list, percentages, left_on='object_type', right_on='object-type')
res['water'] = res['measurement'] * res['H2O_percent']
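
To make the merge concrete, here is a small self-contained sketch that builds both tables inline instead of reading the .csv files. The inline data is a subset of the question's tables. The water_fraction column is my own addition, on the assumption that 66 is meant as 66 % rather than a factor of 66.

```python
import pandas as pd

# Inline stand-ins for the two .csv files (subset of the question's data).
object_list = pd.DataFrame({
    'object_type': ['cat', 'cat', 'whale', 'dog'],
    'measurement': [6.3, 7.1, 25678.0, 11.1],
    'name': ['bill', 'kitty', 'none', 'none'],
    'serialNumber': [1, 1, 1, 1],
})
percentages = pd.DataFrame({
    'object-type': ['dog', 'cat', 'whale', 'dolphin', 'jellyfish'],
    'H2O_percent': [66, 66, 75, 75, 98],
})

# The key column is named differently in each frame, hence left_on/right_on.
res = pd.merge(object_list, percentages,
               left_on='object_type', right_on='object-type')

# Literal product, as asked in the question.
res['water'] = res['measurement'] * res['H2O_percent']
# Assumption: if H2O_percent is a true percentage, scale by 100.
res['water_fraction'] = res['measurement'] * res['H2O_percent'] / 100

print(res[['object_type', 'name', 'measurement', 'H2O_percent', 'water']])
```

Note that the default inner join drops any animal that has no entry in percentages; passing how='left' would keep those rows with NaN in H2O_percent.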

I'm not sure what you want in your second question, but you can try this and see if it helps:

# Print each (object_type, name) group separately
for key, group in res.groupby(['object_type', 'name']):
    print("=" * 80)
    print(group)
    print("=" * 80)
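
If the goal is to treat each measurement run separately, grouping on serialNumber may be closer to what is needed. A minimal sketch, with a small stand-in for res built inline so that it runs on its own. The pivot_table layout at the end is just one possible way to view the result, not something the question asked for.

```python
import pandas as pd

# Small stand-in for the merged frame `res` (values taken from the question).
res = pd.DataFrame({
    'object_type': ['cat', 'cat', 'dog', 'cat', 'cat', 'dog'],
    'name': ['bill', 'kitty', 'none', 'bill', 'kitty', 'none'],
    'serialNumber': [1, 1, 1, 2, 2, 2],
    'measurement': [6.3, 7.1, 11.1, 6.1, 7.0, 12.1],
    'H2O_percent': [66, 66, 66, 66, 66, 66],
})
res['water'] = res['measurement'] * res['H2O_percent']

# One sub-frame per measurement run; works for any number of serial numbers.
for serial, group in res.groupby('serialNumber'):
    print("serialNumber", serial)
    print(group[['object_type', 'name', 'water']])

# One row per animal, one column per measurement run.
wide = res.pivot_table(index=['object_type', 'name'],
                       columns='serialNumber', values='water')
print(wide)
```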

Regarding the second question: are you trying to apply different equations depending on the value in serialNumber?

After merging object_list and percentages, you can filter the dataframe on the value in serialNumber and apply the appropriate formula:

# object_list columns -> ['object_type','measurement','name','serialNumber']
# percentages columns -> ['object-type','H2O_percent']

# Merge the two dataframes on the object type and save the result as res.
# The key column is named differently in each frame, so use left_on/right_on.
res = pd.merge(object_list, percentages, how='inner',
               left_on='object_type', right_on='object-type')
# res columns -> ['object_type','measurement','name','serialNumber','object-type','H2O_percent']

# Create a new column for the results and default it to 0.0
res['water'] = 0.0
# For all rows that have serialNumber equal to 1 -- do calculations
# (.loc assigns into res directly and avoids chained-assignment problems)
first = res['serialNumber'] == 1
res.loc[first, 'water'] = res.loc[first, 'measurement'] * res.loc[first, 'H2O_percent']
# For all rows that have serialNumber equal to 2 -- do calculations
second = res['serialNumber'] == 2
res.loc[second, 'water'] = res.loc[second, 'measurement'] * res.loc[second, 'H2O_percent']

Here the boolean mask res['serialNumber'] == 1 selects only the rows where serialNumber == 1. With this idea you can do a separate calculation for each value of serialNumber. If there are separate columns such as "measurement_1" and "measurement_2", simply change the name of the column being multiplied.
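
A minimal self-contained sketch of the mask idea, using a small made-up stand-in for res and .loc for the assignment:

```python
import pandas as pd

# Small stand-in for the merged frame `res`.
res = pd.DataFrame({
    'object_type': ['cat', 'dog', 'cat', 'dog'],
    'serialNumber': [1, 1, 2, 2],
    'measurement': [6.3, 11.1, 6.1, 12.1],
    'H2O_percent': [66, 66, 66, 66],
})
res['water'] = 0.0

# Boolean Series: True where the row belongs to the first measurement run.
first = res['serialNumber'] == 1

# .loc with a mask and a column label writes into `res` itself.
res.loc[first, 'water'] = res.loc[first, 'measurement'] * res.loc[first, 'H2O_percent']

# Rows with serialNumber == 2 still hold the default 0.0.
print(res)
```

When the formula is identical for every serial number, the masks are unnecessary: a single res['measurement'] * res['H2O_percent'] covers all rows at once, however many serial numbers there are.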

Also, if you are applying the same calculation and only the measurement column changes with the serial number, and your column names in object_list look like this:

['object-type','measurement_1','measurement_2','name','serialNumber']

Where the serial number corresponds to the measurement column, then you could also do something like this:

res['water'] = res.apply(axis=1, func=lambda x: x["measurement_%i"%(x['serialNumber'])] * x["H2O_percent"])

The apply function is similar to Python's builtin "map": it applies the same function over the rows or the columns of the dataframe. With axis=1 the function is called once per row, and each row arrives as a Series indexed by the column names. With axis=0 it is called once per column, and each column arrives as a Series indexed by the row index.
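
A short sketch of the row-wise apply, assuming the hypothetical wide layout described above (measurement_1 and measurement_2 columns, made-up values). The helper name water_for_row is mine.

```python
import pandas as pd

# Hypothetical wide layout: one measurement column per serial number.
res = pd.DataFrame({
    'object_type': ['cat', 'cat', 'dog'],
    'measurement_1': [6.3, 7.1, 11.1],
    'measurement_2': [6.1, 7.0, 12.1],
    'serialNumber': [1, 2, 2],
    'H2O_percent': [66, 66, 66],
})

def water_for_row(row):
    # `row` is a Series indexed by column name.
    column = "measurement_%d" % row['serialNumber']
    return row[column] * row['H2O_percent']

# axis=1: call water_for_row once per row.
res['water'] = res.apply(water_for_row, axis=1)
print(res[['object_type', 'serialNumber', 'water']])
```

Row-wise apply runs a Python function for every row, so on large frames it is usually much slower than a vectorized column product.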

