
Python alternative to R mutate

I want to convert R code into Python. The code in R is

df %>% mutate(N = if_else(Interval != lead(Interval) | row_number() == n(), criteria/Count, NA_real_)) 

In Python I wrote the following:

import pandas as pd
import numpy as np
df = pd.read_table('Fd.csv', sep=',')

for i in range(1,len(df.Interval)-1):
    x = df.Interval[i]
    n = df.Interval[i+1]
    if x != n | x==df.Interval.tail().all():
        df['new']=(df.criteria/df.Count)
    else:
        df['new']='NaN'
df.to_csv (r'dataframe.csv', index = False, header=True)

However, the output returns all NaNs.

Here is what the data looks like:

Interval   Count   criteria
0          0       0
0          1       0
0          2       0
0          3       0
1          4       1
1          5       2
1          6       3
1          7       4
2          8       1
2          9       2
3          10      3

This is what I want to get (the last row also needs to be handled):

Interval   Count   criteria   new
0          0       0
0          1       0
0          2       0
0          3       0          0
1          4       1
1          5       2
1          6       3
1          7       4          0.5714
2          8       1
2          9       2          0.2222
3          10      3          0.3

If anyone could help me find my mistake, I would greatly appreciate it.

1. Start indexing at 0

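The first thing to note is that Python indexes from 0, whereas R indexes from 1. Your loop, range(1, len(df.Interval)-1), therefore skips the first row and never evaluates the last row as the current one. The range needs to cover every row: range(0, len(df.Interval)).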
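As a small illustration of the point above, the sketch below compares the row indices visited by the range from the question with a range that covers every row. The list `interval` simply mirrors the Interval column from the question; the variable names are only for illustration.

```python
# Interval column from the question, as a plain list
interval = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3]

# Range used in the question: starts at row 1 and stops before the last row
question_rows = list(range(1, len(interval) - 1))

# Range that visits every row, from index 0 to len - 1
all_rows = list(range(len(interval)))

print(question_rows[0], question_rows[-1])  # 1 9
print(all_rows[0], all_rows[-1])            # 0 10
```

With the range from the question, rows 0 and 10 are never treated as the current row, so the last row can never receive a value.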

2. Specify row indices

When calling

df['new']=(df.criteria/df.Count)

or

df['new']='NaN'

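you are assigning to every row of the "new" column at once, so each loop iteration overwrites the whole column and only the branch taken in the final iteration survives. You intend to set the value in individual rows only, so you need to specify the row, for example with df.loc[row, 'new'].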
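To make the difference concrete, here is a minimal sketch on a made-up three-row frame (the values are not from the question). It first assigns to the whole column, then resets the column and sets a single row through `df.loc`.

```python
import numpy as np
import pandas as pd

# Toy data, chosen only for illustration
df = pd.DataFrame({"criteria": [1, 2, 3], "Count": [4, 5, 6]})

# Whole-column assignment: every row receives a value
df["new"] = df["criteria"] / df["Count"]
whole_column = df["new"].tolist()

# Reset the column to real missing values, then set only row 2
df["new"] = np.nan
df.loc[2, "new"] = df.loc[2, "criteria"] / df.loc[2, "Count"]

print(whole_column)         # [0.25, 0.4, 0.5]
print(df["new"].tolist())   # [nan, nan, 0.5]
```

Inside a loop, the whole-column form means the last iteration decides the content of every row, which matches the all-NaN output described in the question.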

3. Working example

import numpy as np
import pandas as pd

df = pd.DataFrame()
df["Interval"] = [0,0,0,0,1,1,1,1,2,2,3]
df["Count"] = [0,1,2,3,4,5,6,7,8,9,10]
df["criteria"] = [0,0,0,0,1,2,3,4,1,2,3]
# Use a real missing value (np.nan), not the string "NaN", so the column stays numeric
df["new"] = np.nan

last_row = len(df) - 1
for row in range(len(df)):
    current_value = df.loc[row, "Interval"]
    # Clamp the index so the last row compares against itself instead of running past the end
    next_value = df.loc[min(row + 1, last_row), "Interval"]
    # Use `or` for scalar conditions; `|` binds tighter than `!=` and `==`
    if (current_value != next_value) or (row == last_row):
        result = df.loc[row, "criteria"] / df.loc[row, "Count"]
        df.loc[row, "new"] = result
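The loop can also be avoided altogether. The sketch below is one possible vectorized translation of the R `mutate` call, not the only one: it assumes `Series.shift(-1)` as the counterpart of `lead()` and `Series.where` as the counterpart of `if_else(..., NA_real_)`. The name `is_last_of_interval` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Interval": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3],
    "Count": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "criteria": [0, 0, 0, 0, 1, 2, 3, 4, 1, 2, 3],
})

# shift(-1) lines each row up with the next row's Interval, like lead() in R.
# The last row has no successor, so its shifted value is NaN; a comparison
# with NaN via != is True, which flags the last row without a separate check.
is_last_of_interval = df["Interval"] != df["Interval"].shift(-1)

# Keep criteria / Count where the mask is True, NaN everywhere else
df["new"] = (df["criteria"] / df["Count"]).where(is_last_of_interval)

print(df)
```

On the sample data this fills rows 3, 7, 9 and 10 only. The last row comes out as 3/10 = 0.3, since that is what criteria/Count gives for that row.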
