简体   繁体   中英

A pandas dataframe column to pass as an argument of row level function to apply each value of the column to its respective row

![enter image description here][1]

I have a weird requirement to add a column similar to the below oracle query to apply on python dataframe. A pandas dataframe column to pass as an argument of row level function to apply each value of the column to its respective row.

select employee_id, first_name, last_name, substr(first_name, employee_id/employee_id, 3) test1 from hr.EMPLOYEES

The result has been attached in the below csv file. I want to add a new column. So I wrote my pandas statement as below

EMPLOYEE.assign(test1=EMPLOYEE.FIRST_NAME.str.slice(EMPLOYEE.EMPLOYEE_ID.apply(lambda x: x/x),4))

In this code I am using EMPLOYEE.EMPLOYEE_ID inside the slice function of EMPLOYEE.FIRST_NAME column.

my intention is if we think this as a variable

FIRST_NAME="Steven"
EMPLOYEE_ID=100
FIRST_NAME[int(EMPLOYEE_ID/EMPLOYEE_ID):4]

the output of this is 'tev'

If I apply the same concept on dataframe it is not working.

I am not getting the substring, instead I am getting NaN. My Python output is as below

https://i.stack.imgur.com/uZN8x.png

The below is the output from SQL query.

    EMPLOYEE_ID   FIRST_NAME  LAST_NAME TEST1
0           100       Steven       King   Ste
1           101        Neena    Kochhar   Nee
2           102          Lex    De Haan   Lex
3           103    Alexander     Hunold   Ale
4           104        Bruce      Ernst   Bru
5           105        David     Austin   Dav
6           106        Valli  Pataballa   Val
7           107        Diana    Lorentz   Dia
8           108        Nancy  Greenberg   Nan
9           109       Daniel     Faviet   Dan
10          110         John       Chen   Joh
11          111       Ismael    Sciarra   Ism
12          112  Jose Manuel      Urman   Jos
13          113         Luis       Popp   Lui

We can do it by creating a new column, but, as in SQL we can do it directly. I am expecting the same through pandas. Kindly help me on this.

My intended output is the same as above. But, I am writing a generic script that converts SQL query into pandas dataframe. I am stuck in achieving such scenarios where there is a column is being passed as a parameter of a row level function.

Try:

res = df[['EMPLOYEE_ID', 'FIRST_NAME', 'LAST_NAME']].assign(pre=df['FIRST_NAME'].str[:3])
print(res)

Output

    EMPLOYEE_ID   FIRST_NAME  LAST_NAME  pre
0           100       Steven       King  Ste
1           101        Neena    Kochhar  Nee
2           102          Lex    De Haan  Lex
3           103    Alexander     Hunold  Ale
4           104        Bruce      Ernst  Bru
5           105        David     Austin  Dav
6           106        Valli  Pataballa  Val
7           107        Diana    Lorentz  Dia
8           108        Nancy  Greenberg  Nan
9           109       Daniel     Faviet  Dan
10          110         John       Chen  Joh
11          111       Ismael    Sciarra  Ism
12          112  Jose Manuel      Urman  Jos
13          113         Luis       Popp  Lui

To use it with str.slice do:

res = df[['EMPLOYEE_ID', 'FIRST_NAME', 'LAST_NAME']].assign(pre=df['FIRST_NAME'].str.slice(stop=3))

where, stop is:

stop int, optional Stop position for slice operation.

basically iterate one character at a time and stop in position 3 (non inclusive).

df[['EMPLOYEE_ID', 'FIRST_NAME', 'LAST_NAME']].apply(lambda i: i.FIRST_NAME[int(i.EMPLOYEE_ID/i.EMPLOYEE_ID):4], axis=1)

The above is giving me the answer. But it is slower than the other one

%timeit EMPLOYEE.apply(lambda i: i.FIRST_NAME[int(i.EMPLOYEE_ID/i.EMPLOYEE_ID):3], axis=1)

1.38 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit EMPLOYEE[['EMPLOYEE_ID', 'FIRST_NAME', 'LAST_NAME']].assign(pre=EMPLOYEE['FIRST_NAME'].str[1:4])

948 µs ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM