The data is:
name     day1  day2  day3  day4
anshu    1     .     1     1
Yash     1     1     .     1
Natasha  1     1     1     .
_1st_absent_on: showing which day the person was first absent on.
For example, for Anshu this variable will take the value 2, and for Natasha it will take the value 4.
Any help would be greatly appreciated.
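import pandas as pd

# filePath is the path to the CSV file; header=None keeps the header line as row 0
data = pd.read_csv(filePath, header=None)
data['_1st_absent_on'] = None
for row in range(1, data.shape[0]):
    for col in range(data.shape[1] - 1):
        #print(data[col][row])
        if data[col][row] == '.':
            print('{} 1st absent on: {}'.format(data[0][row], data[col][0]))
            # Column 0 holds the name, so the column index already equals the day number
            data.iloc[row, data.shape[1] - 1] = col
            break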
One way to do it is to iterate over the rows, then use numpy to find the positions where each person was absent. Let's say that an absence is marked with a 0:
import pandas as pd
import numpy as np

d = {'fname': ['anshu', 'arth', 'natasha', 'saurav'],
     'day1': [1, 1, 1, 1],
     'day2': [0, 1, 1, 1],
     'day3': [1, 0, 1, 1],
     'day4': [1, 1, 0, 1]}
df = pd.DataFrame(data=d)
for i, row in df.iterrows():
    print(row['fname'], np.where(row[1:] == 0))
For each person, this prints the zero-based positions, counted within the day columns, where they were absent. Position 1 therefore corresponds to day2.
Disclaimer: I'm not an expert on pandas, so there is probably a better way of doing this, but this is what I could think of off the top of my head.
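To turn those positions into the requested column, one option is to keep only the first position per row and add 1 to it. The following is a minimal sketch under the same assumption as above (0 marks an absence); the helper name `first_absent_day` is made up for illustration, and rows with no absence are left as missing values.

```python
import numpy as np
import pandas as pd

# Same illustrative data as above: 0 marks an absence.
df = pd.DataFrame({
    'fname': ['anshu', 'arth', 'natasha', 'saurav'],
    'day1': [1, 1, 1, 1],
    'day2': [0, 1, 1, 1],
    'day3': [1, 0, 1, 1],
    'day4': [1, 1, 0, 1],
})

def first_absent_day(row):
    """Return the 1-based day of the first absence, or None if never absent."""
    # Skip the name column, then find the zero-based positions of the absences.
    positions = np.where(row.iloc[1:] == 0)[0]
    return int(positions[0]) + 1 if len(positions) else None

values = [first_absent_day(row) for _, row in df.iterrows()]
# The nullable integer dtype keeps the day numbers as integers despite the missing value.
df['_1st_absent_on'] = pd.array(values, dtype='Int64')
print(df)
```

Here saurav is never absent, so that row ends up with a missing value rather than a day number.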
As you can possibly have multiple absences per row, we can melt the df and take the last day.
df['lastabscence'] = df["name"].map(
    pd.melt(df, id_vars="name")
    .query('value == "."')
    .groupby("name")["variable"].last()
)
      name day1 day2 day3 day4 lastabscence
0    anshu    1    .    .    1         day3
1     Yash    1    1    .    1         day3
2  Natasha    1    1    1    .         day4
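Since the question asks for the first absence rather than the last, swapping `.last()` for `.first()` in the melt approach should give the earliest day instead. As an alternative sketch that avoids melting, a boolean mask with `idxmax` along the columns also picks out the first `'.'` per row. This assumes the day columns hold strings (`'1'` or `'.'`) and are named `day1` to `day4`; the intermediate names `absent` and `first_label` are illustrative only.

```python
import pandas as pd

# Data as in the output above, with every cell stored as a string (assumption).
df = pd.DataFrame({
    'name': ['anshu', 'Yash', 'Natasha'],
    'day1': ['1', '1', '1'],
    'day2': ['.', '1', '1'],
    'day3': ['.', '.', '1'],
    'day4': ['1', '1', '.'],
})

day_cols = ['day1', 'day2', 'day3', 'day4']

# True wherever the person was absent.
absent = df[day_cols].eq('.')

# idxmax returns the label of the first True in each row; rows without any
# absence are masked out so they do not wrongly report day1.
first_label = absent.idxmax(axis=1).where(absent.any(axis=1))

# Strip the 'day' prefix to get a day number (missing values stay missing).
day_number = pd.to_numeric(first_label.str.replace('day', '', regex=False))
df['_1st_absent_on'] = day_number.astype('Int64')
print(df)
```

With this data, anshu gets 2 even though day3 is also marked absent, because only the first match per row is kept.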