I'm struggling to solve this issue. Help would be very much appreciated.
Note: the bracketed column names in the text ([count], [incl_count], [lim], [excl]) are the columns I need to create.
I have a data set in which I count, for each row, the values that are not NaN; the result is in column [count]. In column [incl_count] I would like a list of the headings of the columns contributing to that count. Next, I would like a limitation column [lim] that can never exceed 3: there is a cap of at most 3 counted columns. This means that the columns arriving last in the count cannot be considered and are therefore excluded, with the excluded columns saved in column [excl]. In the example below, when D and E arrive on the same row (2020-01-06), E is excluded, and once a column has been excluded it stays excluded, even after earlier columns drop out (rows 2020-01-08 to 2020-01-10).
[index]     [A]    [B]    [C]    [D]    [E]    [F]    [count] [incl_count]  [lim] [excl]
...
...
...
2020-01-01  nan    nan    nan    nan    nan    nan    0       []            0     []
2020-01-02  -0.01  nan    nan    nan    nan    nan    1       [A]           1     []
2020-01-03  0.02   nan    nan    nan    nan    nan    1       [A]           1     []
2020-01-04  -0.01  0.01   nan    nan    nan    nan    2       [A,B]         2     []
2020-01-05  -0.02  -0.04  0.02   nan    nan    nan    3       [A,B,C]       3     []
2020-01-06  nan    0.02   0.03   0.02   0.01   nan    4       [B,C,D,E]     3     [E]
2020-01-07  nan    -0.02  0.01   -0.01  0.03   0.01   5       [B,C,D,E,F]   3     [E,F]
2020-01-08  nan    nan    -0.02  0.05   -0.05  0.02   4       [C,D,E,F]     2     [E,F]
2020-01-09  nan    nan    nan    0.02   0.02   0.05   3       [D,E,F]       1     [E,F]
2020-01-10  nan    nan    nan    nan    nan    0.01   1       [F]           0     [F]
...
...
...
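For reference, the [count] and [incl_count] columns on their own can be computed without a row loop. A minimal sketch, assuming the data columns are A to F; the frame is a hypothetical excerpt of four rows from the table above, and `value_cols` and `mask` are illustrative names:

```python
import numpy as np
import pandas as pd

nan = np.nan

# Hypothetical excerpt: four rows copied from the table in the question
df = pd.DataFrame.from_dict(
    {
        "2020-01-01": [nan, nan, nan, nan, nan, nan],
        "2020-01-04": [-0.01, 0.01, nan, nan, nan, nan],
        "2020-01-06": [nan, 0.02, 0.03, 0.02, 0.01, nan],
        "2020-01-10": [nan, nan, nan, nan, nan, 0.01],
    },
    orient="index",
    columns=list("ABCDEF"),
)

value_cols = list("ABCDEF")
mask = df[value_cols].notna()  # True where a cell holds a value

# [count]: number of non-NaN cells per row
df["count"] = mask.sum(axis=1)

# [incl_count]: names of the columns holding a value, in column order
df["incl_count"] = pd.Series(
    [list(mask.columns[flags]) for flags in mask.to_numpy()],
    index=df.index,
)
print(df[["count", "incl_count"]])
```

`notna()` gives one boolean per cell, so summing along `axis=1` counts the non-NaN values in each row, and the same mask selects the matching column names. The [lim]/[excl] rule depends on what happened in earlier rows, so it still needs a loop.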
This should work:
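import pandas as pd

# Name-based list of columns that are NOT data columns; extend it if you
# add columns that should not be checked
non_value_columns = ["index", "incl_count", "excl", "lim", "count"]
max_lim = 3  # cap on how many columns may count in one row

df = pd.read_excel('your.xlsx')

# Data columns to check, kept in their left-to-right order
entries = [entry for entry in df.columns if entry not in non_value_columns]

counts, incl_col, lim_col, excl_col = [], [], [], []
cur_excludes = set()  # a column excluded once stays excluded in later rows
for i in range(len(df)):
    c = 0
    incl = []
    excl = []
    for entry in entries:
        if not pd.isna(df[entry].iloc[i]):
            incl.append(entry)
            c += 1
            # Over the cap, or already excluded in an earlier row: don't count it
            if max_lim < c or entry in cur_excludes:
                c -= 1
                excl.append(entry)
                cur_excludes.add(entry)
    counts.append(len(incl))
    incl_col.append(str(incl))
    lim_col.append(c)
    excl_col.append(str(excl))

# Write the results once, after the loop
df['count'] = counts
df['incl_count'] = incl_col
df['lim'] = lim_col
df['excl'] = excl_col
df.to_excel('output.xlsx')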
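To check the rules against the sample in the question without an Excel file, the same logic can be wrapped in a function and run on an in-memory frame. A minimal sketch: `apply_cap`, `value_cols` and the frame built from the question's rows are hypothetical names, and the function follows the answer's rules (scan columns left to right, exclude a column once the cap is reached, keep excluded columns excluded in later rows):

```python
import numpy as np
import pandas as pd

nan = np.nan

# Hypothetical in-memory copy of the sample rows from the question
rows = {
    "2020-01-01": [nan, nan, nan, nan, nan, nan],
    "2020-01-02": [-0.01, nan, nan, nan, nan, nan],
    "2020-01-03": [0.02, nan, nan, nan, nan, nan],
    "2020-01-04": [-0.01, 0.01, nan, nan, nan, nan],
    "2020-01-05": [-0.02, -0.04, 0.02, nan, nan, nan],
    "2020-01-06": [nan, 0.02, 0.03, 0.02, 0.01, nan],
    "2020-01-07": [nan, -0.02, 0.01, -0.01, 0.03, 0.01],
    "2020-01-08": [nan, nan, -0.02, 0.05, -0.05, 0.02],
    "2020-01-09": [nan, nan, nan, 0.02, 0.02, 0.05],
    "2020-01-10": [nan, nan, nan, nan, nan, 0.01],
}
df = pd.DataFrame.from_dict(rows, orient="index", columns=list("ABCDEF"))


def apply_cap(df, value_cols, max_lim=3):
    """Hypothetical helper following the loop in the answer.

    Columns are scanned left to right; a non-NaN column is excluded when
    counting it would exceed max_lim, and a column excluded once stays
    excluded in every later row.
    """
    cur_excludes = set()
    lim_col, excl_col = [], []
    for _, row in df[value_cols].iterrows():
        c = 0
        excl = []
        for col in value_cols:
            if pd.isna(row[col]):
                continue  # NaN cells neither count nor get excluded
            if c + 1 > max_lim or col in cur_excludes:
                excl.append(col)
                cur_excludes.add(col)
            else:
                c += 1
        lim_col.append(c)
        excl_col.append(excl)
    out = df.copy()
    out["lim"] = lim_col
    out["excl"] = pd.Series(excl_col, index=df.index)
    return out


out = apply_cap(df, list("ABCDEF"))
print(out[["lim", "excl"]])
```

Keeping the excluded columns in a set that lives outside the row loop is what makes an exclusion persist, which is how E and F stay in [excl] on 2020-01-08 and 2020-01-09 even though fewer than 3 columns are counted there.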
Edit: I changed the code so it loops through all the different columns. The list non_value_columns holds the columns that are not value columns; it is name-based, so if you add columns that you do not want checked, just add their names to it. The variable max_lim sets the limit. Hope this works, tell me if anything goes wrong!
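Since non_value_columns is matched by name, `DataFrame.drop` with `errors="ignore"` is one way to get the remaining data columns while tolerating names that are not in the file yet. A minimal sketch on a hypothetical empty frame; the column set is made up for illustration:

```python
import pandas as pd

# Hypothetical frame: data columns mixed with helper/result columns
df = pd.DataFrame(columns=["index", "A", "B", "C", "count", "lim"])

non_value_columns = ["index", "incl_count", "excl", "lim", "count"]

# errors="ignore" skips listed names that are not in df (here incl_count, excl)
value_cols = list(df.drop(columns=non_value_columns, errors="ignore").columns)
print(value_cols)  # ['A', 'B', 'C']
```

`drop` keeps the original left-to-right column order, which matters here because the cap is applied in column order; `Index.difference` would sort the names instead.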