I am working with a large dataset that has almost 200 columns and 70,000 rows. The data is messy, so I want to make it more readable.
The column names encode the answers: ATT_A (agree), ATT_SA (strongly agree), ATT_D (disagree), and so on.
Every group of 5 columns represents a single answer.
My idea is to use the .replace() function to turn every 1 in a column into the answer that column stands for (if the column name ends in _SA, the value should be 'SA' instead of 1).
Then I can join the 5 columns into one column, which will be less messy.
IDEA_COLUMN
SA
A
SD
A
D
SA
Here is the code I tried:
for c in cols.columns:
    if c.upper()[:4] == 'ATT_':
        if c[-2:] == 'SA':
            c.replace('1', 'SA')
I have tried many different variations, but I cannot see my mistake. I am new to coding, so it may be something silly.
Here is one option:
# split the column names at the last underscore so the columns become a MultiIndex
df.columns = df.columns.str.rsplit("_", n=1, expand=True)
# move the answer level (A, SA, D, ...) into the row index, group by level 0 (the row number),
# and use idxmax to find the answer whose value is 1
df.stack(level=1).groupby(level=0).agg(lambda x: x.idxmax()[1])
Another option:
# split the column names as above
df.columns = df.columns.str.rsplit("_", n=1, expand=True)
# group the columns by their prefix along axis 1 and, for each row, use idxmax to find
# the column whose value is 1 (note: groupby(..., axis=1) is deprecated as of pandas 2.1)
df.groupby(level=0, axis=1).apply(lambda g: g.apply(lambda x: x.idxmax()[1], axis=1))
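On pandas versions where grouping along axis 1 is no longer available, the same idea can be sketched by selecting each top-level label in turn. This is only a sketch under assumptions: the two-row frame is a made-up sample, the names `ticked`, `collapsed` and `result` are hypothetical, and a row with no 1 is assumed to mean "no answer" and is left as NaN.

```python
import pandas as pd

labels = ["SA", "A", "NO", "D", "SD"]
# Made-up sample: a single 1 per row marks the chosen answer.
df = pd.DataFrame(
    [[1, None, None, None, None], [None, None, None, 1, None]],
    columns=["ATT_TECHIMP_" + x for x in labels],
)
# Split the column names at the last underscore, as in the options above.
df.columns = df.columns.str.rsplit("_", n=1, expand=True)

collapsed = {}
for question in df.columns.get_level_values(0).unique():
    # Boolean frame with one column per answer label for this question.
    ticked = df[question].eq(1)
    # idxmax returns the label of the first True in each row;
    # rows without any 1 are masked out instead of defaulting to the first label.
    collapsed[question] = ticked.idxmax(axis=1).where(ticked.any(axis=1))

result = pd.DataFrame(collapsed)
print(result)
```

Comparing against 1 first gives a purely boolean frame, so idxmax does not have to deal with the None values directly.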
Data setup:
import pandas as pd

cols1 = ["ATT_TECHIMP_" + x for x in ["SA", "A", "NO", "D", "SD"]]
cols2 = ["ATT_BBB_" + x for x in ["SA", "A", "NO", "D", "SD"]]
# each row contains a single 1 marking the chosen answer
df1 = pd.DataFrame([[1, None, None, None, None], [None, None, 1, None, None], [None, None, 1, None, None], [None, None, None, 1, None], [None, None, None, None, 1]], columns=cols1)
df2 = pd.DataFrame([[None, 1, None, None, None], [None, None, None, None, 1], [None, None, 1, None, None], [None, None, None, 1, None], [None, None, None, None, 1]], columns=cols2)
df = pd.concat([df1, df2], axis=1)
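For comparison, the loop from the question can also be made to work. In the original attempt, c is the column name (a string), so c.replace('1', 'SA') only builds a new string and throws it away; the data is never touched. The sketch below follows the question's plan (label each 1 with its column suffix, then merge the five columns of a question into one). The two-row sample and the names `combined`, `labelled` and `result` are hypothetical, chosen only for illustration.

```python
import pandas as pd

labels = ["SA", "A", "NO", "D", "SD"]
# Made-up sample in the same shape as the setup above.
df = pd.DataFrame(
    [[1, None, None, None, None, None, 1, None, None, None],
     [None, None, 1, None, None, None, None, None, None, 1]],
    columns=["ATT_TECHIMP_" + x for x in labels] + ["ATT_BBB_" + x for x in labels],
)

combined = {}
for c in df.columns:
    if not c.upper().startswith("ATT_"):
        continue
    # "ATT_TECHIMP_SA" -> question "ATT_TECHIMP", label "SA"
    question, label = c.rsplit("_", 1)
    # Keep the label where the cell equals 1, NaN everywhere else.
    labelled = pd.Series(label, index=df.index).where(df[c].eq(1))
    if question in combined:
        # Fill the gaps of the columns merged so far with this column's labels.
        combined[question] = combined[question].combine_first(labelled)
    else:
        combined[question] = labelled

result = pd.DataFrame(combined)
print(result)
```

This version loops over all 200 columns in Python, which is fine at that scale; the heavy work per column is still vectorised over the 70,000 rows.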