Change the values of each specific column

Question

I am playing around with a large dataset of almost 200 columns and 70,000 rows. The data is very messy, so I want to make it more readable.

The column names encode the answers: ATT_A (agree), ATT_SA (strongly agree), ATT_D (disagree), and so on.

Every group of 5 columns represents a single answer.

My idea is to use the .replace() function so that every 1 in a column becomes the value that column represents (if the column name ends in _SA, its values should be 'SA' instead of 1).

Then I can join the 5 columns into one column, which will be much less messy, for example:

IDEA_COLUMN

SA
A
SD
A
D
SA
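A minimal sketch of this replace-then-merge idea, assuming the answer columns start with `ATT_`, that the part after the last underscore is the answer label, and that each column holds 1 for the chosen answer and NaN otherwise. The helper name `collapse_answers` and the sample data are hypothetical.

```python
import pandas as pd


def collapse_answers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: merge each group of ATT_<question>_<answer> columns into one column."""
    out = df.copy()
    # Question name = column name without its last "_<answer>" part
    # (ATT_TECHIMP_SA -> ATT_TECHIMP), collected in first-seen order
    questions = dict.fromkeys(
        c.rsplit("_", 1)[0] for c in df.columns if c.upper().startswith("ATT_")
    )
    for question in questions:
        group = [c for c in df.columns if c.rsplit("_", 1)[0] == question]
        # Step 1: replace each 1 with the answer label taken from the column name ("SA", "A", ...)
        labelled = pd.DataFrame({c: df[c].replace(1, c.rsplit("_", 1)[1]) for c in group})
        # Step 2: the first non-missing label in each row becomes the merged answer
        out[question] = labelled.bfill(axis=1).iloc[:, 0]
        out = out.drop(columns=group)
    return out


# Hypothetical sample in the question's layout: one 1 per row, NaN elsewhere
sample = pd.DataFrame(
    {
        "ATT_TECHIMP_SA": [1, None, None],
        "ATT_TECHIMP_A": [None, 1, None],
        "ATT_TECHIMP_D": [None, None, 1],
    },
    dtype=float,
)
print(collapse_answers(sample))
```

If unchosen answers are stored as 0 rather than NaN, the merge step would pick up the 0s, so they would need converting to NaN first.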

Here is the code I tried:

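for c in cols.columns:
    if c.upper()[:4] == 'ATT_':
        if c[-2:] == 'SA':
            # c is only the column name (a string) and str.replace returns a new string,
            # so c.replace('1', 'SA') changed nothing; replace the values in the column instead
            cols[c] = cols[c].replace(1, 'SA')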

I tried many different approaches many times, but I cannot see my mistake. I am new to coding, so it may be a silly one.

Answer 1

Here is one option:

# split each column name at its last underscore to turn the columns into a MultiIndex
# (e.g. ATT_TECHIMP_SA -> ("ATT_TECHIMP", "SA"))
df.columns = df.columns.str.rsplit("_", n=1, expand=True)

# move the answer level (SA, A, D, ...) into the row index, group by index level 0
# (the row number), and use idxmax to find the answer whose value is 1
df.stack(level=1).groupby(level=0).agg(lambda x: x.idxmax()[1])
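To see why `x.idxmax()[1]` yields the answer label, here is a self-contained run of this option on a tiny frame; the column names `ATT_X_*` and the values are hypothetical. After `stack(level=1)` each row label is a `(row number, answer)` pair, so `idxmax()` returns a tuple and `[1]` picks the answer. Recent pandas versions (2.1+) may emit a FutureWarning about the `stack` implementation.

```python
import pandas as pd

# Hypothetical 2-row sample: one question (ATT_X) with three possible answers
df = pd.DataFrame(
    [[1, None, None], [None, None, 1]],
    columns=["ATT_X_SA", "ATT_X_A", "ATT_X_D"],
    dtype=float,
)

# "ATT_X_SA" -> ("ATT_X", "SA"): level 0 is the question, level 1 the answer
df.columns = df.columns.str.rsplit("_", n=1, expand=True)

# Move the answer level into the rows: the index becomes (row number, answer)
stacked = df.stack(level=1)

# Per original row, idxmax() returns the (row number, answer) pair holding the 1
answers = stacked.groupby(level=0).agg(lambda x: x.idxmax()[1])
print(answers)
```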

Another option:

# split the columns as above
df.columns = df.columns.str.rsplit("_", n=1, expand=True)

# group the columns by their prefix (axis=1), then for each row use idxmax() to find the
# column holding the 1 and keep its answer label
# (note: groupby(..., axis=1) is deprecated since pandas 2.1)
df.groupby(level=0, axis=1).apply(lambda g: g.apply(lambda x: x.idxmax()[1], axis=1))
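If you want to avoid `groupby(..., axis=1)`, here is a sketch of the same idea using a slightly different technique: select each question's sub-frame from the MultiIndex columns and call `DataFrame.idxmax(axis=1)` on it. It assumes the answer columns start with `ATT_`, hold 1 for the chosen answer and NaN otherwise, and that every row has a 1 in every group (recent pandas versions warn or raise on all-NaN rows). The helper name `collapse_with_idxmax` and the sample are hypothetical.

```python
import pandas as pd


def collapse_with_idxmax(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one column per question holding the chosen answer label."""
    answer_cols = [c for c in df.columns if c.upper().startswith("ATT_")]
    wide = df[answer_cols].astype(float)
    # Split at the last underscore: ATT_BBB_SA -> ("ATT_BBB", "SA")
    wide.columns = wide.columns.str.rsplit("_", n=1, expand=True)
    questions = wide.columns.get_level_values(0).unique()
    # wide[q] has the answer labels (SA, A, ...) as columns; idxmax(axis=1) returns,
    # per row, the label of the column with the largest value, i.e. the 1
    return pd.DataFrame({q: wide[q].idxmax(axis=1) for q in questions})


# Hypothetical sample: two questions with three answers each
sample = pd.DataFrame(
    [[1, None, None, None, 1, None],
     [None, None, 1, 1, None, None]],
    columns=["ATT_TECHIMP_SA", "ATT_TECHIMP_A", "ATT_TECHIMP_D",
             "ATT_BBB_SA", "ATT_BBB_A", "ATT_BBB_D"],
)
print(collapse_with_idxmax(sample))
```

Filtering on the `ATT_` prefix first leaves any other columns of the dataset alone, and the `astype(float)` turns all-`None` columns into NaN so `idxmax` sees numbers.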

Data setup:

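import pandas as pd

# two sample questions with five answer columns each; every row has exactly one 1 per question
cols1 = ["ATT_TECHIMP_" + x for x in ["SA", "A", "NO", "D", "SD"]]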
cols2 = ["ATT_BBB_" + x for x in ["SA", "A", "NO", "D", "SD"]]

df1 = pd.DataFrame([[1, None, None, None, None], [None, None, 1, None, None], [None, None, 1, None, None], [None, None, None, 1, None], [None, None, None, None, 1]], columns=cols1)
df2 = pd.DataFrame([[None, 1, None, None, None], [None, None, None, None, 1], [None, None, 1, None, None], [None, None, None, 1, None], [None, None, None, None, 1]], columns=cols2)

df = pd.concat([df1, df2], axis=1)
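Another possibility, not part of the original answer: pandas 1.5 and later provide `pd.from_dummies`, which reverses one-hot encoding directly. It splits each column name at the first occurrence of `sep`, so this sketch first swaps only the last underscore for `|` (an arbitrary separator assumed not to appear in the names). It also assumes NaN means "not chosen" and that each row has exactly one 1 per question; otherwise `from_dummies` raises a ValueError. The sample is hypothetical.

```python
import pandas as pd

# Hypothetical sample in the same layout as the data setup above
df = pd.DataFrame(
    [[1, None, None, None, 1, None],
     [None, None, 1, 1, None, None]],
    columns=["ATT_TECHIMP_SA", "ATT_TECHIMP_A", "ATT_TECHIMP_D",
             "ATT_BBB_SA", "ATT_BBB_A", "ATT_BBB_D"],
    dtype=float,
)

# from_dummies needs 0/1 values with no missing entries
dummies = df.fillna(0).astype(int)
# Replace only the last underscore: ATT_TECHIMP_SA -> ATT_TECHIMP|SA
dummies.columns = ["|".join(c.rsplit("_", 1)) for c in dummies.columns]

# One column per question; raises ValueError if a row has zero or several 1s in a group
answers = pd.from_dummies(dummies, sep="|")
print(answers)
```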

Answer 1: score 3, accepted, 2017-03-12 06:38:30
