For loop over a dataframe, keeping each iteration, in Python

Question

I'm creating a template to process SurveyMonkey surveys into a Tableau-ready format. I'm breaking the surveys down by question type, and I want to automate the script as much as possible, so I'm trying to use a for loop for each question type.

For our purposes, let's stick to the Ranking question type.

Let's say I have a dataframe like this:

import pandas as pd

d = {'Respondent ID': [123, 234, 345], 'rank question 1': [3, 5, 4], 'rank question 2': [1, 6, 7]}
df = pd.DataFrame(data=d)
df

I want the final dataframe to look like this:

rankfinal = {'Respondent ID': [123, 234, 345, 123, 234, 345], 'answer': [3, 5, 4, 1, 6, 7], 'question': ['rank question 1', 'rank question 1', 'rank question 1', 'rank question 2', 'rank question 2', 'rank question 2']}
rank1 = pd.DataFrame(data=rankfinal)
rank1

I've made several attempts; here is my best one:

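ranking = [1, 2]  # These are the column positions in the original survey dataframe

# Assumption: Respondent_ID is not defined in the post. It is taken here to be
# the first column, sliced the same way as the answers below.
Respondent_ID = df.iloc[1:, 0]

hold = []
for i in range(len(ranking)):
    hold.append(i)

respondent_id = []
questions = []
answers = []

for i in hold:
    if len(hold) < 1:  # never true here: with an empty hold the loop body does not run
        print('No Ranking Questions! Moving on...')
    else:
        respondent_id.append(Respondent_ID)
        questions.append(df.columns[ranking[i]])
        answers.append(df.iloc[1:, ranking[i]])  # 1: skips the first row of the dataframe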

While the code runs, I don't think I can do anything with the outputs to get them into a single dataframe. I've always struggled with loops, so hopefully you can help me get this project done.

Thanks in advance.

Answer 1

I would approach this problem by consolidating the rank questions.

Loop through all "rank question" columns and concatenate their values into one list.
- you will end up with a list [3, 5, 4, 1, 6, 7]
Duplicate the Respondent ID field n times, where n == number of "rank question" columns.
- you will obtain a list [123, 234, 345, 123, 234, 345]
Create a list where you repeat each "rank question" field name once per row.
- you will obtain a list [rank question 1, ..., rank question 2, ...]
Finally, put these lists in a dict (JSON-like keys and values) and pass it to pandas DataFrame.
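
The steps above can be sketched roughly as follows. This is only an illustration built on the example dataframe from the question; the variable names (rank_cols, respondent_ids, and so on) are made up for the sketch. The last line shows that pandas' melt appears to give the same long layout in one call, which may be simpler if no per-question logic is needed.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    'Respondent ID': [123, 234, 345],
    'rank question 1': [3, 5, 4],
    'rank question 2': [1, 6, 7],
})
ranking = [1, 2]  # column positions of the ranking questions

rank_cols = [df.columns[i] for i in ranking]
n_rows = len(df)

# Step 1: loop through the rank columns and collect their values in one list
answers = []
for col in rank_cols:
    answers.extend(df[col].tolist())

# Step 2: repeat the respondent IDs once per rank column
respondent_ids = df['Respondent ID'].tolist() * len(rank_cols)

# Step 3: repeat each question name once per row
questions = [col for col in rank_cols for _ in range(n_rows)]

# Step 4: put the lists in a dict and build the dataframe
rank1 = pd.DataFrame({
    'Respondent ID': respondent_ids,
    'answer': answers,
    'question': questions,
})

# Cross-check: melt performs the same wide-to-long reshape
melted = df.melt(id_vars='Respondent ID', value_vars=rank_cols,
                 var_name='question', value_name='answer')
```

With an empty ranking list, all three lists are empty and rank1 is simply an empty dataframe with the three columns.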

Answer 2

I worked out a solution I'm mostly happy with:

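rank_fun = {}

if len(ranking) < 1:
    print('No Ranking Questions! Moving on...')

else:
    for i in ranking:
        # Pair the respondent IDs with one ranking column (the first row is skipped)
        rank_fun[i] = pd.concat([Respondent_ID, df.iloc[1:, i]], axis=1)
        rank_fun[i]['question'] = rank_fun[i].columns[1]
        # Rename here so every frame has the same columns before stacking
        rank_fun[i] = rank_fun[i].rename(columns={rank_fun[i].columns[1]: 'answer'})

# Stack all per-question frames. The original appended each frame to itself and
# overwrote rank1 on every pass; DataFrame.append was also removed in pandas 2.0.
rank1 = pd.concat([rank_fun[i] for i in ranking], ignore_index=True)

rank1['answer option'] = "Rank"
# Drop unanswered rows, whose answer reads "nan" once converted to a string
rank1 = rank1[~rank1['answer'].astype(str).str.contains("nan")]

rank1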

My only annoyance now is that it throws an error when there are no ranking questions, which I'd like to avoid. Any ideas?
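
One possible way around the error, offered as a sketch rather than a definitive fix: collect the per-question frames in a list and only concatenate when the list is non-empty, returning an empty dataframe with the expected columns otherwise. The function name stack_rankings and its id_col parameter are hypothetical, and the sketch assumes the example dataframe from the question, where every row is a response (so it does not skip the first row).

```python
import pandas as pd

def stack_rankings(df, ranking, id_col='Respondent ID'):
    """Hypothetical helper: stack the ranking columns into long format."""
    columns = [id_col, 'answer', 'question']
    frames = []
    for i in ranking:
        name = df.columns[i]
        # One frame per ranking question, with a shared 'answer' column
        frame = df[[id_col, name]].rename(columns={name: 'answer'})
        frame['question'] = name
        frames.append(frame)
    if not frames:
        # pd.concat raises on an empty list, so return early instead
        print('No Ranking Questions! Moving on...')
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)[columns]

# Example data copied from the question
df = pd.DataFrame({
    'Respondent ID': [123, 234, 345],
    'rank question 1': [3, 5, 4],
    'rank question 2': [1, 6, 7],
})

rank1 = stack_rankings(df, [1, 2])
no_rank = stack_rankings(df, [])  # prints the message, returns an empty frame
```

Because the empty case still returns a dataframe with the same columns, later steps such as adding the 'answer option' column should not need a separate guard.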

Answer 1
0 2022-02-03 04:56:31

Answer 2
0 2022-02-03 05:35:45