For loop in a dataframe and keep each iteration in Python
I'm creating a template to process SurveyMonkey surveys into a Tableau-ready format. I'm breaking down the surveys into their question types. I want to automate the script as much as possible, so I'm trying to use a for loop for each question type.
For our purposes, let's stick to the Ranking type question.
Let's say I have a dataframe like this:
import pandas as pd
d = {'Respondent ID': [123, 234, 345], 'rank question 1': [3, 5, 4], 'rank question 2': [1, 6, 7]}
df = pd.DataFrame(data=d)
df
I want the final dataframe to look like this:
rankfinal = {'Respondent ID': [123, 234, 345, 123, 234, 345], 'answer': [3, 5, 4, 1, 6, 7], 'question': ['rank question 1', 'rank question 1', 'rank question 1', 'rank question 2', 'rank question 2', 'rank question 2']}
rank1 = pd.DataFrame(data=rankfinal)
rank1
I've made several attempts, but here is my best:
ranking = [1, 2]  # These are the column positions in the original survey dataframe
hold = []
for i in range(len(ranking)):
    hold.append(i)
respondent_id = []
questions = []
answers = []
for i in hold:
    if len(hold) < 1:
        print('No Ranking Questions! Moving on...')
    else:
        respondent_id.append(Respondent_ID)
        questions.append(df.columns[ranking[i]])
        answers.append(df.iloc[1:, ranking[i]])
While the code works, I don't think I can do anything with the outputs to get them into a single dataframe. I've always struggled with loops, so hopefully you can help me get this project done.
Thanks in advance.
I would approach this problem by consolidating the rank questions.
Loop through all "rank question" columns and consolidate their values into a single list.
Repeat the Respondent ID field n times, where n is the number of "rank question" columns.
Create a list in which each "rank question" field name is repeated once per row.
Finally, collect these lists in a dictionary (a JSON-like mapping) and pass it to pandas DataFrame.
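The steps above can be sketched as follows. This is a minimal sketch using the example dataframe from the question; it assumes the ranking columns can be recognised by the name prefix "rank question", and the variable names (rank_cols, respondent_ids, and so on) are illustrative only.

```python
import pandas as pd

# Example data from the question
d = {'Respondent ID': [123, 234, 345],
     'rank question 1': [3, 5, 4],
     'rank question 2': [1, 6, 7]}
df = pd.DataFrame(data=d)

# Assumption: ranking columns are identified by their name prefix
rank_cols = [c for c in df.columns if c.startswith('rank question')]

respondent_ids = []
answers = []
questions = []
for col in rank_cols:
    respondent_ids.extend(df['Respondent ID'].tolist())  # IDs repeated once per question
    answers.extend(df[col].tolist())                     # consolidated answer values
    questions.extend([col] * len(df))                    # column name repeated once per row

# Collect the lists in a dictionary and build the final dataframe
rank1 = pd.DataFrame({'Respondent ID': respondent_ids,
                      'answer': answers,
                      'question': questions})
print(rank1)
```

pandas also offers DataFrame.melt for this kind of wide-to-long reshape, so df.melt(id_vars='Respondent ID', value_vars=rank_cols, var_name='question', value_name='answer') should give the same rows without an explicit loop, with the columns in a slightly different order.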
I worked out a solution I'm mostly happy with:
rank_fun = {}
if len(ranking) < 1:
    print('No Ranking Questions! Moving on...')
else:
    for i in ranking:
        # Pair the respondent IDs with one ranking column (row 0 is skipped)
        rank_fun[i] = pd.concat([Respondent_ID, df.iloc[1:, i]], axis=1)
        rank_fun[i]['question'] = rank_fun[i].columns[1]
        # Rename before stacking so every piece shares the same 'answer' column
        rank_fun[i] = rank_fun[i].rename(columns={rank_fun[i].columns[1]: "answer"})
# Stack the per-question frames (DataFrame.append was removed in pandas 2.0)
rank1 = pd.concat(list(rank_fun.values()))
rank1['answer option'] = "Rank"
rank1 = rank1[rank1['answer'].str.contains("nan") == False]
rank1
My only annoyance now is that it throws an error when there are no ranking questions. Any ideas?
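One possible way to avoid the error is to return early when the list of ranking columns is empty, so the stacking and renaming steps never run on nothing. The following is a minimal sketch, not the original code: stack_ranking is a hypothetical helper, it uses the example dataframe from the question (so no header row is skipped), and it drops missing answers with dropna instead of matching the string "nan".

```python
import pandas as pd

def stack_ranking(df, ranking, id_col='Respondent ID'):
    """Hypothetical helper: stack the ranking columns found at the given positions."""
    out_cols = [id_col, 'answer', 'question', 'answer option']
    if len(ranking) < 1:
        print('No Ranking Questions! Moving on...')
        # Return an empty frame with the expected columns instead of failing later
        return pd.DataFrame(columns=out_cols)
    pieces = []
    for i in ranking:
        piece = df[[id_col]].copy()
        piece['answer'] = df.iloc[:, i]    # values of the i-th column
        piece['question'] = df.columns[i]  # column name repeated on every row
        pieces.append(piece)
    result = pd.concat(pieces, ignore_index=True)
    result['answer option'] = 'Rank'
    return result.dropna(subset=['answer'])

df = pd.DataFrame({'Respondent ID': [123, 234, 345],
                   'rank question 1': [3, 5, 4],
                   'rank question 2': [1, 6, 7]})

print(stack_ranking(df, [1, 2]))  # six stacked rows
print(stack_ranking(df, []))      # empty frame, no exception
```

Returning an empty dataframe with the expected columns lets later steps, such as concatenating the results of several question types, run unchanged whether or not a survey contains ranking questions.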