
Why do I get "Length of values (1) does not match length of index (3)" when using random.sample()?

My Python code returns the following error message:

  File "/Users/christianmagelssen/Desktop/Koding/analyse/moduler/resultater.py", line 64, in allokereGrupper
    group1['GRUPPE'] = velger
ValueError: Length of values (1) does not match length of index (3)

I have tried several things to solve this issue:

  1. I have tried changing k to 1 and 2, but that doesn't help.
  2. I have tried different pandas code to drop duplicates, including .unique and the drop_duplicates call that I am using now.

I know that my code worked 3 months ago, but on another dataset. Can someone help me understand what I am doing wrong here?

Here is all my code:

results.py

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import random

class Resultat:

    def lastInnOgRydd(path, LagreCsv = False):
        df = pd.read_csv(path, skiprows=2, decimal=".")
        filt = df['FINISH'] == 'DNF'
        dnf = df[filt]
        dnf = dnf.replace('DNF', 1)
        if LagreCsv == True:
            dnf.to_csv('DNF.csv')
        df.replace('DNF', np.NaN, inplace=True)
        df.replace('GARBAGE GARBAGE', np.NaN, inplace=True) #Denne finnes det nok en bedre løsning på
        df.dropna(subset=['FINISH'], inplace=True)
        df.dropna(subset=['NAME'], inplace=True)
        return df

    def endreDataType(df):
        df["FINISH"] = df["FINISH"].str.replace(',', '.').astype(float)
        df["INTER 1"] = df["INTER 1"].str.replace(',', '.').astype(float)
        df["SECTION IM4-FINISH"] = df["SECTION IM4-FINISH"].str.replace(',', '.').astype(float)
        df["COMMENT"] = df['COMMENT'].astype(int)
        df["COMMENT"] = df['COMMENT'].astype(str)
        df["COMMENT"] = df['COMMENT'].str.replace('11', 'COURSE 1')
        df["COMMENT"] = df['COMMENT'].str.replace('22', 'COURSE 2')
        df["COMMENT"] = df['COMMENT'].str.replace('33', 'COURSE 3')
        df["COMMENT"] = df['COMMENT'].str.replace('55', 'UTKJORING')
        df["COMMENT"] = df['COMMENT'].str.replace('99', 'STRAIGHT-GLIDING')
        pd.to_numeric(df['FINISH'], downcast='float', errors='raise')
        pd.to_numeric(df['INTER 1'], downcast='float', errors='raise')
        pd.to_numeric(df['SECTION IM4-FINISH'], downcast='float', errors='raise')
        return df

    def navnendringCommentTilCourse(df):
        df.rename(columns={'COMMENT': 'COURSE'}, inplace=True)
        return df

    def finnBesteRunder(df):
        grupper = df.groupby(['BIB#', 'COURSE'])
        bestruns = grupper['FINISH'].apply(lambda x: x.nsmallest(2).mean()).reset_index()
        df1 = bestruns.pivot('BIB#', 'COURSE', 'FINISH').reset_index()
        df1['GJENNOMSNITT'] = df1['COURSE 1'].add(df1['COURSE 2']).add(df1['COURSE 3']).div(3)
        #df1['PRESTASJON'] = df1['MEAN'].div(df1['STRAIGHT-GLIDING']) # fjerner denne nå, men må med i den ordentilige analysen
        return df1

    def allokereGrupper(df1):
        df1 = df1.sort_values(by='GJENNOMSNITT', ascending=True)
        mask = np.arange(len(df1)) % 2
        group1 = df1.loc[mask == 0]
        group1 =  group1.drop_duplicates(subset=['BIB#'])
        print(group1)
        group2 = df1.loc[mask == 1]
        group2 =  group2.drop_duplicates(subset=['BIB#'])
        print(group2)
        
        grupper = ['RANDOM', 'BLOCKED']

        for i in group1['BIB#']:
            velger = random.sample(grupper, k=1)
        group1['GRUPPE'] = velger

 

main.py

from moduler import Resultat


path = "http://www.cmagelssen.no/pilot2.csv"

df = Resultat.lastInnOgRydd(path)
df = Resultat.endreDataType(df)
df = Resultat.navnendringCommentTilCourse(df)
df = Resultat.finnBesteRunder(df)
df = Resultat.allokereGrupper(df)



The problem is that velger is a list: random.sample() always returns a list, so with k=1 it is either ['RANDOM'] or ['BLOCKED']. When you create the 'GRUPPE' column from a single value, that value must be a scalar, such as a string or an int.
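A minimal sketch of the difference, using the grupper list from the question. random.choice is shown only as a comparison; the question itself uses random.sample.

```python
import random

grupper = ['RANDOM', 'BLOCKED']

# random.sample returns a list, even when only one element is requested.
velger = random.sample(grupper, k=1)

# random.choice returns the element itself, here a plain string.
valg = random.choice(grupper)
```

Here velger is a one-element list, while valg is a string that pandas can treat as a single value.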

If you assign a list-like instead, pandas assumes it has the same length as your dataframe and fills each row with the corresponding element (the 3rd row gets the 3rd list element, for example). But your list has length one, while the dataframe group1 does not necessarily have just one row; according to the error message, it has three here. Maybe it had a single row in your previous dataset.
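The behaviour described above can be reproduced in a few lines. The three-row frame below is a hypothetical stand-in for group1, not the real data from the question.

```python
import pandas as pd

# Hypothetical stand-in for group1: three rows, as in the error message.
group1 = pd.DataFrame({"BIB#": [1, 2, 3]})

# A scalar is broadcast to every row.
group1["GRUPPE"] = "RANDOM"

# A one-element list is read as "one value per row", so the lengths clash.
try:
    group1["GRUPPE"] = ["RANDOM"]
except ValueError as exc:
    error_message = str(exc)
```

The failed assignment leaves the column from the scalar assignment in place, and error_message holds the same kind of length-mismatch text as in the traceback.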

It's not entirely clear to me what the goal of your code is, but if your intention is to fill every cell in the 'GRUPPE' column with the same value (either 'RANDOM' or 'BLOCKED'), then change:

group1['GRUPPE'] = velger

to

group1['GRUPPE'] = velger[0]
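If the intent was instead one independent draw per row (the loop over group1['BIB#'] hints at this, but the question does not say so), a possible sketch is to build a list with one entry per row. The frame below is again a hypothetical stand-in for group1.

```python
import random
import pandas as pd

grupper = ['RANDOM', 'BLOCKED']

# Hypothetical stand-in for group1 from the question.
group1 = pd.DataFrame({"BIB#": [11, 12, 13]})

# One draw per row: the list has exactly as many entries as the frame has rows.
group1["GRUPPE"] = [random.choice(grupper) for _ in range(len(group1))]
```

Because the list length matches the number of rows, pandas assigns it element by element without raising. Note that independent draws do not guarantee equally sized groups.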

