
How can I make a for loop to populate a DataFrame?

First of all, thank you to everyone who tries to help.

I have started to learn Python and came across an opportunity to use it to my advantage at work.

I basically made a script that reads a Google Sheets file, imports it into pandas, and cleans up the data.

In the end, I just want the agents' names as columns, with all of their values from the resolucao column listed below them, so I can take the average amount of time for all of the agents. I'm struggling to do this with a list comprehension or a for loop.

This is what the DataFrame looks like after I cleaned it up: [image: cleaned data]

And this is the code that I tried to run.

PS: Sorry for the messy code.

agentes_unique = list(df['Agente'].unique())
agentes_duplicated = df['Agente']
value_resolucao_duplicated = df['resolucao']
n_of_rows = []
for row in range(len(df)):
    n_of_rows.append(row)

i = 0
while n_of_rows[i] < len(n_of_rows):
    df2 = pd.DataFrame({agentes_unique[i]: (value for value in df['resolucao'][i] if df['Agente'][i] == agentes_unique[i])})
    i+= 1
df2.to_excel('teste.xlsx',index = True, header = True)

But in the end I got this error:

Traceback (most recent call last):
  File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\indexes\range.py", line 385, in get_loc
    return self._range.index(new_key)
ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\FELIPE\Desktop\Python\webscraping\bot_csv_extract\bot_login\main.py", line 50, in <module>
    df2 = pd.DataFrame({'Agente': (valor for valor in df['resolucao'][i] if df['Agente'][i] == 'Gabriel')})
  File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\series.py", line 958, in __getitem__   
    return self._get_value(key)
  File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\series.py", line 1069, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\indexes\range.py", line 387, in get_loc
    raise KeyError(key) from err
KeyError: 0

I feel like I'm making some obvious mistake, but I can't fix it.

Again, thanks to anyone who tries to help.

Are you looking to do something like this? This is just sample data, but it should be a good start if I understand what you want to do.

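import pandas as pd

# Sample data only; replace with your own columns.
data = {
    'Column1' : ['Data', 'Another_Data', 'More_Data', 'Last_Data'],
    'Agente' : ['Name', 'Another_Name', 'More_Names', 'Last_Name'],
    'Data' : [1, 2, 3, 4]
}
df = pd.DataFrame(data)
# One column per agent, one row per Column1 value.
df = df.pivot(index='Column1', columns='Agente', values='Data')
# reset_index() returns a new DataFrame, so assign the result.
df = df.reset_index()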
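
For the layout described in the question (one column per agent, with that agent's resolucao values listed beneath), one possible sketch is to number each agent's rows with groupby().cumcount() and pivot on that counter. The agent names and values below are made up for illustration, and the helper column "n" is hypothetical; only the column names Agente and resolucao come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Agente": ["Ana", "Bruno", "Ana", "Bruno", "Ana"],
    "resolucao": [10, 20, 30, 40, 50],
})

# Number the rows within each agent: 0, 1, 2, ...
df["n"] = df.groupby("Agente").cumcount()

# One column per agent, one row per occurrence.
# Agents with fewer rows are padded with NaN.
wide = df.pivot(index="n", columns="Agente", values="resolucao")

# Column means skip NaN by default, giving one average per agent.
averages = wide.mean()
```

If only the averages are needed, the wide table is not strictly necessary, since a groupby on Agente gives the same numbers directly.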

Looping over pandas DataFrames with for loops is generally not recommended: it tends to be messy and inefficient compared with the built-in vectorized operations. With some practice you will be able to approach problems like this one without needing for loops at all.

From what I understand, your goal can be achieved in 3 simple steps:

1. Select the 2 columns of interest. I recommend you take a look at how to access different elements of a DataFrame:

df = df[["Agente", "resolucao"]]

2. Convert the column you want to average to a numeric value, say seconds:

df["resolucao"] = pd.to_timedelta(df['resolucao'].astype(str)).dt.total_seconds()

3. Apply an average aggregation via the groupby() function:

df = df.groupby(["Agente"]).mean().reset_index()
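
Put together, the three steps can be sketched as follows. The sample rows and the extra column outra_coluna are invented, and the sketch assumes resolucao holds durations formatted as HH:MM:SS strings, which the question does not confirm.

```python
import pandas as pd

# Hypothetical sample data; the HH:MM:SS format is an assumption.
df = pd.DataFrame({
    "Agente": ["Ana", "Bruno", "Ana", "Bruno"],
    "resolucao": ["00:10:00", "00:20:00", "00:30:00", "01:00:00"],
    "outra_coluna": [1, 2, 3, 4],
})

# 1. Keep only the two columns of interest.
#    .copy() avoids modifying a view of the original frame.
df = df[["Agente", "resolucao"]].copy()

# 2. Convert the durations to a number of seconds.
df["resolucao"] = pd.to_timedelta(df["resolucao"]).dt.total_seconds()

# 3. Average per agent; as_index=False keeps Agente as a regular column.
result = df.groupby("Agente", as_index=False)["resolucao"].mean()
```

Here result has one row per agent, with the mean resolution time in seconds.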

Hope this helps.

Next time, I also recommend posting the data as text rather than as an image, so that others can reproduce your code.

Cheers and keep it up!

This post is licensed under CC BY-SA 4.0; if you reproduce it, please credit the original source.

 