Pandas for loop with a list of columns

Question

I'm trying to open links from my dataframe using Selenium WebDriver. The dataframe 'df1' looks like this:

              user      repo1          repo2                repo3
0            breed  cs149-f22  kattis2canvas  grpc-maven-skeleton
1  GrahamDumpleton   mod_wsgi          wrapt                  NaN

The links I want to open combine the content of the 'user' column with one of the three 'repo' columns. I get an error when I iterate over the 'repo' columns.

Could anyone help me out? Thank you!

Here is my best try:

repo_cols = [col for col in df1.columns if 'repo' in col]

for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        try:
            repo = row['repo_name']
            current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
            driver.get(current_url)
            time.sleep(0.5)
        except:
            pass

Here is the error I encounter:

KeyError: 'repo_name' 

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'repo_name'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-50-eb068230c3fd> in <module>
      4     user = row['user']
      5     for repo_name in repo_cols:
----> 6         repo = row['repo_name']
      7         current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
      8         driver.get(current_url)

~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~\anaconda3\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'repo_name'

Answer 1

You're getting the KeyError because there is no column named repo_name.
You need to replace row['repo_name'] with row[repo_name], so that the value of the loop variable (for example 'repo1') is used as the key instead of the literal string 'repo_name'.

Try this:

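import time

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import WebDriverException

df1 = pd.DataFrame({'user': ['breed', 'GrahamDumpleton'],
                    'repo1': ['cs149-f22', 'mod_wsgi'],
                    'repo2': ['kattis2canvas', 'wrapt']})

repo_cols = [col for col in df1.columns if 'repo' in col]

# Start one browser and reuse it, rather than launching Chrome on every iteration
browser = webdriver.Chrome()

for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        # repo_name is a variable holding the column label, so it takes no quotes
        repo = row[repo_name]
        if pd.isna(repo):
            continue  # skip users that have no repo in this column
        current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
        try:
            browser.get(current_url)
            time.sleep(0.5)
        except WebDriverException as err:
            # Report the failed page and carry on instead of silently ignoring it
            print(f'Could not open {current_url}: {err}')

browser.quit()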
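
As a side note, the question's df1 has a NaN in repo3 for the second user, so a loop over all three columns would otherwise try to open a URL ending in /nan/graphs/contributors. Below is a minimal sketch, without Selenium, that only builds the URLs so they can be checked before any browser is involved. The helper name build_urls is hypothetical and not part of the original code; the sample data mirrors the table in the question.

```python
import pandas as pd

# Sample data mirroring the question's df1, including the missing repo3 value
df1 = pd.DataFrame({
    'user': ['breed', 'GrahamDumpleton'],
    'repo1': ['cs149-f22', 'mod_wsgi'],
    'repo2': ['kattis2canvas', 'wrapt'],
    'repo3': ['grpc-maven-skeleton', None],
})

repo_cols = [col for col in df1.columns if 'repo' in col]


def build_urls(df, cols):
    """Hypothetical helper: one contributors URL per non-missing repo."""
    urls = []
    for _, row in df.iterrows():
        user = row['user']
        for repo_name in cols:
            repo = row[repo_name]  # variable as key, no quotes
            if pd.isna(repo):
                continue  # skip missing repos
            urls.append(f'https://github.com/{user}/{repo}/graphs/contributors')
    return urls


urls = build_urls(df1, repo_cols)
for url in urls:
    print(url)
```

The resulting list can then be passed to the browser one URL at a time, which keeps the pandas logic separate from the Selenium calls.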

Answer 2

I think you should remove the quotation marks around repo_name in this line:

repo = row['repo_name']

It should be:

repo = row[repo_name]
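
To illustrate the difference, here is a small sketch using a hand-built Series that stands in for one row from iterrows(). The values are taken from the question's table, and the flag name literal_lookup_failed is only for illustration.

```python
import pandas as pd

# A Series standing in for one row of df1
row = pd.Series({'user': 'breed', 'repo1': 'cs149-f22', 'repo2': 'kattis2canvas'})

repo_name = 'repo1'

# Without quotes, the variable's value ('repo1') is used as the label
value = row[repo_name]

# With quotes, pandas looks for a label literally called 'repo_name'
try:
    row['repo_name']
    literal_lookup_failed = False
except KeyError:
    literal_lookup_failed = True

print(value, literal_lookup_failed)
```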

2 answers

Answer 1
0 2022-11-27 17:45:16

Answer 2
0 Accepted 2022-11-27 17:45:53