Pandas for-loop with a list of columns

I'm trying to open links in my dataframe using selenium webdriver. The dataframe 'df1' looks like this:

              user      repo1          repo2                repo3
0            breed  cs149-f22  kattis2canvas  grpc-maven-skeleton
1  GrahamDumpleton   mod_wsgi          wrapt                  NaN

The links I want to open combine the content of the 'user' column with one of the 3 'repo' columns. I encounter an error when I iterate over the 'repo' columns.

Could anyone help me out? Thank you!

Here is my best try:

repo_cols = [col for col in df1.columns if 'repo' in col]

for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        try:
            repo = row['repo_name']
            current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
            driver.get(current_url)
            time.sleep(0.5)
        except:
            pass

Here is the error I encounter:

KeyError: 'repo_name' 

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079             try:
-> 3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'repo_name'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-50-eb068230c3fd> in <module>
      4     user = row['user']
      5     for repo_name in repo_cols:
----> 6         repo = row['repo_name']
      7         current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
      8         driver.get(current_url)

~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    851 
    852         elif key_is_scalar:
--> 853             return self._get_value(key)
    854 
    855         if is_hashable(key):

~\anaconda3\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959 
    960         # Similar to Index.get_value, but we do not fall back to positional
--> 961         loc = self.index.get_loc(label)
    962         return self.index._get_values_for_loc(self, loc, label)
    963 

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080                 return self._engine.get_loc(casted_key)
   3081             except KeyError as err:
-> 3082                 raise KeyError(key) from err
   3083 
   3084         if tolerance is not None:

KeyError: 'repo_name'


You're getting the KeyError because there is no column named repo_name. With the quotes, pandas looks up the literal label 'repo_name' instead of the value held by the loop variable (repo1, repo2, ...).
You need to replace row['repo_name'] with row[repo_name].
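To make the difference concrete, here is a minimal sketch that needs no browser. The Series below is a hypothetical stand-in for one row of df1; it only contrasts a lookup by the loop variable with a lookup by the quoted literal.

```python
import pandas as pd

# Hypothetical stand-in for a single row of df1 (values taken from the question)
row = pd.Series({'user': 'breed', 'repo1': 'cs149-f22', 'repo2': 'kattis2canvas'})
repo_cols = [col for col in row.index if 'repo' in col]

# Variable lookup: repo_name holds 'repo1', then 'repo2'
values = [row[repo_name] for repo_name in repo_cols]

# Literal lookup: pandas searches for a label spelled exactly 'repo_name'
try:
    row['repo_name']
    literal_lookup_failed = False
except KeyError:
    literal_lookup_failed = True

print(values)
print(literal_lookup_failed)
```

The variable form returns the two repository names, while the quoted form raises the same KeyError shown in the traceback.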

Try this:

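import time

import pandas as pd
from selenium import webdriver

df1 = pd.DataFrame({'user': ['breed', 'GrahamDumpleton'],
                    'repo1': ['cs149-f22', 'mod_wsgi'],
                    'repo2': ['kattis2canvas', 'wrapt']})

# Collect every column whose name contains 'repo'
repo_cols = [col for col in df1.columns if 'repo' in col]

# Start one browser session and reuse it for every link
browser = webdriver.Chrome()

for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        try:
            # repo_name is a variable holding the column label, so no quotes
            repo = row[repo_name]
            current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
            browser.get(current_url)
            time.sleep(0.5)
        except Exception as err:
            # Report the failure instead of silently hiding it
            print(f'Could not open {user}/{repo_name}: {err}')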
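One thing the loop above does not cover: the dataframe in the question has a NaN in repo3, which would end up as the text "nan" inside the URL. The following is only a sketch of one way to skip missing values, with the URL building separated from the browser so it can be checked on its own. The helper name contributor_urls is made up for this example, and the repo3 column is copied from the question's dataframe.

```python
import pandas as pd

df1 = pd.DataFrame({
    'user': ['breed', 'GrahamDumpleton'],
    'repo1': ['cs149-f22', 'mod_wsgi'],
    'repo2': ['kattis2canvas', 'wrapt'],
    'repo3': ['grpc-maven-skeleton', None],  # missing value, as in the question
})

repo_cols = [col for col in df1.columns if 'repo' in col]


def contributor_urls(df, repo_cols):
    """Build one contributors URL per non-missing user/repo pair (hypothetical helper)."""
    urls = []
    for _, row in df.iterrows():
        for col in repo_cols:
            repo = row[col]
            if pd.isna(repo):
                # Skip empty cells instead of producing .../nan/...
                continue
            urls.append(f"https://github.com/{row['user']}/{repo}/graphs/contributors")
    return urls


urls = contributor_urls(df1, repo_cols)
for url in urls:
    print(url)
```

The resulting list can then be passed to browser.get one URL at a time.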

I think you should remove the quotation marks in this line:

repo = row['repo_name']

It should be:

repo = row[repo_name]
