Pandas for-loop with a list of columns
I'm trying to open links built from my dataframe using Selenium WebDriver. The dataframe 'df1' looks like this:
| | user | repo1 | repo2 | repo3 |
|---|---|---|---|---|
| 0 | breed | cs149-f22 | kattis2canvas | grpc-maven-skeleton |
| 1 | GrahamDumpleton | mod_wsgi | wrapt | NaN |
Each link I want to open combines the value in the 'user' column with the value in one of the three 'repo' columns. I get an error when I iterate over the 'repo' columns.
Could anyone help me out? Thank you!
Here is my best try:
repo_cols = [col for col in df1.columns if 'repo' in col]
for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        try:
            repo = row['repo_name']
            current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
            driver.get(current_url)
            time.sleep(0.5)
        except:
            pass
Here is the error I encounter:
KeyError: 'repo_name'
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'repo_name'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-50-eb068230c3fd> in <module>
4 user = row['user']
5 for repo_name in repo_cols:
----> 6 repo = row['repo_name']
7 current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
8 driver.get(current_url)
~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
851
852 elif key_is_scalar:
--> 853 return self._get_value(key)
854
855 if is_hashable(key):
~\anaconda3\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
959
960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
962 return self.index._get_values_for_loc(self, loc, label)
963
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 'repo_name'
You're getting the KeyError because there is no column named repo_name. With the quotes, row['repo_name'] looks up the literal label 'repo_name' rather than the value held by the loop variable repo_name.
You need to replace row['repo_name'] with row[repo_name].
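To see the difference in isolation, here is a minimal sketch. It assumes a single row shaped like the ones df1.iterrows() yields, with values copied from the first row of the question's table; the variable names value and literal_failed are only for illustration.

```python
import pandas as pd

# One row shaped like those produced by df1.iterrows();
# the values are copied from the question's table.
row = pd.Series({'user': 'breed', 'repo1': 'cs149-f22', 'repo2': 'kattis2canvas'})

repo_name = 'repo1'

# Variable lookup: uses the value stored in repo_name, i.e. the label 'repo1'.
value = row[repo_name]

# Literal lookup: asks for a label spelled 'repo_name', which is not in the index.
try:
    row['repo_name']
    literal_failed = False
except KeyError:
    literal_failed = True
```

The first lookup returns the cell under 'repo1', while the second raises the same KeyError shown in the traceback.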
Try this:
import time

import pandas as pd
from selenium import webdriver

df1 = pd.DataFrame({'user': ['breed', 'GrahamDumpleton'],
                    'repo1': ['cs149-f22', 'mod_wsgi'],
                    'repo2': ['kattis2canvas', 'wrapt']})

repo_cols = [col for col in df1.columns if 'repo' in col]

# Start one browser session and reuse it for every URL
browser = webdriver.Chrome()

for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        try:
            # repo_name is a variable holding the column label, so no quotes
            repo = row[repo_name]
            current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
            browser.get(current_url)
            time.sleep(0.5)
        except Exception as exc:
            # Report the failure instead of silently hiding it
            print(f'Skipping {repo_name}: {exc}')
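One thing the loop above does not handle: the table in the question has a missing value (NaN) in repo3, and an f-string would turn that into the text "nan" inside the URL. A possible way around it, sketched here under the assumption that missing repos should simply be skipped, is to build the list of URLs first and hand only those to the browser. The helper name build_contributor_urls is hypothetical, not part of pandas or Selenium.

```python
import pandas as pd


def build_contributor_urls(df):
    """Return one contributors-page URL per non-missing repo cell."""
    repo_cols = [col for col in df.columns if 'repo' in col]
    urls = []
    for _, row in df.iterrows():
        user = row['user']
        for repo_name in repo_cols:
            repo = row[repo_name]
            # Skip empty cells (NaN/None) so they never end up in a URL
            if pd.isna(repo):
                continue
            urls.append(f'https://github.com/{user}/{repo}/graphs/contributors')
    return urls


# Sample data copied from the question's table, including the missing repo3
df1 = pd.DataFrame({
    'user': ['breed', 'GrahamDumpleton'],
    'repo1': ['cs149-f22', 'mod_wsgi'],
    'repo2': ['kattis2canvas', 'wrapt'],
    'repo3': ['grpc-maven-skeleton', None],
})

urls = build_contributor_urls(df1)
```

Each entry in urls can then be passed to browser.get() in a plain for-loop, which keeps the dataframe logic separate from the Selenium calls.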
I think you should remove the quotation marks in:
repo = row['repo_name']
It should be:
repo = row[repo_name]