Is there a way to replace all column values with a list in pandas?
Here is my code:
```python
import pandas as pd
from bs4 import BeautifulSoup

# Defined earlier and not shown here: links, urls, remove_list, and the session `s`.
l_names = []
for l in links:
    l_names.append(l.get_text())

df = []
for u in urls:
    req = s.get(u)
    req_soup = BeautifulSoup(req.content, 'lxml')
    req_tables = req_soup.find_all('table', {'class': 'infobox vevent'})
    req_df = pd.read_html(str(req_tables), flavor='bs4', header=0)
    dfr = pd.concat(req_df)
    dfr = dfr.drop(index=0)
    dfr.columns = range(dfr.columns.size)
    # Insert a space before every capital letter, then trim the ends
    dfr[1] = dfr[1].str.replace(r"([A-Z])", r" \1", regex=True).str.strip().str.replace(' ', ' ')
    dfr = dfr[~dfr[0].isin(remove_list)]
    dfr = dfr.dropna()
    dfr = dfr.reset_index(drop=True)
    dfr.insert(loc=0, column='Title', value='Change')
    df.append(dfr)
```
Here is some info about `l_names` and `df`:
```
len(l_names)
83
len(df)
83
display(df)
[    Title  0                 1
 0  Change  Genre             Melodrama Revenge
 1  Change  Written by        Kwon Soon-won Park Sang-wook
 2  Change  Directed by       Yoon Sung-sik
 3  Change  Starring          Park Si-hoo Jang Hee-jin
 4  Change  No. of episodes   16
 5  Change  Running time      60 minutes
 6  Change  Original network  T V Chosun
 7  Change  Original release  January 27 – March 24, 2019,
     Title  0                 1
 0  Change  Genre             Romance Comedy
 1  Change  Written by        Jung Do-yoon Oh Seon-hyung
 2  Change  Directed by       Lee Jin-seo Lee So-yeon
 3  Change  Starring          Jang Na-ra Choi Daniel Ryu Jin Kim Min-seo
 4  Change  No. of episodes   20
 5  Change  Running time      Mondays and Tuesdays at 21:55 ( K S T)
 6  Change  Original network  Korean Broadcasting System
 7  Change  Original release  2 May –5 July 2011,
     Title  0                 1
 0  Change  Genre             Mystery Thriller Suspense
 1  Change  Directed by       Kim Yong-soo
 2  Change  Starring          Cho Yeo-jeong Kim Min-jun Shin Yoon-joo ...
 3  Change  No. of episodes   4
 4  Change  Running time      61-65 minutes
 5  Change  Original network  K B S2
 6  Change  Original release  March 14 – March 22, 2016,
     Title  0                 1
 0  Change  Genre             Melodrama Comedy Romance
 1  Change  Written by        Yoon Sung-hee
 2  Change  Directed by       Lee Joon-hyung
 3  Change  Starring          Ji Chang-wook Wang Ji-hye Kim Young-kwang P...
 4  Change  No. of episodes   24
 5  Change  Running time      Wednesdays and Thursdays at 21:20 ( K S T)
 6  Change  Original network  Channel A
 7  Change  Original release  December 21, 2011 – March 8, 2012,
```
I want to replace 'Change' with the TV show names stored in `l_names`. This example shows only four TV shows, but I have 83 in total.
```
print(l_names)
['Babel', 'Baby Faced Beauty', 'Babysitter', "Bachelor's Vegetable Store"]
```
But when I pass `l_names` as the `value` inside my for loop, I get an error:
```python
dfr.insert(loc=0, column='Title', value=l_names)
df.append(dfr)
```
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [96], in <cell line: 19>()
     29     dfr = dfr.dropna()
     30     dfr = dfr.reset_index(drop=True)
---> 31     dfr.insert(loc=0, column='Title', value=l_names)
     32     df.append(dfr)

File ~/anaconda3/envs/beans/lib/python3.9/site-packages/pandas/core/frame.py:4444, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4441 if not isinstance(loc, int):
   4442     raise TypeError("loc must be int")
-> 4444 value = self._sanitize_column(value)
   4445 self._mgr.insert(loc, column, value)

File ~/anaconda3/envs/beans/lib/python3.9/site-packages/pandas/core/frame.py:4535, in DataFrame._sanitize_column(self, value)
   4532     return _reindex_for_setitem(value, self.index)
   4534 if is_list_like(value):
-> 4535     com.require_length_match(value, self.index)
   4536 return sanitize_array(value, self.index, copy=True, allow_2d=True)

File ~/anaconda3/envs/beans/lib/python3.9/site-packages/pandas/core/common.py:557, in require_length_match(data, index)
    553 """
    554 Check the length of data matches the length of the index.
    555 """
    556 if len(data) != len(index):
--> 557     raise ValueError(
    558         "Length of values "
    559         f"({len(data)}) "
    560         "does not match length of index "
    561         f"({len(index)})"
    562     )

ValueError: Length of values (83) does not match length of index (8)
```
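The `ValueError` above is about row counts, not columns: `DataFrame.insert` broadcasts a scalar `value` to every row, but a list-like `value` must have exactly one entry per row of that particular frame. Each scraped frame has only a handful of rows (8 in the failing case), so passing all 83 names fails. A minimal sketch with a made-up 3-row frame:

```python
import pandas as pd

# Toy stand-in for one scraped infobox table (made-up data, 3 rows).
dfr = pd.DataFrame({0: ["Genre", "Directed by", "No. of episodes"],
                    1: ["Melodrama", "Yoon Sung-sik", "16"]})

# A scalar value is broadcast: every row of this frame gets the same title.
ok = dfr.copy()
ok.insert(loc=0, column="Title", value="Babel")
print(ok["Title"].tolist())  # ['Babel', 'Babel', 'Babel']

# A list-like value must have one entry per row, so 4 names for 3 rows fails.
names = ["Babel", "Baby Faced Beauty", "Babysitter", "Bachelor's Vegetable Store"]
msg = None
try:
    dfr.copy().insert(loc=0, column="Title", value=names)
except ValueError as err:
    msg = str(err)
print(msg)  # Length of values (4) does not match length of index (3)
```

So for this task the value to insert into each frame is a single name, not the whole list.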
I also tried adding an inner for loop inside my existing loop:
```python
for x in l_names:
    dfr.insert(loc=0, column='Title', value=x)
df.append(dfr)
```
That gives this error:
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [97], in <cell line: 19>()
     30     dfr = dfr.reset_index(drop=True)
     31     for x in l_names:
---> 32         dfr.insert(loc=0, column='Title', value=x)
     33     df.append(dfr)

File ~/anaconda3/envs/beans/lib/python3.9/site-packages/pandas/core/frame.py:4440, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4434     raise ValueError(
   4435         "Cannot specify 'allow_duplicates=True' when "
   4436         "'self.flags.allows_duplicate_labels' is False."
   4437     )
   4438 if not allow_duplicates and column in self.columns:
   4439     # Should this be a different kind of error??
-> 4440     raise ValueError(f"cannot insert {column}, already exists")
   4441 if not isinstance(loc, int):
   4442     raise TypeError("loc must be int")

ValueError: cannot insert Title, already exists
```
I also tried adding `allow_duplicates=True`, but all that did was make the `Title` columns and names repeat over and over again.
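The second error happens because the inner loop calls `insert` on the same frame once per name: the first call creates `Title`, and the second finds it already there. A small sketch on a toy frame (made-up data) shows what `allow_duplicates=True` does instead, and how plain `[]` assignment differs:

```python
import pandas as pd

# Toy frame and names (made-up data for illustration).
dfr = pd.DataFrame({0: ["Genre", "Directed by"], 1: ["Romance Comedy", "Lee Jin-seo"]})
names = ["Babel", "Baby Faced Beauty", "Babysitter"]

# allow_duplicates=True silences the error but adds one "Title" column per name,
# each inserted at position 0, so the last name ends up first.
dup = dfr.copy()
for x in names:
    dup.insert(loc=0, column="Title", value=x, allow_duplicates=True)
print(list(dup.columns))  # ['Title', 'Title', 'Title', 0, 1]

# Assigning with [] overwrites the existing column instead of adding another,
# so after the loop only the last name remains.
over = dfr.copy()
for x in names:
    over["Title"] = x
print(list(over.columns))  # [0, 1, 'Title']
```

Either way, one frame ends up holding several names or only the last one, which is why each frame has to be paired with its own entry in `l_names`.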
I have tried other ways to add the title names too, but my limited pandas skills have led me to a dead end.
Thanks again for your help and expertise.
Solution 1: After you build `df` with the 83 DataFrames in it, loop over `df` and update each frame's `Title` column:
```python
for i, dfr in enumerate(df):
    dfr['Title'] = l_names[i]
```
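A self-contained sketch of Solution 1, with two toy frames standing in for the scraped ones (the data and the length check are illustrative additions, not part of the original answer). It uses `zip` rather than `enumerate` plus indexing; both pair the i-th frame with the i-th name, which assumes `links` and `urls` were collected in the same order:

```python
import pandas as pd

# Toy stand-ins for the scraped frames, still carrying the 'Change' placeholder.
df = [
    pd.DataFrame({"Title": "Change", 0: ["Genre", "Directed by"], 1: ["Melodrama", "Yoon Sung-sik"]}),
    pd.DataFrame({"Title": "Change", 0: ["Genre"], 1: ["Romance Comedy"]}),
]
l_names = ["Babel", "Baby Faced Beauty"]

# Position i in df must correspond to position i in l_names.
if len(df) != len(l_names):
    raise ValueError(f"{len(df)} frames but {len(l_names)} names")

# Assigning a scalar to an existing column overwrites every row of that frame;
# dfr is the same object stored in the list, so df is updated in place.
for name, dfr in zip(l_names, df):
    dfr["Title"] = name

print(df[0]["Title"].tolist())  # ['Babel', 'Babel']
print(df[1]["Title"].tolist())  # ['Baby Faced Beauty']
```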
Solution 2: You don't need an extra loop. Iterate over `urls` with `enumerate` and use the index `i` to look up the title when you insert it:
```python
for i, u in enumerate(urls):
    ...
    dfr.insert(loc=0, column="Title", value=l_names[i])
    df.append(dfr)
```
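Solution 2 can be sketched the same way. Here `scrape_infobox` is a hypothetical placeholder for the request/parse/clean steps from the question, the URLs are placeholders, and `zip(l_names, urls)` stands in for the index lookup:

```python
import pandas as pd

def scrape_infobox(url):
    """Hypothetical stand-in for the requests/BeautifulSoup/read_html steps in the question."""
    return pd.DataFrame({0: ["Genre", "No. of episodes"], 1: ["Drama", "16"]})

urls = ["https://example.org/show-a", "https://example.org/show-b"]  # placeholder URLs
l_names = ["Babel", "Baby Faced Beauty"]

df = []
# zip pairs each URL with its show name, so no index or inner loop is needed.
for name, u in zip(l_names, urls):
    dfr = scrape_infobox(u)
    dfr.insert(loc=0, column="Title", value=name)  # scalar is broadcast to every row
    df.append(dfr)

print([d["Title"].iloc[0] for d in df])  # ['Babel', 'Baby Faced Beauty']
```

Note that `zip` stops at the shorter of the two lists, so if `l_names` and `urls` could differ in length, check that first (or, on Python 3.10+, use `zip(..., strict=True)`).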