Is there a way to replace all column values with a list in pandas?

Here is my code:

l_names = [ ]
for l in links:
   l_names.append(l.get_text())


df = [ ]
for u in urls:
    req = s.get(u)
    req_soup = BeautifulSoup(req.content,'lxml')
    req_tables = req_soup.find_all('table', {'class':'infobox vevent'})
    req_df = pd.read_html(str(req_tables), flavor='bs4', header=0)
    dfr = pd.concat(req_df)
    dfr = dfr.drop(index=0)
    dfr.columns = range(dfr.columns.size)
    dfr[1] = dfr[1].str.replace(r"([A-Z])", r" \1", regex=True).str.strip().str.replace(' ', ' ')
    dfr = dfr[~dfr[0].isin(remove_list)]
    dfr = dfr.dropna()
    dfr = dfr.reset_index(drop=True)
    dfr.insert(loc=0, column='Title', value='Change')
    df.append(dfr)    

Here is some info about l_names and df:

len(l_names)
83

len(df)
83

display(df)


 [    Title                 0                               1
 0  Change             Genre               Melodrama Revenge
 1  Change        Written by  Kwon  Soon-won Park  Sang-wook
 2  Change       Directed by                  Yoon  Sung-sik
 3  Change          Starring      Park  Si-hoo Jang  Hee-jin
 4  Change   No. of episodes                              16
 5  Change      Running time                      60 minutes
 6  Change  Original network                     T V  Chosun
 7  Change  Original release     January 27 – March 24, 2019,
     Title                 0                                               1
 0  Change             Genre                                  Romance Comedy
 1  Change        Written by                   Jung  Do-yoon  Oh  Seon-hyung
 2  Change       Directed by                      Lee  Jin-seo  Lee  So-yeon
 3  Change          Starring  Jang  Na-ra Choi  Daniel Ryu  Jin Kim  Min-seo
 4  Change   No. of episodes                                              20
 5  Change      Running time         Mondays and  Tuesdays at 21:55 ( K S T)
 6  Change  Original network                    Korean  Broadcasting  System
 7  Change  Original release                            2  May –5  July 2011,
     Title                 0                                                  1
 0  Change             Genre                          Mystery Thriller Suspense
 1  Change       Directed by                                      Kim  Yong-soo
 2  Change          Starring  Cho  Yeo-jeong  Kim  Min-jun  Shin  Yoon-joo  ...
 3  Change   No. of episodes                                                  4
 4  Change      Running time                                      61-65 minutes
 5  Change  Original network                                             K B S2
 6  Change  Original release                          March 14 – March 22, 2016,
     Title                 0                                                  1
 0  Change             Genre                         Melodrama  Comedy  Romance
 1  Change        Written by                                     Yoon  Sung-hee
 2  Change       Directed by                                    Lee  Joon-hyung
 3  Change          Starring  Ji  Chang-wook Wang  Ji-hye Kim  Young-kwang P...
 4  Change   No. of episodes                                                 24
 5  Change      Running time        Wednesdays and  Thursdays at 21:20 ( K S T)
 6  Change  Original network                                         Channel  A
 7  Change  Original release                  December 21, 2011 – March 8, 2012,

I want to replace 'Change' with the TV show names stored in l_names. Only four TV shows are shown in this example, but I have 83 in total.

print(l_names)
['Babel', 'Baby Faced Beauty', 'Babysitter', "Bachelor's Vegetable Store"]

But when I pass l_names as the value in my for loop, I get an error:

    dfr.insert(loc=0, column='Title', value=l_names)
    df.append(dfr)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [96], in <cell line: 19>()
     29 dfr = dfr.dropna()
     30 dfr = dfr.reset_index(drop=True)
---> 31 dfr.insert(loc=0, column='Title', value=l_names)
     32 df.append(dfr)

File ~/anaconda3/envs/beans/lib/python3.9/site-packages/pandas/core/frame.py:4444, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4441 if not isinstance(loc, int):
   4442     raise TypeError("loc must be int")
-> 4444 value = self._sanitize_column(value)
   4445 self._mgr.insert(loc, column, value)

File ~/anaconda3/envs/beans/lib/python3.9/site-packages/pandas/core/frame.py:4535, in DataFrame._sanitize_column(self, value)
   4532     return _reindex_for_setitem(value, self.index)
   4534 if is_list_like(value):
-> 4535     com.require_length_match(value, self.index)
   4536 return sanitize_array(value, self.index, copy=True, allow_2d=True)

File ~/anaconda3/envs/beans/lib/python3.9/site-packages/pandas/core/common.py:557, in require_length_match(data, index)
    553 """
    554 Check the length of data matches the length of the index.
    555 """
    556 if len(data) != len(index):
--> 557     raise ValueError(
    558         "Length of values "
    559         f"({len(data)}) "
    560         "does not match length of index "
    561         f"({len(index)})"
    562     )

ValueError: Length of values (83) does not match length of index (8)

I also tried adding a nested for loop inside my for loop:

    for x in l_names:
        dfr.insert(loc=0, column='Title', value=x)
        df.append(dfr)

I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [97], in <cell line: 19>()
     30 dfr = dfr.reset_index(drop=True)
     31 for x in l_names:
---> 32     dfr.insert(loc=0, column='Title', value=x)
     33     df.append(dfr)

File ~/anaconda3/envs/beans/lib/python3.9/site-packages/pandas/core/frame.py:4440, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4434     raise ValueError(
   4435         "Cannot specify 'allow_duplicates=True' when "
   4436         "'self.flags.allows_duplicate_labels' is False."
   4437     )
   4438 if not allow_duplicates and column in self.columns:
   4439     # Should this be a different kind of error??
-> 4440     raise ValueError(f"cannot insert {column}, already exists")
   4441 if not isinstance(loc, int):
   4442     raise TypeError("loc must be int")

ValueError: cannot insert Title, already exists

I also added allow_duplicates=True, but all that did was make the Title column and the names repeat over and over again.

I have also tried other methods to add the title name, but my lack of skill with pandas has led me to this dead end.

Thanks again for your help and expertise.

Solution 1: After you have built df, the list of 83 DataFrames, you can loop over it and update the Title column in each DataFrame.

for i,dfr in enumerate(df):
    dfr['Title'] = l_names[i]
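
As a minimal, self-contained sketch of Solution 1, the snippet below uses two small hand-made DataFrames in place of the scraped ones. The toy data and the shortened l_names are assumptions for illustration only. It pairs each DataFrame with its name using zip instead of an index, which is equivalent here; assigning a single string to a column repeats it for every row.

```python
import pandas as pd

# Toy stand-ins for the scraped tables (assumed data, not the real infoboxes).
l_names = ['Babel', 'Baby Faced Beauty']
df = [
    pd.DataFrame({'Title': 'Change',
                  0: ['Genre', 'No. of episodes'],
                  1: ['Melodrama Revenge', '16']}),
    pd.DataFrame({'Title': 'Change',
                  0: ['Genre', 'No. of episodes'],
                  1: ['Romance Comedy', '20']}),
]

# Each DataFrame gets exactly one name; the scalar is broadcast to all rows.
for dfr, name in zip(df, l_names):
    dfr['Title'] = name

print(df[0])
```

Note that zip stops silently at the shorter input, so it is worth checking that len(df) == len(l_names) first, as you already did.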

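Solution 2: You don't need an extra loop inside your loop. Use enumerate to get the index i, look up the matching title with l_names[i], and insert it. This works because insert repeats a single value for every row, whereas passing the whole 83-item list fails because its length does not match the 8 rows of dfr.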

for i,u in enumerate(urls):
    ...
    dfr.insert(loc=0,column="Title",value=l_names[i])
    df.append(dfr)
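
A runnable sketch of Solution 2, under stated assumptions: build_table is a hypothetical stand-in for the request, parse, and clean-up steps in your loop, and scraped holds made-up rows rather than real infobox data. It also reproduces the length-mismatch error on a tiny example, and ends with an optional pd.concat in case you want one combined table rather than a list.

```python
import pandas as pd

def build_table(rows):
    # Hypothetical stand-in for the request/parse/clean steps in the loop.
    return pd.DataFrame(rows)

l_names = ['Babel', 'Babysitter']
scraped = [
    [['Genre', 'Melodrama Revenge'], ['No. of episodes', '16']],
    [['Genre', 'Mystery Thriller Suspense'], ['No. of episodes', '4']],
]

df = []
for rows, name in zip(scraped, l_names):
    dfr = build_table(rows)
    # A scalar value is repeated for every row, so lengths never clash.
    dfr.insert(loc=0, column='Title', value=name)
    df.append(dfr)

# Passing a list instead fails unless its length equals the row count.
mismatch_message = None
try:
    build_table(scraped[0]).insert(loc=0, column='Title', value=['a', 'b', 'c'])
except ValueError as err:
    mismatch_message = str(err)
print(mismatch_message)

# Optional: stack the per-show tables into a single DataFrame.
combined = pd.concat(df, ignore_index=True)
print(combined)
```

Because dfr is rebuilt on every pass, the Title column never exists yet when insert runs, so the "already exists" error from the nested-loop attempt does not occur.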
