How to extract only a specific part of a URL in Python and add its value as another column in df for every row?

I have a DataFrame df containing users and URLs that looks like this:

df

User      Url
1         http://www.mycompany.com/Overview/Get
2         http://www.mycompany.com/News
3         http://www.mycompany.com/Accountinfo
4         http://www.mycompany.com/Personalinformation/Index
...

I want to add another column, page, that contains only the first path segment after the domain, so that it looks like this:

User      Url                                                  page
1         http://www.mycompany.com/Overview/Get                Overview
2         http://www.mycompany.com/News                        News
3         http://www.mycompany.com/Accountinfo                 Accountinfo
4         http://www.mycompany.com/Personalinformation/Index   Personalinformation
...

My code below is not working:

slashparts = df['url'].split('/')
df['page'] = slashparts[4]

The error I'm getting:

  AttributeError                            Traceback (most recent call last)
  <ipython-input-23-0350a98a788c> in <module>()
  ----> 1 slashparts = df['request_url'].split('/')
        2 df['page'] = slashparts[1]

  ~\Anaconda\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   4370             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4371                 return self[name]
  -> 4372             return object.__getattribute__(self, name)
   4373 
   4374     def __setattr__(self, name, value):

 AttributeError: 'Series' object has no attribute 'split'
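
The error arises because df['Url'] is a pandas Series, not a single Python string: split is a method of str, and a Series exposes string methods through its .str accessor instead. A minimal sketch, assuming the first two rows of the question's sample data:

```python
import pandas as pd

# First two rows of the sample data from the question.
df = pd.DataFrame({
    "User": [1, 2],
    "Url": ["http://www.mycompany.com/Overview/Get",
            "http://www.mycompany.com/News"],
})

# A Series has no .split method, so this reproduces the AttributeError.
try:
    df["Url"].split("/")
except AttributeError as exc:
    print(exc)  # 'Series' object has no attribute 'split'

# Each element *is* a str, so calling split on a single value works.
print(df["Url"].iloc[0].split("/"))
# ['http:', '', 'www.mycompany.com', 'Overview', 'Get']

# The .str accessor applies split element-wise and returns a Series of lists.
parts = df["Url"].str.split("/")
print(parts.iloc[1])  # ['http:', '', 'www.mycompany.com', 'News']
```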

Use the pandas string methods through the .str accessor. To select the 4th element of each split list, use str[3], because Python counts from 0:

df['page'] = df['Url'].str.split('/').str[3]
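
To see why index 3 is the page name, and what happens to a URL with no path, here is a small sketch. The bare-domain URL in the last row is a hypothetical addition, not part of the question's data. .str[3] behaves like .str.get(3), which returns NaN instead of raising when a list is too short:

```python
import pandas as pd

df = pd.DataFrame({
    "User": [1, 2, 3, 4, 5],
    "Url": [
        "http://www.mycompany.com/Overview/Get",
        "http://www.mycompany.com/News",
        "http://www.mycompany.com/Accountinfo",
        "http://www.mycompany.com/Personalinformation/Index",
        "http://www.mycompany.com",  # hypothetical row: no path segment at all
    ],
})

# After splitting on '/', the pieces are:
#   0 -> 'http:', 1 -> '' (between the two slashes), 2 -> domain, 3 -> first path segment
df["page"] = df["Url"].str.split("/").str[3]
print(df)  # the last row gets NaN because its list has only 3 elements
```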

Or, if performance is important, use a list comprehension:

df['page'] = [x.split('/')[3] for x in df['Url']]

print(df)
   User                                                Url  \
0     1              http://www.mycompany.com/Overview/Get   
1     2                      http://www.mycompany.com/News   
2     3               http://www.mycompany.com/Accountinfo   
3     4  http://www.mycompany.com/Personalinformation/I...   

                  page  
0             Overview  
1                 News  
2          Accountinfo  
3  Personalinformation  
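
A list comprehension calls plain str.split on each value, so unlike .str[3] it raises IndexError when a URL has fewer than four '/'-separated pieces. One way to keep the plain-Python loop but return NaN for such rows is a small helper; first_path_segment is a hypothetical name. The sketch also checks that both approaches agree on the sample data; the relative speed depends on your data and pandas version, so time it yourself rather than relying on the printed numbers here:

```python
import timeit

import numpy as np
import pandas as pd


def first_path_segment(url):
    """Return the 4th '/'-separated piece of url, or NaN if there is none (hypothetical helper)."""
    parts = url.split("/")
    return parts[3] if len(parts) > 3 else np.nan


df = pd.DataFrame({
    "User": [1, 2, 3, 4],
    "Url": [
        "http://www.mycompany.com/Overview/Get",
        "http://www.mycompany.com/News",
        "http://www.mycompany.com/Accountinfo",
        "http://www.mycompany.com/Personalinformation/Index",
    ],
})

via_str = df["Url"].str.split("/").str[3]
via_list = [first_path_segment(x) for x in df["Url"]]
print(via_str.tolist() == via_list)  # True

# Rough timing on a larger frame built by repeating the sample rows.
big = pd.concat([df] * 5_000, ignore_index=True)
t_str = timeit.timeit(lambda: big["Url"].str.split("/").str[3], number=5)
t_list = timeit.timeit(lambda: [first_path_segment(x) for x in big["Url"]], number=5)
print(f".str accessor: {t_str:.3f}s, list comprehension: {t_list:.3f}s")
```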

I'm being a little more explicit here, to handle URLs where http (or www) might be missing, and other variations:

pat = r'(?:https?://)?(?:www\.)?(?:\w+\.\w+\/)([^/]*)'
df.assign(page=df.Url.str.extract(pat, expand=False))

   User                                                Url                 page
0     1              http://www.mycompany.com/Overview/Get             Overview
1     2                      http://www.mycompany.com/News                 News
2     3                      www.mycompany.com/Accountinfo          Accountinfo
3     1              http://www.mycompany.com/Overview/Get             Overview
4     2                                 mycompany.com/News                 News
5     3              https://www.mycompany.com/Accountinfo          Accountinfo
6     4  http://www.mycompany.com/Personalinformation/I...  Personalinformation
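
A self-contained version of the regex approach, as a sketch. It writes the pattern as a raw string so that \. and \w reach the regex engine unchanged (a plain string literal with these escapes triggers a warning in recent Python versions). The sample URLs mirror the variants in the output above. Note that assign returns a new DataFrame rather than modifying df, so the result is assigned back here:

```python
import pandas as pd

# Sample with the variants shown above: missing scheme, missing "www.", https.
df = pd.DataFrame({
    "User": [1, 2, 3, 4, 5, 6],
    "Url": [
        "http://www.mycompany.com/Overview/Get",
        "http://www.mycompany.com/News",
        "www.mycompany.com/Accountinfo",
        "mycompany.com/News",
        "https://www.mycompany.com/Accountinfo",
        "http://www.mycompany.com/Personalinformation/Index",
    ],
})

# Optional scheme, optional "www.", then a "name.tld/" domain; the single
# capture group takes everything up to the next "/" as the page.
pat = r"(?:https?://)?(?:www\.)?(?:\w+\.\w+\/)([^/]*)"

# expand=False with one capture group returns a Series; assign() returns a copy.
df = df.assign(page=df["Url"].str.extract(pat, expand=False))
print(df[["Url", "page"]])
```

A URL with no path after the domain finds no match, so extract leaves NaN in that row.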
