
Parsing URLs from a dataframe

I am trying to parse URLs from a dataframe to get the path. My dataframe has 3 columns: ['url'], ['impressions'], ['clicks']. I want to replace each URL with its path. Here is my code:

    import pandas as pd
    from urllib.parse import urlparse

    fic_in = 'file.csv'

    df = pd.read_csv(fic_in)
    obj = urlparse(df['url'])
    df['url'] = obj.path
    print(df)

The CSV file contains thousands of URLs and two other columns of information about them. For a technical reason, I can't parse the URLs by manipulating the CSV directly; I have to parse them in the dataframe. When I execute this code, I get the following error, which I don't really understand:

  File "/Users/adamn/Desktop/test_lambda.py", line 33, in <module>
    obj = urlparse(df['url'])
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 389, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 125, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 109, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 109, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pandas/core/generic.py", line 1442, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I understand that there is an error, but what am I doing that isn't allowed? And how can I fix it, or is there another way to get this done?

Thanks for helping.

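urlparse only takes one string at a time, not a Series. As the _decode_args frame in the traceback shows, it evaluates "if x" on its argument, and pandas refuses to reduce a whole Series to a single True or False, which is what raises the ValueError.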
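To illustrate the one-string-at-a-time behaviour, here is a minimal sketch using only the standard library. The example.com URLs are made-up sample values, not taken from the question's data.

```python
from urllib.parse import urlparse

# A single URL string (hypothetical sample value) parses without trouble.
single = urlparse("https://example.com/products/shoes?color=red")
print(single.path)

# For several URLs, urlparse has to be called once per element.
urls = ["https://example.com/a/b", "https://example.com/c?x=1"]
paths = [urlparse(u).path for u in urls]
print(paths)
```

The same per-element idea carries over to a dataframe column, where the loop is expressed through apply.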

Try:

df["url"] = df["url"].astype(str).apply(lambda x: urlparse(x).path)
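For reference, a self-contained sketch of the same per-row approach on a small in-memory dataframe. The sample rows and the helper name url_to_path are assumptions made for illustration; the real data would come from pd.read_csv as in the question. The helper skips missing values explicitly because, depending on the pandas version, astype(str) may turn them into literal strings such as "nan", which would then be returned as a "path".

```python
import pandas as pd
from urllib.parse import urlparse

# Hypothetical sample data standing in for file.csv.
df = pd.DataFrame({
    "url": [
        "https://example.com/products/shoes?color=red",
        "https://example.com/blog/2021/post-1",
        None,  # a missing URL
    ],
    "impressions": [100, 250, 40],
    "clicks": [7, 12, 0],
})

def url_to_path(value):
    """Return the path component of one URL, or None if the value is missing."""
    if pd.isna(value):
        return None
    return urlparse(str(value)).path

# apply calls url_to_path once per row, so urlparse only ever sees a single string.
df["url"] = df["url"].apply(url_to_path)
print(df)
```

The other two columns are left untouched; only the url column is overwritten with the extracted paths.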
