Extract substring from string and apply to entire dataframe column
I have a pandas dataframe with a bunch of URLs in a column, e.g.:
URL
www.myurl.com/python/us/learnpython
www.myurl.com/python/en/learnpython
www.myurl.com/python/fr/learnpython
.........
I want to extract the country code from each URL and add it to a new column called Country, containing us, en, fr and so on. I'm able to do this on a single string, e.g.:
url = 'www.myurl.com/python/us/learnpython'
country = url.split("python/")
country = country[1]
country = country.split("/")
country = country[0]
How do I go about applying this to the entire column, creating a new column with the required data in the process? I've tried variations of this with a for loop, without success.
Assuming the URLs always have this format, we can just use str.extract here:
df["cc_code"] = df["URL"].str.extract(r'/([a-z]{2})/')
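For a self-contained check, here is a minimal sketch built on the three sample URLs from the question. It makes two assumptions that go beyond the line above: the pattern is anchored on the literal "python/" segment (so an earlier two-letter path segment could not match by accident), and expand=False is passed so that str.extract returns a Series rather than a one-column DataFrame. The column name Country is taken from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "URL": [
        "www.myurl.com/python/us/learnpython",
        "www.myurl.com/python/en/learnpython",
        "www.myurl.com/python/fr/learnpython",
    ]
})

# Capture the two lowercase letters that follow "/python/".
# expand=False makes str.extract return a Series for a single group.
df["Country"] = df["URL"].str.extract(r"/python/([a-z]{2})/", expand=False)

print(df["Country"].tolist())  # ['us', 'en', 'fr']
```

Rows that do not match the pattern end up as NaN in the new column, so it is worth checking df["Country"].isna() if the real data is less regular than the sample.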
If the country code always appears after the second slash (/), it's better to just split the string, passing a value for n (i.e. the maxsplit parameter), and take only the piece you are interested in. Of course, you can assign the values to a new column:
>>> df['URL'].str.split('/',n=2).str[-1].str.split('/', n=1).str[0]
0    us
1    en
2    fr
Name: URL, dtype: object
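As a sketch of the assignment step, the snippet below stores the result in a new Country column, using the sample URLs from the question. It also shows a shorter variant that is my own simplification rather than part of the answer: split at most three times and take the third piece directly. Both assume the country code is always the third slash-separated segment.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "URL": [
        "www.myurl.com/python/us/learnpython",
        "www.myurl.com/python/en/learnpython",
        "www.myurl.com/python/fr/learnpython",
    ]
})

# Two-step split from the answer: keep everything after the second slash,
# then keep only what comes before the next slash.
two_step = df["URL"].str.split("/", n=2).str[-1].str.split("/", n=1).str[0]

# Shorter variant (an assumption, not from the answer): split at most
# three times and pick the third piece (index 2).
df["Country"] = df["URL"].str.split("/", n=3).str[2]

print(df["Country"].tolist())  # ['us', 'en', 'fr']
print(two_step.equals(df["Country"]))
```

If a URL has fewer than three segments, .str[2] yields NaN for that row instead of raising an error.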