
Extract substring from string and apply to entire dataframe column

I have a pandas dataframe with a bunch of URLs in a column, e.g.:

URL
www.myurl.com/python/us/learnpython
www.myurl.com/python/en/learnpython
www.myurl.com/python/fr/learnpython
.........

I want to extract the country code from each URL and add it to a new column called Country, containing us, en, fr and so on. I'm able to do this on a single string, e.g.:

url = 'www.myurl.com/python/us/learnpython'
country = url.split("python/")
country = country[1]
country = country.split("/")
country = country[0]

How do I go about applying this to the entire column, creating a new column with the required data in the process? I've tried variations of this with a for loop without success.

Assuming the URLs always have this format, we can just use str.extract here:

df["cc_code"] = df["URL"].str.extract(r'/([a-z]{2})/')
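
For anyone who wants to try this end to end, here is a minimal sketch. The sample dataframe is rebuilt from the three URLs in the question, and the column name Country follows the question rather than cc_code. Passing expand=False is my own choice so that a Series comes back instead of a one-column DataFrame. Note that the pattern picks the first two-letter lowercase path segment it finds, so it relies on no earlier segment being two letters long.

```python
import pandas as pd

# Sample data rebuilt from the URLs shown in the question
df = pd.DataFrame({
    "URL": [
        "www.myurl.com/python/us/learnpython",
        "www.myurl.com/python/en/learnpython",
        "www.myurl.com/python/fr/learnpython",
    ]
})

# Capture the first two-letter lowercase segment enclosed by slashes.
# expand=False returns a Series rather than a one-column DataFrame.
df["Country"] = df["URL"].str.extract(r"/([a-z]{2})/", expand=False)

print(df["Country"].tolist())
```

Rows where the pattern does not match end up as NaN, which is worth checking for if the real data is less uniform than the sample.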

If the country code always appears after the second slash /, it's better to just split the string, passing a value for n (the maxsplit parameter), and take only the piece you are interested in. Of course, you can assign the result to a new column:

>>> df['URL'].str.split('/',n=2).str[-1].str.split('/', n=1).str[0]

0    us
1    en
2    fr
Name: URL, dtype: object
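
A runnable sketch of the split approach, assigning the result to a new column, might look like the following. The sample dataframe and the column names Country and Country_apply are assumptions for illustration. The helper extract_country is hypothetical and simply wraps the single-string logic from the question, to show that it can also be applied row by row with Series.apply.

```python
import pandas as pd

# Sample data rebuilt from the URLs shown in the question
df = pd.DataFrame({
    "URL": [
        "www.myurl.com/python/us/learnpython",
        "www.myurl.com/python/en/learnpython",
        "www.myurl.com/python/fr/learnpython",
    ]
})

# Split at most twice, keep the remainder after the second slash,
# then split once more and keep the leading piece (the country code).
df["Country"] = (
    df["URL"]
    .str.split("/", n=2).str[-1]
    .str.split("/", n=1).str[0]
)


def extract_country(url: str) -> str:
    """Hypothetical helper: the question's single-string logic as a function."""
    return url.split("python/")[1].split("/")[0]


# Row-by-row alternative using the helper above
df["Country_apply"] = df["URL"].apply(extract_country)

print(df[["Country", "Country_apply"]])
```

The vectorised .str version is usually the more idiomatic choice; the apply version is mainly useful when the per-string logic is already written and tested.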
