Extract a substring from a string and apply it to an entire DataFrame column

Question

I have a pandas DataFrame with a bunch of URLs in a column, e.g.

URL
www.myurl.com/python/us/learnpython
www.myurl.com/python/en/learnpython
www.myurl.com/python/fr/learnpython
.........

I want to extract the country code and add it to a new column called Country, containing us, en, fr and so on. I'm able to do this on a single string, e.g.

url = 'www.myurl.com/python/us/learnpython'
country = url.split("python/")
country = country[1]
country = country.split("/")
country = country[0]

How do I go about applying this to the entire column, creating a new column with the required data in the process? I've tried variations of this with a for loop, without success.
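A minimal sketch of one way to reuse the single-string logic above on every row, assuming the column is named URL as shown: wrap the steps in a function (get_country is a hypothetical name introduced here) and pass it to Series.apply, which calls it once per value and returns a new Series that can be assigned as a column.

```python
import pandas as pd

# Sample data mirroring the URLs in the question
df = pd.DataFrame({"URL": [
    "www.myurl.com/python/us/learnpython",
    "www.myurl.com/python/en/learnpython",
    "www.myurl.com/python/fr/learnpython",
]})

def get_country(url):
    """Hypothetical helper: the question's single-string steps for one URL."""
    # Keep the part after "python/", then take the segment before the next "/"
    return url.split("python/")[1].split("/")[0]

# Series.apply runs get_country on each row; the result becomes a new column
df["Country"] = df["URL"].apply(get_country)
print(df["Country"].tolist())  # ['us', 'en', 'fr']
```

No explicit for loop is needed; apply does the iteration. The vectorized string methods in the answers below avoid calling a Python function per row.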

Answer 1

Assuming the URLs always have this format, we can just use str.extract here:

df["cc_code"] = df["URL"].str.extract(r'/([a-z]{2})/')
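A self-contained sketch of the str.extract approach, assuming the sample URLs from the question and writing to the Country column the question asks for (the answer itself uses cc_code). Passing expand=False makes a single capture group come back as a Series rather than a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"URL": [
    "www.myurl.com/python/us/learnpython",
    "www.myurl.com/python/en/learnpython",
    "www.myurl.com/python/fr/learnpython",
]})

# Capture the first segment of exactly two lowercase letters enclosed by slashes.
# "/python/" does not match because "python" is longer than two letters.
df["Country"] = df["URL"].str.extract(r"/([a-z]{2})/", expand=False)
print(df)
```

URLs with no matching segment get NaN in the new column rather than raising an error, which makes unexpected formats easy to spot.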

Answer 2

If the country code always appears after the second slash (/), it's better to just split the string, passing a value for n (i.e. the maxsplit parameter), and take only the value you are interested in. Of course, you can then assign the values to a new column:

>>> df['URL'].str.split('/',n=2).str[-1].str.split('/', n=1).str[0]

0    us
1    en
2    fr
Name: URL, dtype: object
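The same split chain assigned to a new column, as a sketch assuming the sample URLs from the question. The comments trace what each step produces for the first row; the plain .str.split("/").str[2] line is an equivalent shortcut that only holds for this fixed URL layout.

```python
import pandas as pd

df = pd.DataFrame({"URL": [
    "www.myurl.com/python/us/learnpython",
    "www.myurl.com/python/en/learnpython",
    "www.myurl.com/python/fr/learnpython",
]})

df["Country"] = (
    df["URL"]
    .str.split("/", n=2)      # ['www.myurl.com', 'python', 'us/learnpython']
    .str[-1]                  # 'us/learnpython'
    .str.split("/", n=1)      # ['us', 'learnpython']
    .str[0]                   # 'us'
)

# Shortcut for this exact layout: the third '/'-separated piece
shortcut = df["URL"].str.split("/").str[2]
print(df["Country"].tolist())  # ['us', 'en', 'fr']
```

Limiting n means each split makes at most the needed number of cuts, so any extra path segments after the country code stay together and do not affect the result.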

2 answers

Answer 1: 0 votes, accepted, 2022-09-17 12:51:01
Answer 2: 0 votes, 2022-09-17 13:04:05