In pandas how to create a new column from part of another, obeying a condition?
In Python 3 and pandas I have the dataframe:
lista_projetos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59 entries, 0 to 58
Data columns (total 14 columns):
n_projeto 59 non-null object
autor 59 non-null object
ementa 59 non-null object
resumo 59 non-null object
votacao_nominal 59 non-null object
votacao_nominal_alternativa_emenda 59 non-null object
link_votacao 0 non-null float64
observacao 0 non-null float64
link_emenda 0 non-null float64
indicado_por 59 non-null object
entidade_que_avalia 59 non-null object
favoravel_desfavoravel_indiferente 59 non-null object
explicacao 59 non-null object
link_projeto 59 non-null object
dtypes: float64(3), object(11)
memory usage: 6.5+ KB
The column "link_projeto" has URLs, always in this format:
"http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854"
"http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2147513"
"http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2168253"
I want to create a new column from the "link_projeto" column, always picking up the final number after the "=" sign.
Like this:
new_column
2171854
2147513
2168253
Please, is there a way to create a new column from part of another?
First, how would you do this on a single value?
>>> link = "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854"
>>> link.split("=", 1)[1]
'2171854'
But split is a method on str objects; how do you apply it to a column full of strings? Simple: columns (Series and Index) have a str attribute for exactly this purpose:
df.link_projeto.str.split("=", n=1)
But split doesn't just return a string, it returns a list of strings. How do we get the last one?
As explained in Splitting and Replacing Strings, you just access str again and index it:
df.link_projeto.str.split("=", n=1).str[1]
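To see what each step produces, here is a small sketch on a stand-in frame. The frame `df` and its three rows are assumptions, built from the example URLs in the question rather than from the real `lista_projetos` data.

```python
import pandas as pd

# Stand-in data (assumption): the three example URLs from the question.
df = pd.DataFrame({
    "link_projeto": [
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854",
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2147513",
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2168253",
    ]
})

# Each element becomes a list: [text before the first "=", text after it].
parts = df["link_projeto"].str.split("=", n=1)

# .str[1] indexes into every list and picks the second piece.
ids = parts.str[1]

print(parts.iloc[0])
print(ids.tolist())
```

Note that the pieces are still strings at this point, not numbers.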
So:
df["new_column"] = df.link_projeto.str.split("=", n=1).str[1]
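Putting it together, the following is a minimal end-to-end sketch on made-up data (the three URLs from the question). The regex variant and the integer conversion are not part of the original answer; they are offered as possible alternatives, assuming the id is always a run of digits at the very end of the URL. The column names `new_column_regex` and `new_column_int` are hypothetical.

```python
import pandas as pd

# Stand-in data (assumption): the three example URLs from the question.
df = pd.DataFrame({
    "link_projeto": [
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854",
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2147513",
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2168253",
    ]
})

# Approach from the answer: keep the text after the first "=".
df["new_column"] = df["link_projeto"].str.split("=", n=1).str[1]

# Alternative (assumption: the id is digits at the end of the URL).
# expand=False returns a Series when there is a single capture group.
df["new_column_regex"] = df["link_projeto"].str.extract(r"=(\d+)$", expand=False)

# Optional: the extracted values are strings; convert them if numbers are needed.
df["new_column_int"] = df["new_column"].astype(int)

print(df[["new_column", "new_column_regex", "new_column_int"]])
```

The split approach is the simpler of the two when the URL contains exactly one "="; the regex is stricter and yields NaN for rows that do not end in digits.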