In pandas how to create a new column from part of another, obeying a condition?

In Python 3 with pandas, I have this dataframe:

lista_projetos.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59 entries, 0 to 58
Data columns (total 14 columns):
n_projeto                             59 non-null object
autor                                 59 non-null object
ementa                                59 non-null object
resumo                                59 non-null object
votacao_nominal                       59 non-null object
votacao_nominal_alternativa_emenda    59 non-null object
link_votacao                          0 non-null float64
observacao                            0 non-null float64
link_emenda                           0 non-null float64
indicado_por                          59 non-null object
entidade_que_avalia                   59 non-null object
favoravel_desfavoravel_indiferente    59 non-null object
explicacao                            59 non-null object
link_projeto                          59 non-null object
dtypes: float64(3), object(11)
memory usage: 6.5+ KB

The column "link_projeto" has urls, always in this format:

"http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854"

"http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2147513"

"http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2168253"

I want to create a new column from the "link_projeto" column that always holds the final number after the "=" sign.

Like this:

new_column
2171854
2147513
2168253

Please, is there a way to create a new column from part of another?

First, how would you do this on a single value?

>>> link = "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854"
>>> link.split("=", 1)[1]
'2171854'
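To make the single-value step concrete, here is a minimal sketch of what the split produces. The names prefix and id_text are only illustrative; the second argument 1 tells split to cut at the first "=" only, so the result always has exactly two pieces for URLs in this format.

```python
link = "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854"

# Split once, at the first "=": everything before it and everything after it.
parts = link.split("=", 1)

# prefix and id_text are illustrative names, not from the question.
prefix, id_text = parts
print(id_text)
```

Note that the result is still a string; converting it with int(id_text) is a separate step if you need a number.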

But split is a method on str objects; how do you apply it to a column full of strings? Simple: columns (Series and Index) have a str attribute for exactly this purpose:

df.link_projeto.str.split("=", n=1)

But split doesn't just return a string, it returns a list of strings. How do we get the last one?

As explained in Splitting and Replacing Strings, you just access str again and index it:

df.link_projeto.str.split("=", n=1).str[1]
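The two steps can be tried on a small standalone Series. This is only a sketch: the sample data is rebuilt from the three URLs quoted in the question, and the names links, pieces and ids are made up for illustration. The split limit is passed as the keyword n=1, since recent pandas versions no longer accept it positionally.

```python
import pandas as pd

# Hypothetical sample data, built from the URLs quoted in the question.
links = pd.Series([
    "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854",
    "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2147513",
    "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2168253",
])

# Step 1: each element becomes a two-element list [before "=", after "="].
pieces = links.str.split("=", n=1)

# Step 2: .str[1] picks the second element out of every list.
ids = pieces.str[1]
print(ids.tolist())
```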

So, with df standing for your lista_projetos dataframe:

df["new_column"] = df.link_projeto.str.split("=", n=1).str[1]
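Put together on the dataframe from the question, it might look like the sketch below. The frame here is a stand-in with only the link_projeto column and three rows taken from the question's examples. The numeric conversion and the column name id_num are additions for illustration, not something the question asked for.

```python
import pandas as pd

# Stand-in for the real 59-row lista_projetos; only the relevant column.
lista_projetos = pd.DataFrame({
    "link_projeto": [
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2171854",
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2147513",
        "http://www.camara.gov.br/proposicoesWeb/fichadetramitacao?idProposicao=2168253",
    ]
})

# Text after the first "=" in every URL, stored as strings.
lista_projetos["new_column"] = (
    lista_projetos["link_projeto"].str.split("=", n=1).str[1]
)

# Optional, assumed extra step: a numeric copy (id_num is a hypothetical name).
lista_projetos["id_num"] = pd.to_numeric(lista_projetos["new_column"])

print(lista_projetos[["new_column", "id_num"]])
```

If the real URLs carry stray whitespace around them, calling .str.strip() on the column before splitting would be a reasonable precaution.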
