
Equivalent of Paste R to Python

I am a new Python aficionado. For R users, there is one function, paste, that helps to concatenate two or more variables in a dataframe. It's very useful. For example, suppose that I have this dataframe:

   categorie titre tarifMin  lieu  long   lat   img dateSortie
1      zoo,  Aquar      0.0 Aquar 2.385 48.89 ilo,0           
2      zoo,  Aquar      4.5 Aquar 2.408 48.83 ilo,0           
6      lieu  Jardi      0.0 Jardi 2.320 48.86 ilo,0           
7      lieu  Bois       0.0 Bois  2.455 48.82 ilo,0           
13     espac Canal      0.0 Canal 2.366 48.87 ilo,0           
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,0           
15     parc  Le Ma     20.0 Le Ma 2.353 48.87 ilo,0 

I want to create a new column that combines other columns of the dataframe with some text. With R, I do:

> y$thecolThatIWant=ifelse(y$tarifMin!=-1,
+                             paste("Evenement permanent  -->",y$categorie,
+                                   y$titre,"C  partir de",y$tarifMin,"€uros"),
+                             paste("Evenement permanent  -->",y$categorie,
+                                   y$titre,"sans prix indique"))

And the result is:

> y
   categorie titre tarifMin  lieu  long   lat   img dateSortie
1      zoo,  Aquar      0.0 Aquar 2.385 48.89 ilo,0           
2      zoo,  Aquar      4.5 Aquar 2.408 48.83 ilo,0           
6      lieu  Jardi      0.0 Jardi 2.320 48.86 ilo,0           
7      lieu  Bois       0.0 Bois  2.455 48.82 ilo,0           
13     espac Canal      0.0 Canal 2.366 48.87 ilo,0           
14     espac Canal     -1.0 Canal 2.384 48.89 ilo,0           
15     parc  Le Ma     20.0 Le Ma 2.353 48.87 ilo,0           
                                                thecolThatIWant
1  Evenement permanent  --> zoo,  Aquar C  partir de  0.0 €uros
2  Evenement permanent  --> zoo,  Aquar C  partir de  4.5 €uros
6  Evenement permanent  --> lieu  Jardi C  partir de  0.0 €uros
7  Evenement permanent  --> lieu  Bois  C  partir de  0.0 €uros
13 Evenement permanent  --> espac Canal C  partir de  0.0 €uros
14 Evenement permanent  --> espac Canal C  partir de -1.0 €uros
15 Evenement permanent  --> parc  Le Ma C  partir de 20.0 €uros

My question is: How can I do the same thing in Python Pandas or some other module?

What I've tried so far: I'm a very new user, so sorry for any mistakes. I tried to replicate the example in Python; suppose that I get something like this:

import pandas as pd

table = pd.read_csv("y.csv", sep=",")
tt = table.loc[:, ['categorie', 'titre', 'tarifMin', 'long', 'lat', 'lieu']]
tt
categorie    titre   tarifMin    long    lat     lieu
0   zoo,    Aquar   0.0     2.385   48.89   Aquar
1   zoo,    Aquar   4.5     2.408   48.83   Aquar
2   lieu    Jardi   0.0     2.320   48.86   Jardi
3   lieu    Bois    0.0     2.455   48.82   Bois
4   espac   Canal   0.0     2.366   48.87   Canal
5   espac   Canal   -1.0    2.384   48.89   Canal
6   parc    Le Ma   20.0    2.353   48.87   Le Ma

Basically, I tried this:

sc="Even permanent -->" + " "+ tt.titre+" "+tt.lieu
tt['theColThatIWant'] = sc
tt

And I got this

    categorie   titre   tarifMin    long    lat     lieu    theColThatIWant
0   zoo,    Aquar   0.0     2.385   48.89   Aquar   Even permanent --> Aquar Aquar
1   zoo,    Aquar   4.5     2.408   48.83   Aquar   Even permanent --> Aquar Aquar
2   lieu    Jardi   0.0     2.320   48.86   Jardi   Even permanent --> Jardi Jardi
3   lieu    Bois    0.0     2.455   48.82   Bois    Even permanent --> Bois Bois
4   espac   Canal   0.0     2.366   48.87   Canal   Even permanent --> Canal Canal
5   espac   Canal   -1.0    2.384   48.89   Canal   Even permanent --> Canal Canal
6   parc    Le Ma   20.0    2.353   48.87   Le Ma   Even permanent --> Le Ma Le Ma

Do I now have to loop with a condition, or is there something vectorized like ifelse in R?

This works very much like the paste command in R. R code:

 words = c("Here", "I","want","to","concatenate","words","using","pipe","delimeter")
 paste(words,collapse="|")

[1] "Here|I|want|to|concatenate|words|using|pipe|delimeter"

Python:

words = ["Here", "I","want","to","concatenate","words","using","pipe","delimeter"]
"|".join(words)

Result:

'Here|I|want|to|concatenate|words|using|pipe|delimeter'
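One difference worth knowing: R's paste converts numbers and other values to character for you, whereas str.join accepts only strings. A minimal sketch (the mixed list below is made up for illustration):

```python
# str.join only accepts strings, unlike R's paste(), which coerces
# numbers and other values to character automatically.
values = ["Aquar", 4.5, 0, None]

try:
    "|".join(values)
except TypeError as err:
    print("join failed:", err)

# Convert every item explicitly, which is what paste() does for you in R.
joined = "|".join(map(str, values))
print(joined)
```

Note that str(None) gives "None", not R's "NA", so missing values may need their own handling.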

Here's a simple implementation that works on lists, and probably other iterables. Warning: it's only been lightly tested, and only in Python 3.5+:

def _concat(x, sep=""):
    # Convert every item to str and join with sep; unlike a reduce() over
    # str(x) + sep + str(y), this also converts one-element inputs
    return sep.join(str(item) for item in x)

def paste(*lists, sep=" ", collapse=None):
    # Group the i-th elements of every input and join each group with sep
    result = map(lambda x: _concat(x, sep=sep), zip(*lists))
    if collapse is not None:
        # Like R's collapse=: join all the pasted elements into one string
        return _concat(result, sep=collapse)
    return list(result)

assert paste([1,2,3], [11,12,13], sep=',') == ['1,11', '2,12', '3,13']
assert paste([1,2,3], [11,12,13], sep=',', collapse=";") == '1,11;2,12;3,13'

You can also have some more fun and replicate other functions like paste0:

from functools import partial

paste0 = partial(paste, sep="")

Edit: here's a Repl.it project with type-annotated versions of this code.

For this particular case, R's paste function is closest to Python's str.format method, which was added in Python 2.6. It's newer and somewhat more flexible than the older % operator.
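For comparison, here is a sketch building the question's message for a single, hard-coded row with str.format, the older % operator, and f-strings (Python 3.6+). Unlike R's paste, none of these insert separators for you, so the spaces live in the template string:

```python
# One row of the question's data, stored as plain Python values.
categorie, titre, tarif_min = "zoo,", "Aquar", 4.5

# str.format (Python 2.6+)
msg_format = "Evenement permanent  --> {} {} à partir de {} €uros".format(
    categorie, titre, tarif_min)

# The older % operator
msg_percent = "Evenement permanent  --> %s %s à partir de %s €uros" % (
    categorie, titre, tarif_min)

# f-string (Python 3.6+)
msg_fstring = f"Evenement permanent  --> {categorie} {titre} à partir de {tarif_min} €uros"

print(msg_format)
```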

For a purely Pythonic answer without using numpy or pandas, here is one way to do it using your original data in the form of a list of lists (this could also have been done as a list of dicts, but that seemed more cluttered to me).

# -*- coding: utf-8 -*-
names=['categorie','titre','tarifMin','lieu','long','lat','img','dateSortie']

records=[[
    'zoo',   'Aquar',     0.0,'Aquar',2.385,48.89,'ilo',0],[
    'zoo',   'Aquar',     4.5,'Aquar',2.408,48.83,'ilo',0],[
    'lieu',  'Jardi',     0.0,'Jardi',2.320,48.86,'ilo',0],[
    'lieu',  'Bois',      0.0,'Bois', 2.455,48.82,'ilo',0],[
    'espac', 'Canal',     0.0,'Canal',2.366,48.87,'ilo',0],[
    'espac', 'Canal',    -1.0,'Canal',2.384,48.89,'ilo',0],[
    'parc',  'Le Ma',    20.0,'Le Ma', 2.353,48.87,'ilo',0] ]

def prix(p):
    if (p != -1):
        return 'C  partir de {} €uros'.format(p)
    return 'sans prix indique'

def msg(a):
    return 'Evenement permanent  --> {}, {} {}'.format(a[0],a[1],prix(a[2]))

[m.append(msg(m)) for m in records]

from pprint import pprint

pprint(records)

The result is this:

[['zoo',
  'Aquar',
  0.0,
  'Aquar',
  2.385,
  48.89,
  'ilo',
  0,
  'Evenement permanent  --> zoo, Aquar C  partir de 0.0 \xe2\x82\xacuros'],
 ['zoo',
  'Aquar',
  4.5,
  'Aquar',
  2.408,
  48.83,
  'ilo',
  0,
  'Evenement permanent  --> zoo, Aquar C  partir de 4.5 \xe2\x82\xacuros'],
 ['lieu',
  'Jardi',
  0.0,
  'Jardi',
  2.32,
  48.86,
  'ilo',
  0,
  'Evenement permanent  --> lieu, Jardi C  partir de 0.0 \xe2\x82\xacuros'],
 ['lieu',
  'Bois',
  0.0,
  'Bois',
  2.455,
  48.82,
  'ilo',
  0,
  'Evenement permanent  --> lieu, Bois C  partir de 0.0 \xe2\x82\xacuros'],
 ['espac',
  'Canal',
  0.0,
  'Canal',
  2.366,
  48.87,
  'ilo',
  0,
  'Evenement permanent  --> espac, Canal C  partir de 0.0 \xe2\x82\xacuros'],
 ['espac',
  'Canal',
  -1.0,
  'Canal',
  2.384,
  48.89,
  'ilo',
  0,
  'Evenement permanent  --> espac, Canal sans prix indique'],
 ['parc',
  'Le Ma',
  20.0,
  'Le Ma',
  2.353,
  48.87,
  'ilo',
  0,
  'Evenement permanent  --> parc, Le Ma C  partir de 20.0 \xe2\x82\xacuros']]

Note that although I've defined a list called names, it isn't actually used. One could define a dictionary with the column names as keys and the field numbers (starting from 0) as values, but I skipped this to keep the example simple.

The functions prix and msg are fairly simple. The only tricky part is the list comprehension [m.append(msg(m)) for m in records], which iterates through all of the records and modifies each one by appending your new field, created via a call to msg.
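A sketch of the same loop written as a plain for statement (the comprehension above builds and discards a list of None values, since append returns None), also using the dictionary idea mentioned above by zipping names with each record. Only two of the records are repeated here, and prix is re-declared so the block runs on its own:

```python
names = ['categorie', 'titre', 'tarifMin', 'lieu', 'long', 'lat', 'img', 'dateSortie']
records = [
    ['zoo',   'Aquar',  4.5, 'Aquar', 2.408, 48.83, 'ilo', 0],
    ['espac', 'Canal', -1.0, 'Canal', 2.384, 48.89, 'ilo', 0],
]

def prix(p):
    if p != -1:
        return 'à partir de {} €uros'.format(p)
    return 'sans prix indique'

for record in records:
    # Look fields up by name instead of by position
    row = dict(zip(names, record))
    record.append('Evenement permanent  --> {}, {} {}'.format(
        row['categorie'], row['titre'], prix(row['tarifMin'])))
```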

My answer is loosely based on the original question and was edited from the answer by woles. I would like to illustrate these points:

  • the closest thing to paste is Python's % operator
  • with apply you can compute a new value for each row and assign it to a new column

For R folks: there is no ifelse in exactly that form (but there are ways to replace it nicely).

import numpy as np
import pandas as pd

dates = pd.date_range('20140412',periods=7)
df = pd.DataFrame(np.random.randn(7,4),index=dates,columns=list('ABCD'))
df['categorie'] = ['z', 'z', 'l', 'l', 'e', 'e', 'p']

def apply_to_row(x):
    ret = "this is the value i want: %f" % x['A']
    if x['B'] > 0:
        ret = "no, this one is better: %f" % x['C']
    return ret

df['theColumnIWant'] = df.apply(apply_to_row, axis=1)
print(df)
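That said, numpy.where is the closest vectorized analogue of R's ifelse: it picks element-wise between two arrays according to a condition, and, like ifelse, both branches are computed for every row. A sketch against a few rows of the question's data, assuming the column names shown there:

```python
import numpy as np
import pandas as pd

# A few rows of the question's data (only the columns used here).
y = pd.DataFrame({
    "categorie": ["zoo,", "lieu", "espac"],
    "titre": ["Aquar", "Jardi", "Canal"],
    "tarifMin": [4.5, 0.0, -1.0],
})

# Element-wise string concatenation; numbers must go through astype(str) first.
prefix = "Evenement permanent  --> " + y["categorie"] + " " + y["titre"]

# np.where(condition, value_if_true, value_if_false) works like R's ifelse().
y["thecolThatIWant"] = np.where(
    y["tarifMin"] != -1,
    prefix + " à partir de " + y["tarifMin"].astype(str) + " €uros",
    prefix + " sans prix indique",
)
print(y["thecolThatIWant"].tolist())
```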
  1. You can try pandas.Series.str.cat:

     import pandas as pd

     def paste0(ss, sep=None, na_rep=None):
         '''Analogy to R paste0'''
         ss = [pd.Series(s) for s in ss]
         ss = [s.astype(str) for s in ss]
         s = ss[0]
         res = s.str.cat(ss[1:], sep=sep, na_rep=na_rep)
         return res

     pasteA = paste0

  2. Or just sep.join():

     # Python 2 code (unicode does not exist in Python 3; use str there)
     def paste0(ss, sep=None, na_rep=None,
                castF=unicode,  # many languages don't work well with str
                ):
         if sep is None:
             sep = ''
         res = [castF(sep).join(castF(s) for s in x) for x in zip(*ss)]
         return res

     pasteB = paste0

     %timeit pasteA([range(1000), range(1000, 0, -1)], sep='_')
     # 100 loops, best of 3: 7.11 ms per loop
     %timeit pasteB([range(1000), range(1000, 0, -1)], sep='_')
     # 100 loops, best of 3: 2.24 ms per loop

  3. I have used itertools to mimic recycling:

     # Python 2 code (itertools.izip is the built-in zip in Python 3)
     import itertools

     def paste0(ss, sep=None, na_rep=None, castF=unicode):
         '''Analogy to R paste0'''
         if sep is None:
             sep = u''
         L = max([len(e) for e in ss])
         it = itertools.izip(*[itertools.cycle(e) for e in ss])
         res = [castF(sep).join(castF(s) for s in next(it)) for i in range(L)]
         # res = pd.Series(res)
         return res

  4. patsy might be relevant (I'm not an experienced user myself).
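For Python 3 (no unicode or itertools.izip), a recycling version could look like the sketch below. The name paste and its signature are hypothetical, and it assumes every input is a non-empty sequence; unlike R, it does not warn when lengths don't divide evenly:

```python
from itertools import cycle

def paste(*vectors, sep=" "):
    """Hypothetical R-style paste with recycling: shorter inputs are repeated
    until the longest one is used up. Assumes every input is non-empty."""
    longest = max(len(v) for v in vectors)
    cycled = [cycle(v) for v in vectors]
    # For each output position, take the next element of every (cycled) input
    return [sep.join(str(next(it)) for it in cycled) for _ in range(longest)]

print(paste(["a", "b", "c", "d"], [1, 2], sep=""))
print(paste(["x"], [1, 2, 3]))
```

Passing sep="" gives paste0-like behaviour.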

Let's try things with apply.

df.apply(lambda x: str(x.loc[desired_col]) + "pasting?", axis=1)

You will get something similar to paste (desired_col is a placeholder for the name of the column you want to use).

If you just want to paste two string columns together, you can simplify @shouldsee's answer, because you don't need to create the function. For example, in my case:

df['newcol'] = df['id_part_one'].str.cat(df['id_part_two'], sep='_')

It might be required for both Series to be of dtype object in order to do this (I haven't verified this).
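A sketch of the conversion this usually needs, with made-up column values: the .str accessor only works on string values, so the numeric column is converted with astype(str) before being passed in, and na_rep controls missing values (by default a row with a missing value on either side gives NaN):

```python
import pandas as pd

df = pd.DataFrame({
    "id_part_one": ["a", None, "c"],
    "id_part_two": [1, 2, 3],  # integers, not strings
})

# Convert the numeric column to strings before concatenating.
part_two = df["id_part_two"].astype(str)

# Default: a missing value in either column makes that row NaN.
df["newcol"] = df["id_part_one"].str.cat(part_two, sep="_")

# na_rep substitutes a placeholder instead, much like R's paste() writes "NA".
df["newcol_na"] = df["id_part_one"].str.cat(part_two, sep="_", na_rep="NA")
print(df)
```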

This is a simple example of how to achieve that (if I understand correctly what you want to do):

import numpy as np
import pandas as pd

dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
for row in df.itertuples():
    index, A, B, C, D = row
    print('%s Evenement permanent  --> %s , next data %s' % (index, A, B))

Output:

>>>df
                   A         B         C         D
2013-01-01 -0.400550 -0.204032 -0.954237  0.019025
2013-01-02  0.509040 -0.611699  1.065862  0.034486
2013-01-03  0.366230  0.805068 -0.144129 -0.912942
2013-01-04  1.381278 -1.783794  0.835435 -0.140371
2013-01-05  1.140866  2.755003 -0.940519 -2.425671
2013-01-06 -0.610569 -0.282952  0.111293 -0.108521

This is what the loop prints:

2013-01-01 00:00:00 Evenement permanent  --> -0.400550121168 , next data -0.204032344442

2013-01-02 00:00:00 Evenement permanent  --> 0.509040318928 , next data -0.611698560541

2013-01-03 00:00:00 Evenement permanent  --> 0.366230438863 , next data 0.805067758304

2013-01-04 00:00:00 Evenement permanent  --> 1.38127775713 , next data -1.78379439485

2013-01-05 00:00:00 Evenement permanent  --> 1.14086631509 , next data 2.75500268167

2013-01-06 00:00:00 Evenement permanent  --> -0.610568516983 , next data -0.282952162792
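The same loop in Python 3, as a sketch: itertuples returns named tuples, so fields can be read as row.Index, row.A and so on instead of unpacking, and an f-string handles the formatting. The data is random, as above, so only the date prefix is predictable:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=3)
df = pd.DataFrame(np.random.randn(3, 4), index=dates, columns=list("ABCD"))

lines = []
for row in df.itertuples():
    # row.Index is the Timestamp index; row.A and row.B are column values
    lines.append(
        f"{row.Index:%Y-%m-%d} Evenement permanent  --> {row.A:.3f} , next data {row.B:.3f}"
    )

print("\n".join(lines))
```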

There is actually a very easy way: just convert your variable to a string. For instance, try running this:

a = 1; b = "you are number " + str(a); b

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 