I am a new Python aficionado. For R users, there is a function, paste, that concatenates two or more variables in a dataframe. It's very useful. For example, suppose that I have this dataframe:
categorie titre tarifMin lieu long lat img dateSortie
1 zoo, Aquar 0.0 Aquar 2.385 48.89 ilo,0
2 zoo, Aquar 4.5 Aquar 2.408 48.83 ilo,0
6 lieu Jardi 0.0 Jardi 2.320 48.86 ilo,0
7 lieu Bois 0.0 Bois 2.455 48.82 ilo,0
13 espac Canal 0.0 Canal 2.366 48.87 ilo,0
14 espac Canal -1.0 Canal 2.384 48.89 ilo,0
15 parc Le Ma 20.0 Le Ma 2.353 48.87 ilo,0
I want to create a new column which uses another column in a dataframe and some text. With R, I do:
> y$thecolThatIWant=ifelse(y$tarifMin!=-1,
+ paste("Evenement permanent -->",y$categorie,
+ y$titre,"C partir de",y$tarifMin,"€uros"),
+ paste("Evenement permanent -->",y$categorie,
+ y$titre,"sans prix indique"))
And the result is:
> y
categorie titre tarifMin lieu long lat img dateSortie
1 zoo, Aquar 0.0 Aquar 2.385 48.89 ilo,0
2 zoo, Aquar 4.5 Aquar 2.408 48.83 ilo,0
6 lieu Jardi 0.0 Jardi 2.320 48.86 ilo,0
7 lieu Bois 0.0 Bois 2.455 48.82 ilo,0
13 espac Canal 0.0 Canal 2.366 48.87 ilo,0
14 espac Canal -1.0 Canal 2.384 48.89 ilo,0
15 parc Le Ma 20.0 Le Ma 2.353 48.87 ilo,0
thecolThatIWant
1 Evenement permanent --> zoo, Aquar C partir de 0.0 €uros
2 Evenement permanent --> zoo, Aquar C partir de 4.5 €uros
6 Evenement permanent --> lieu Jardi C partir de 0.0 €uros
7 Evenement permanent --> lieu Bois C partir de 0.0 €uros
13 Evenement permanent --> espac Canal C partir de 0.0 €uros
14 Evenement permanent --> espac Canal C partir de -1.0 €uros
15 Evenement permanent --> parc Le Ma C partir de 20.0 €uros
My question is: How can I do the same thing in Python Pandas or some other module?
What I've tried so far: I'm a very new user, so sorry for any mistakes. I tried to replicate the example in Python; suppose that I have something like this:
table=pd.read_csv("y.csv",sep=",")
tt= table.loc[:,['categorie','titre','tarifMin','long','lat','lieu']]
tt
categorie titre tarifMin long lat lieu
0 zoo, Aquar 0.0 2.385 48.89 Aquar
1 zoo, Aquar 4.5 2.408 48.83 Aquar
2 lieu Jardi 0.0 2.320 48.86 Jardi
3 lieu Bois 0.0 2.455 48.82 Bois
4 espac Canal 0.0 2.366 48.87 Canal
5 espac Canal -1.0 2.384 48.89 Canal
6 parc Le Ma 20.0 2.353 48.87 Le Ma
I tried this basically
sc="Even permanent -->" + " "+ tt.titre+" "+tt.lieu
tt['theColThatIWant'] = sc
tt
And I got this
categorie titre tarifMin long lat lieu theColThatIWant
0 zoo, Aquar 0.0 2.385 48.89 Aquar Even permanent --> Aquar Aquar
1 zoo, Aquar 4.5 2.408 48.83 Aquar Even permanent --> Aquar Aquar
2 lieu Jardi 0.0 2.320 48.86 Jardi Even permanent --> Jardi Jardi
3 lieu Bois 0.0 2.455 48.82 Bois Even permanent --> Bois Bois
4 espac Canal 0.0 2.366 48.87 Canal Even permanent --> Canal Canal
5 espac Canal -1.0 2.384 48.89 Canal Even permanent --> Canal Canal
6 parc Le Ma 20.0 2.353 48.87 Le Ma Even permanent --> Le Ma Le Ma
Now, I suppose that I have to loop with a condition, if there is no vectorized equivalent like in R?
This works very much like the paste command in R. R code:
words = c("Here", "I","want","to","concatenate","words","using","pipe","delimeter")
paste(words,collapse="|")
[1] "Here|I|want|to|concatenate|words|using|pipe|delimeter"
Python:
words = ["Here", "I","want","to","concatenate","words","using","pipe","delimeter"]
"|".join(words)
Result:
'Here|I|want|to|concatenate|words|using|pipe|delimeter'
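One difference to keep in mind: R's paste converts its arguments to character for you, while str.join raises a TypeError if any item is not a string. A minimal sketch, assuming the default str() form of each item is acceptable (the values list is made up for illustration):

```python
# Hypothetical mixed-type values, for illustration only.
values = ["tarif", 4.5, 20, None]

# str.join only accepts strings, so convert each item first.
joined = "|".join(str(v) for v in values)
print(joined)  # tarif|4.5|20|None
```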
Here's a simple implementation that works on lists, and probably other iterables. Warning: it's only been lightly tested, and only in Python 3.5+:
from functools import reduce

def _reduce_concat(x, sep=""):
    return reduce(lambda x, y: str(x) + sep + str(y), x)

def paste(*lists, sep=" ", collapse=None):
    result = map(lambda x: _reduce_concat(x, sep=sep), zip(*lists))
    if collapse is not None:
        return _reduce_concat(result, sep=collapse)
    return list(result)

assert paste([1,2,3], [11,12,13], sep=',') == ['1,11', '2,12', '3,13']
assert paste([1,2,3], [11,12,13], sep=',', collapse=";") == '1,11;2,12;3,13'
You can also have some more fun and replicate other functions like paste0:
from functools import partial
paste0 = partial(paste, sep="")
Edit: here's a Repl.it project with type-annotated versions of this code.
For this particular case, the paste function in R is closest to Python's str.format method, which was added in Python 2.6. It's newer and somewhat more flexible than the older % operator.
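To make the comparison concrete, here is a small sketch that builds the same string with the % operator, with str.format, and with an f-string (f-strings need Python 3.6+). The sample values and the message layout are assumptions chosen for illustration:

```python
# Hypothetical sample values modelled on the question's table.
categorie, titre, tarif = "zoo", "Aquar", 4.5

# Older %-style formatting.
old_style = "Evenement permanent --> %s %s (%s)" % (categorie, titre, tarif)

# str.format, available since Python 2.6.
new_style = "Evenement permanent --> {} {} ({})".format(categorie, titre, tarif)

# f-string, available since Python 3.6.
f_style = f"Evenement permanent --> {categorie} {titre} ({tarif})"

print(old_style == new_style == f_style)  # True
```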
For a purely Python-ic answer without using numpy or pandas, here is one way to do it using your original data in the form of a list of lists (this could also have been done as a list of dict, but that seemed more cluttered to me).
# -*- coding: utf-8 -*-
names=['categorie','titre','tarifMin','lieu','long','lat','img','dateSortie']
records=[[
'zoo', 'Aquar', 0.0,'Aquar',2.385,48.89,'ilo',0],[
'zoo', 'Aquar', 4.5,'Aquar',2.408,48.83,'ilo',0],[
'lieu', 'Jardi', 0.0,'Jardi',2.320,48.86,'ilo',0],[
'lieu', 'Bois', 0.0,'Bois', 2.455,48.82,'ilo',0],[
'espac', 'Canal', 0.0,'Canal',2.366,48.87,'ilo',0],[
'espac', 'Canal', -1.0,'Canal',2.384,48.89,'ilo',0],[
'parc', 'Le Ma', 20.0,'Le Ma', 2.353,48.87,'ilo',0] ]
def prix(p):
    if (p != -1):
        return 'C partir de {} €uros'.format(p)
    return 'sans prix indique'

def msg(a):
    return 'Evenement permanent --> {}, {} {}'.format(a[0],a[1],prix(a[2]))

[m.append(msg(m)) for m in records]
from pprint import pprint
pprint(records)
The result is this:
[['zoo',
'Aquar',
0.0,
'Aquar',
2.385,
48.89,
'ilo',
0,
'Evenement permanent --> zoo, Aquar C partir de 0.0 \xe2\x82\xacuros'],
['zoo',
'Aquar',
4.5,
'Aquar',
2.408,
48.83,
'ilo',
0,
'Evenement permanent --> zoo, Aquar C partir de 4.5 \xe2\x82\xacuros'],
['lieu',
'Jardi',
0.0,
'Jardi',
2.32,
48.86,
'ilo',
0,
'Evenement permanent --> lieu, Jardi C partir de 0.0 \xe2\x82\xacuros'],
['lieu',
'Bois',
0.0,
'Bois',
2.455,
48.82,
'ilo',
0,
'Evenement permanent --> lieu, Bois C partir de 0.0 \xe2\x82\xacuros'],
['espac',
'Canal',
0.0,
'Canal',
2.366,
48.87,
'ilo',
0,
'Evenement permanent --> espac, Canal C partir de 0.0 \xe2\x82\xacuros'],
['espac',
'Canal',
-1.0,
'Canal',
2.384,
48.89,
'ilo',
0,
'Evenement permanent --> espac, Canal sans prix indique'],
['parc',
'Le Ma',
20.0,
'Le Ma',
2.353,
48.87,
'ilo',
0,
'Evenement permanent --> parc, Le Ma C partir de 20.0 \xe2\x82\xacuros']]
Note that although I've defined a list names, it isn't actually used. One could define a dictionary with the field names as keys and the field numbers (starting from 0) as values, but I didn't bother with this, to keep the example simple.
The functions prix and msg are fairly simple. The only tricky portion is the list comprehension [m.append(msg(m)) for m in records], which iterates through all of the records and modifies each one to append your new field, created via a call to msg.
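For what it's worth, one way the names list could be put to use is to zip it with each record, so that fields are looked up by name instead of by position. This is only a sketch: it assumes every record follows the order of names, it shows just three fields and two records, and describe is a hypothetical helper that stands in for prix and msg:

```python
names = ["categorie", "titre", "tarifMin"]
records = [
    ["zoo", "Aquar", 4.5],
    ["espac", "Canal", -1.0],
]

def describe(row):
    """Build the message from a dict keyed by field name (hypothetical helper)."""
    if row["tarifMin"] != -1:
        prix = "C partir de {} €uros".format(row["tarifMin"])
    else:
        prix = "sans prix indique"
    return "Evenement permanent --> {}, {} {}".format(
        row["categorie"], row["titre"], prix
    )

# Pair each field name with its value, then add the new field in a plain loop.
rows = [dict(zip(names, rec)) for rec in records]
for row in rows:
    row["thecolThatIWant"] = describe(row)

for row in rows:
    print(row["thecolThatIWant"])
```

A plain for loop is used here instead of a list comprehension, since the loop is run only for its side effect.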
My answer is loosely based on the original question and was edited from the answer by woles. I would like to illustrate the following point:
For R folks: there is no direct equivalent of ifelse (but there are ways to replace it nicely).
import numpy as np
import pandas as pd
dates = pd.date_range('20140412',periods=7)
df = pd.DataFrame(np.random.randn(7,4),index=dates,columns=list('ABCD'))
df['categorie'] = ['z', 'z', 'l', 'l', 'e', 'e', 'p']
def apply_to_row(x):
    ret = "this is the value i want: %f" % x['A']
    if x['B'] > 0:
        ret = "no, this one is better: %f" % x['C']
    return ret

df['theColumnIWant'] = df.apply(apply_to_row, axis=1)
print(df)
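One of the usual vectorized stand-ins for R's ifelse is numpy.where, which picks element-wise between two alternatives. The sketch below applies it to a made-up three-row frame modelled on the question's columns; the data and the exact message text are assumptions, not the asker's real file:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame modelled on the question's columns.
df = pd.DataFrame({
    "categorie": ["zoo", "espac", "parc"],
    "titre": ["Aquar", "Canal", "Le Ma"],
    "tarifMin": [4.5, -1.0, 20.0],
})

# Adding a plain string to a text column concatenates element-wise.
prefix = "Evenement permanent --> " + df["categorie"] + " " + df["titre"]

# np.where(condition, value_if_true, value_if_false) works element-wise.
df["theColumnIWant"] = np.where(
    df["tarifMin"] != -1,
    prefix + " C partir de " + df["tarifMin"].astype(str) + " €uros",
    prefix + " sans prix indique",
)
print(df["theColumnIWant"].tolist())
```

Unlike apply with axis=1, this does not call a Python function once per row.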
You can try pandas.Series.str.cat
import pandas as pd

def paste0(ss, sep=None, na_rep=None):
    '''Analogy to R paste0'''
    ss = [pd.Series(s) for s in ss]
    ss = [s.astype(str) for s in ss]
    s = ss[0]
    res = s.str.cat(ss[1:], sep=sep, na_rep=na_rep)
    return res

pasteA = paste0
Or just sep.join()
def paste0(ss, sep=None, na_rep=None,
           castF=unicode,  ##### many languages dont work well with str (Python 2; use str on Python 3)
           ):
    if sep is None:
        sep = ''
    res = [castF(sep).join(castF(s) for s in x) for x in zip(*ss)]
    return res

pasteB = paste0

%timeit pasteA([range(1000),range(1000,0,-1)],sep='_')
# 100 loops, best of 3: 7.11 ms per loop
%timeit pasteB([range(1000),range(1000,0,-1)],sep='_')
# 100 loops, best of 3: 2.24 ms per loop
I have used itertools to mimic recycling:
import itertools

def paste0(ss, sep=None, na_rep=None, castF=unicode):
    '''Analogy to R paste0'''
    if sep is None:
        sep = u''
    L = max([len(e) for e in ss])
    it = itertools.izip(*[itertools.cycle(e) for e in ss])
    res = [castF(sep).join(castF(s) for s in next(it)) for i in range(L)]
    # res = pd.Series(res)
    return res
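The recycling code above relies on Python 2 names (unicode and itertools.izip). A possible Python 3 sketch of the same idea follows; paste_recycle is a hypothetical name, and returning an empty list for empty inputs is my own assumption rather than a claim about what R does:

```python
import itertools

def paste_recycle(*seqs, sep=""):
    """Join items position by position, recycling shorter inputs (hypothetical name)."""
    if not seqs or any(len(s) == 0 for s in seqs):
        return []
    longest = max(len(s) for s in seqs)
    # cycle() repeats each input endlessly; islice() stops at the longest length.
    cycled = zip(*(itertools.cycle(s) for s in seqs))
    return [sep.join(str(x) for x in row)
            for row in itertools.islice(cycled, longest)]

print(paste_recycle(["a", "b", "c", "d"], [1, 2], sep="_"))
# ['a_1', 'b_2', 'c_1', 'd_2']
```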
patsy might be relevant (I'm not an experienced user myself).
Let's try things with apply.
df.apply( lambda x: str( x.loc[ desired_col ] ) + "pasting?" , axis = 1 )
You will get something similar to paste.
If you want to just paste two string columns together, you can simplify @shouldsee's answer because you don't need to create the function. Eg, in my case:
df['newcol'] = df['id_part_one'].str.cat(df['id_part_two'], sep='_')
Both Series might need to be of dtype object for this to work (I haven't verified).
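If one of the columns is not text, casting both sides with astype(str) before calling str.cat sidesteps the dtype question. A small sketch with made-up data (the column contents are assumptions):

```python
import pandas as pd

# Hypothetical frame: one text column and one integer column.
df = pd.DataFrame({"id_part_one": ["a", "b"], "id_part_two": [1, 2]})

# Casting both sides to str first avoids relying on the columns' dtypes.
df["newcol"] = df["id_part_one"].astype(str).str.cat(
    df["id_part_two"].astype(str), sep="_"
)
print(df["newcol"].tolist())  # ['a_1', 'b_2']
```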
This is a simple example of how to achieve that (if I'm not wrong about what you want to do):
import numpy as np
import pandas as pd
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
for row in df.itertuples():
    index, A, B, C, D = row
    print('%s Evenement permanent --> %s , next data %s' % (index, A, B))
Output:
>>>df
A B C D
2013-01-01 -0.400550 -0.204032 -0.954237 0.019025
2013-01-02 0.509040 -0.611699 1.065862 0.034486
2013-01-03 0.366230 0.805068 -0.144129 -0.912942
2013-01-04 1.381278 -1.783794 0.835435 -0.140371
2013-01-05 1.140866 2.755003 -0.940519 -2.425671
2013-01-06 -0.610569 -0.282952 0.111293 -0.108521
This is what the loop prints:
2013-01-01 00:00:00 Evenement permanent --> -0.400550121168 , next data -0.204032344442
2013-01-02 00:00:00 Evenement permanent --> 0.509040318928 , next data -0.611698560541
2013-01-03 00:00:00 Evenement permanent --> 0.366230438863 , next data 0.805067758304
2013-01-04 00:00:00 Evenement permanent --> 1.38127775713 , next data -1.78379439485
2013-01-05 00:00:00 Evenement permanent --> 1.14086631509 , next data 2.75500268167
2013-01-06 00:00:00 Evenement permanent --> -0.610568516983 , next data -0.282952162792
There is actually a very easy way. You just convert your variable to a string. For instance, try running this:
a = 1; b = "you are number " + str(a); b