Using Python and pandas.
I have a dataframe with three columns and about a million rows. The third column contains strings. I want to select a subset of these strings that match the strings in a list and put them in a fourth column.
Here is an example of a string from the dataframe:
"BW - Jl 8 '79 - pE2 CCB-B -vl9-Ja '66-p83 LJ - v91 - Ja 15 -66 - p426
NYRB - v5 - D 9 '65 - p39 NYTBR - v70 - N 21 '65 - p60 Nat R - vl7 -
D14 '65-pll65 y"
Here is a sample of my list:
['AAA', 'A Anth', 'AAPSS-A', 'A Anth', 'A Arch', 'A Art', 'AB', 'ABA
Jour', 'ABC', 'ABR', 'AC', 'ACSB', 'Adult L', 'Advocate', 'AE', 'AER',
'AF', 'Africa T', 'Afterimage', 'Aging', 'AH', 'AHR', 'A Hy R', 'AIQ',
'AJA', 'AJES', 'AJMD', 'AJMR', 'AJP', 'A J Psy', 'AJS', 'AL', 'A Lead',
'A Lib', 'Am', 'Am Ant', 'Am Arts', 'Am Craft', 'Amer R', 'Am Ethol',
'Am Film', 'Am Mus Teach', 'Am Q', 'Ams', 'Am Sci', 'Am Spect', 'Am
Threat', 'Analog', 'ANQ', 'ANQ:QJ', 'Ant & Col Hob', 'Antiq', 'Antiq
J', 'Ant R', 'Apo', 'APR', 'APSR', 'AR', 'ARBA', 'Arch', 'Archt R',
'ARG', 'Armchair Det', 'Art Am', 'Art Bull', 'Art Dir', 'Art J', 'Art
N', 'AS', 'ASBYP', 'Aspen A', 'Aspen J', 'ASR', 'Astron', 'Ath J',
'Atl', 'Atl Pro Bk R', 'Atl PBR', 'Aud', 'AW', 'BALF', 'Ballet N',
"Barron's", 'BAS', 'BB', 'B&B', 'BC', 'BCM', 'B Ent', 'Belles Let',
'BF', 'BFYC', 'B Hor', 'BHR', 'BIC', 'Biography', 'BksW', 'Bks for
Keeps', 'Bks for YP', 'BL', 'Bloom Rev']
From the string in the dataframe, I want to select 'BW', 'CCB-B', 'LJ', 'NYRB', 'NYTBR', and 'Nat R' (all of which are in the list) and put them in a new column in the same row.
My code looks like this:
s = df65['Review'].str.extractall(reviews_list).squeeze()
s = s.unstack(level=-1)
df65['Reviews'] = s
But extractall doesn't take lists as arguments in this way.
Help?
str.extractall expects a regex pattern as a parameter. You can make this regex with '|'.join(reviews_list).
But some characters need to be escaped to be used with regex, so import re and use re.escape like this:
[re.escape(item) for item in reviews_list]
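str.extractall also requires at least one capture group in the pattern, so wrap the alternation in parentheses. Your new call will be
pattern = '(' + '|'.join(re.escape(item) for item in reviews_list) + ')'
s = df65['Review'].str.extractall(pattern).squeeze()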
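The pieces above can be put together as a minimal sketch. The three-row dataframe and the short reviews_list below are made-up stand-ins for the real data. Three choices here are assumptions, not part of the original answer: the alternatives are sorted longest first so that 'ABA Jour' is tried before 'AB', the pattern is wrapped in \b word boundaries so a short code does not match inside a longer word, and the matches are collected into one list per row with groupby. The parentheses are there because str.extractall needs a capture group.

```python
import re
import pandas as pd

# Hypothetical miniature of the data described in the question.
df65 = pd.DataFrame({
    "Review": [
        "BW - Jl 8 '79 - pE2 CCB-B -vl9-Ja '66-p83 LJ - v91 - Ja 15 -66 - p426",
        "no journal codes here",
        "AB - v1 - p2 ABA Jour - v3 - p4",
    ]
})
reviews_list = ["BW", "CCB-B", "LJ", "AB", "ABA Jour"]

# Longest alternatives first, so 'ABA Jour' is tried before 'AB'.
alternatives = sorted(set(reviews_list), key=len, reverse=True)

# Escape each item, join with '|', and wrap in one capture group.
# The \b anchors are an assumption about how the codes are delimited.
pattern = r"\b(" + "|".join(re.escape(item) for item in alternatives) + r")\b"

# extractall returns one row per match, indexed by (original row, match number).
matches = df65["Review"].str.extractall(pattern)[0]

# Collect the matches of each original row into a list.
per_row = matches.groupby(level=0).agg(list)

# Assignment aligns on the index; rows with no match get NaN.
df65["Reviews"] = per_row
print(df65["Reviews"])
```

Rows without any match end up as NaN in the new column. If a plain string is preferred over a list, per_row.str.join(', ') can be assigned instead.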