Pandas: assigning a column by partial string match raises a "size to array dimension" error

I have a dataframe as such:

  Postcode         Country
0  PR2 6AS  United Kingdom
1  PR2 6AS  United Kingdom
2  CF5 3EG  United Kingdom
3  DG2 9FH  United Kingdom

I create a new column to be assigned based on a partial string match:

mytestdf['In_Preston'] = "FALSE"

mytestdf

  Postcode         Country In_Preston
0  PR2 6AS  United Kingdom      FALSE
1  PR2 6AS  United Kingdom      FALSE
2  CF5 3EG  United Kingdom      FALSE
3  DG2 9FH  United Kingdom      FALSE

I wish to assign the column "In_Preston" by a partial string match on "Postcode". I try the following:

mytestdf.loc[(mytestdf[mytestdf['Postcode'].str.contains("PR2")]), 'In_Preston'] = "TRUE"

But this returns the error "cannot copy sequence with size 3 to array axis with dimension 2"

I look at my code again and believe the issue is that I am selecting a slice of the dataframe from a slice of the dataframe. So I change it to:

mytestdf.loc[(mytestdf['Postcode'].str.contains("PR2")]), 'In_Preston'] = "TRUE"

but my interpreter tells me this is incorrect syntax, though I do not see why.

What is the error in my code or my approach?

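You need to remove the inner filter, so that .loc receives the boolean mask itself rather than a filtered DataFrame. (Your second attempt is a syntax error because it has an unmatched ] before the closing parenthesis.)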

mytestdf.loc[mytestdf['Postcode'].str.contains("PR2"), 'In_Preston'] = "TRUE"
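
To make the difference concrete, here is a minimal self-contained sketch that rebuilds the example frame from the question (the construction via pd.DataFrame and the variable name mask are my own additions, not part of the original post). It shows that the mask is a boolean Series with one entry per row, while the inner filter is a whole DataFrame, which is not a valid row indexer for .loc:

```python
import pandas as pd

# Rebuild the frame from the question (constructor call is an assumption).
mytestdf = pd.DataFrame({
    "Postcode": ["PR2 6AS", "PR2 6AS", "CF5 3EG", "DG2 9FH"],
    "Country": ["United Kingdom"] * 4,
})
mytestdf["In_Preston"] = "FALSE"

# The mask is a boolean Series: one True/False per row.
mask = mytestdf["Postcode"].str.contains("PR2")
print(mask.tolist())         # [True, True, False, False]

# Indexing the frame with the mask gives a DataFrame (2 rows, 3 columns),
# which is what the first attempt passed to .loc by mistake.
print(mytestdf[mask].shape)  # (2, 3)

# Pass the mask itself as the row indexer.
mytestdf.loc[mask, "In_Preston"] = "TRUE"
print(mytestdf["In_Preston"].tolist())  # ['TRUE', 'TRUE', 'FALSE', 'FALSE']
```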

Another solution is to use numpy.where:

import numpy as np

mytestdf['In_Preston'] = np.where(mytestdf['Postcode'].str.contains("PR2"), 'TRUE', 'FALSE')
print(mytestdf)
  Postcode         Country In_Preston
0  PR2 6AS  United Kingdom       TRUE
1  PR2 6AS  United Kingdom       TRUE
2  CF5 3EG  United Kingdom      FALSE
3  DG2 9FH  United Kingdom      FALSE

But if you want to assign boolean True and False values instead of strings:

mytestdf['In_Preston'] = mytestdf['Postcode'].str.contains("PR2")
print(mytestdf)
  Postcode         Country  In_Preston
0  PR2 6AS  United Kingdom        True
1  PR2 6AS  United Kingdom        True
2  CF5 3EG  United Kingdom       False
3  DG2 9FH  United Kingdom       False
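
One caveat that may matter on real data: if Postcode can contain missing values, str.contains returns a missing result for those rows rather than False. A small sketch, assuming a hypothetical Series with one missing entry, shows how the na parameter of str.contains can be used to treat missing values as non-matches:

```python
import pandas as pd

# Hypothetical data with one missing postcode (not from the question).
postcodes = pd.Series(["PR2 6AS", None, "CF5 3EG"])

# na=False makes missing values count as "no match",
# so the result is a clean boolean mask.
mask = postcodes.str.contains("PR2", na=False)
print(mask.tolist())  # [True, False, False]
```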

EDIT, following a comment by Zero:

If you want to check only the start of Postcode:

mytestdf.Postcode.str.startswith('PR2')

Or add the regex anchor ^ to match only at the start of the string:

mytestdf['Postcode'].str.contains("^PR2")
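
The following sketch compares the three checks side by side. The second value, "XPR2 6AS", is a made-up string (not a real postcode) chosen only because it contains "PR2" without starting with it:

```python
import pandas as pd

# Hypothetical values; "XPR2 6AS" contains "PR2" but does not start with it.
postcodes = pd.Series(["PR2 6AS", "XPR2 6AS", "CF5 3EG"])

anywhere = postcodes.str.contains("PR2")     # match anywhere in the string
at_start = postcodes.str.startswith("PR2")   # literal prefix check
anchored = postcodes.str.contains("^PR2")    # regex anchored at the start

print(anywhere.tolist())  # [True, True, False]
print(at_start.tolist())  # [True, False, False]
print(anchored.tolist())  # [True, False, False]
```

For a plain literal prefix, startswith avoids regex interpretation entirely, which is the safer choice if the search text could ever contain regex metacharacters.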
