简体   繁体   中英

Select two pandas columns that have the same digit in the column name

I am cleaning up some data and the raw dataset has entries as ['Field1.1', 'Field2.1', 'Field1.2', 'Field2.2'] . For the dataset, either 'Field1' XOR 'Field2' will have a non-empty string. I'd like to create a single field 'Field.1' that will extract the non-empty string from 'Field1.1' XOR 'Field2.1' and place it in 'Field.1' . Similarly, I'd like to do this for 'Field1.2' and 'Field2.2' as 'Field.2' .

I am not sure how to select matching fields, ie 'X.1' with 'Y.1' and 'X.2' with 'Y.2', in order to do this.


My logic is that once I can select the correct pairs I can simply use a concat statement to add them and thereby extract the non-empty string for later use. If this logic is incorrect or there is a better way that does not rely on extracting the non-empty string in this way to concatenate them then please let me know.


If the logic is sound, please explain how might do this extraction given the indexing problem.


To be clearer, I want to go from:

df = pd.DataFrame({'field1.1': ['string1',''], 'field2.1':['','string2'],
                   'field1.2': ['string3',''], 'field2.2':['','string4']})

df

Out[1]: 
  field1.1 field2.1 field1.2 field2.2
0  string1           string2         
1           string3           string4

to:

df2 = pd.DataFrame({'field.1': ['string1','string3'], 'field.2':['string2','string4']})

df2

Out[2]: 
   field.1  field.2
0  string1  string2
1  string3  string4

You can use wide_to_long , bfill , and then pivot back:

(pd.wide_to_long(df.where(df.ne('')).reset_index(), 
                 stubnames=['Field1','Field2'],
                 i='index', 
                 j='group', 
                 sep='.')
 .bfill(1)
 .reset_index()
 .pivot(values='Field1',index='index',columns='group')
)

Sample data:

df = pd.DataFrame([
    ['a','','b',''],
    ['c','','','d'],
    ['','e','','f'],
    ['','g','h','']],
    columns=['Field1.1', 'Field2.1', 'Field1.2', 'Field2.2'])

group  1  2
index      
0      a  b
1      c  d
2      e  f
3      g  h

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM