Restoring original order of python list after processing?

This is a toy example that will hopefully illustrate what I am trying to achieve.

I have a list of strings that I split into two sub-lists so that I can apply different preprocessing steps to each. Let's say I have the following list of strings:

mylist = ['a1','a','a2','b','b2','b3','c','c1','c2']

For simplicity, I want to add a particular sub-string to the beginning of each element depending on whether that element contains a number (in reality, I have multiple preprocessing steps that necessitate splitting the original list):

import re
import pandas as pd
withNum = [[i,'numPresent_'+i] for i in mylist if re.compile(r'\d').search(i)]
noNum = [[i,'noNum_'+i] for i in mylist if not re.compile(r'\d').search(i)]

Now that I have my two sub-lists, how can I combine them in a data-frame in a manner that they reflect their original order? Clearly, if I use df.append it will simply stack one on top of the other...

df = pd.DataFrame().append(withNum).append(noNum)

-------------------------
    0              1
-------------------------
 a1         numPresent_a1
 a2         numPresent_a2
 b2         numPresent_b2
 b3         numPresent_b3
 c1         numPresent_c1
 c2         numPresent_c2
 a          noNum_a
 b          noNum_b
 c          noNum_c
--------------------------

How can I re-order the data-frame so that it reflects the order of the original list?

-------------------------
    0              1
-------------------------
 a1         numPresent_a1
 a          noNum_a
 a2         numPresent_a2
 b          noNum_b
 b2         numPresent_b2
 b3         numPresent_b3
 c          noNum_c
 c1         numPresent_c1
 c2         numPresent_c2
--------------------------

I cannot rely on the content of the string itself to inform its position (so sorting alphabetically is out). I can only rely on its original position in the list. I'm hoping there is some way I can create an index that I can sort by after I have merged the two sub-lists.

You could modify your list comprehension as follows:

test = [[i,'numPresent_'+i] if re.compile(r'\d').search(i) else [i,'noNum_'+i] for i in mylist]
df = pd.DataFrame(test)  # DataFrame.append was removed in pandas 2.0

Returns

    0              1
0  a1  numPresent_a1
1   a        noNum_a
2  a2  numPresent_a2
3   b        noNum_b
4  b2  numPresent_b2
5  b3  numPresent_b3
6   c        noNum_c
7  c1  numPresent_c1
8  c2  numPresent_c2

Try this:

df = df.set_index(0).loc[mylist].reset_index()

prints:

   0              1
0  a1  numPresent_a1
1   a        noNum_a
2  a2  numPresent_a2
3   b        noNum_b
4  b2  numPresent_b2
5  b3  numPresent_b3
6   c        noNum_c
7  c1  numPresent_c1
8  c2  numPresent_c2
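
For reference, here is a self-contained sketch of the same reordering. It builds the stacked frame with `pd.DataFrame(with_num + no_num)` rather than `append`, and it assumes every string in `mylist` is unique; with duplicate strings, `.loc[mylist]` would return repeated rows. The names `with_num`, `no_num`, `stacked` and `restored` are only illustrative.

```python
import re
import pandas as pd

mylist = ['a1', 'a', 'a2', 'b', 'b2', 'b3', 'c', 'c1', 'c2']
digit = re.compile(r'\d')

# Split as in the question: one sub-list per preprocessing path.
with_num = [[s, 'numPresent_' + s] for s in mylist if digit.search(s)]
no_num = [[s, 'noNum_' + s] for s in mylist if not digit.search(s)]

# Stack the sub-lists; at this point the rows are out of order.
stacked = pd.DataFrame(with_num + no_num)

# Use the original strings as the index, select them in the original
# order, then turn the index back into column 0.
# Assumption: the strings in mylist are unique.
restored = stacked.set_index(0).loc[mylist].reset_index()
```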

I would begin by changing the list to something that has the values and their order. It can be a dataframe which will automatically add indexes or it can be a list with their positions built into it.

import pandas as pd

mylist = ['a1','a','a2','b','b2','b3','c','c1','c2']
newlist=[]
counter=0
for i in mylist:
    counter+=1
    newlist.append((counter,i))
newlist
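
To take this idea through to the end, here is a minimal sketch that carries each element's position through the split and sorts on it afterwards. It uses `enumerate` instead of a manual counter (so positions start at 0), and the column name `pos` is just an illustrative choice.

```python
import re
import pandas as pd

mylist = ['a1', 'a', 'a2', 'b', 'b2', 'b3', 'c', 'c1', 'c2']
digit = re.compile(r'\d')

# Keep each element's original position next to it while splitting.
with_num = [(pos, s, 'numPresent_' + s)
            for pos, s in enumerate(mylist) if digit.search(s)]
no_num = [(pos, s, 'noNum_' + s)
          for pos, s in enumerate(mylist) if not digit.search(s)]

# Stack the two sub-lists, then sort by the stored position
# to restore the original order.
df = (
    pd.DataFrame(with_num + no_num, columns=['pos', 0, 1])
    .set_index('pos')
    .sort_index()
)
```

Because the sort key is the position rather than the string, this does not depend on the strings being unique or sortable.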

Another way would be to use an else statement instead of two if statements. The code below is complete and runs on its own.

import re
import pandas as pd

mylist = ['a1','a','a2','b','b2','b3','c','c1','c2']
nums=[]
for i in mylist:
    # A single pass keeps the original order; no counter is needed here.
    if re.compile(r'\d').search(i):
        nums.append([i,'numPresent_'+i])
    else:
        nums.append([i,'noNum_'+i])
df = pd.DataFrame(nums)
df

Rather than splitting your list, make a function that returns the thing that you want to put in your dataframe. In your example, that's something like

def process(x):
    # Pick the prefix based on whether any character in x is a digit.
    prefix = 'numPresent_' if any(map(str.isdigit, x)) else 'noNum_'
    return [x, prefix + x]

Now you can make whatever you want in the list:

pd.DataFrame([process(x) for x in mylist])

Alternatively, you can use df.apply after creating a one-column dataframe from mylist. In this case, you can even mask off parts of the column to apply different types of processing faster.
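
As a rough sketch of the masking idea, assuming a one-column frame whose column names `value` and `processed` are made up for illustration: a boolean mask selects the rows that contain a digit, and each group is processed in place, so the row order never changes.

```python
import pandas as pd

mylist = ['a1', 'a', 'a2', 'b', 'b2', 'b3', 'c', 'c1', 'c2']
df = pd.DataFrame({'value': mylist})

# Boolean mask: True where the string contains at least one digit.
has_num = df['value'].str.contains(r'\d')

# Default processing for every row, then overwrite the masked rows.
df['processed'] = 'noNum_' + df['value']
df.loc[has_num, 'processed'] = 'numPresent_' + df.loc[has_num, 'value']
```

Since nothing is split off into a separate list, there is no order to restore afterwards.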
