Restoring original order of Python list after processing?
Note: This is a toy example that will hopefully illustrate what I am trying to achieve.
I have a list of strings that I split into two sub-lists so that I can apply different preprocessing steps to each. Let's say I have the following list of strings:
mylist = ['a1','a','a2','b','b2','b3','c','c1','c2']
For simplicity, I want to add a particular sub-string to the beginning of each element depending on whether that element contains a number (in reality, I have multiple preprocessing steps that necessitate splitting the original list):
import re
withNum = [[i,'numPresent_'+i] for i in mylist if re.compile(r'\d').search(i)]
noNum = [[i,'noNum_'+i] for i in mylist if not re.compile(r'\d').search(i)]
Now that I have my two sub-lists, how can I combine them in a data-frame so that they reflect their original order? Clearly, if I use df.append it will simply stack one on top of the other:
df = pd.DataFrame().append(withNum).append(noNum)
Returns:
-------------------------
0    1
-------------------------
a1   numPresent_a1
a2   numPresent_a2
b2   numPresent_b2
b3   numPresent_b3
c1   numPresent_c1
c2   numPresent_c2
a    noNum_a
b    noNum_b
c    noNum_c
-------------------------
How can I re-order the data-frame so that it reflects the order of the original list?
Intended Outcome:
-------------------------
0    1
-------------------------
a1   numPresent_a1
a    noNum_a
a2   numPresent_a2
b    noNum_b
b2   numPresent_b2
b3   numPresent_b3
c    noNum_c
c1   numPresent_c1
c2   numPresent_c2
-------------------------
I cannot rely on the content of the string itself to inform its position (so sorting alphabetically is out). I can only rely on its original position in the list. I'm hoping there is some way I can create an index that I can sort by after I have merged the two sub-lists.
You could modify your list comprehension as follows:
test = [[i,'numPresent_'+i] if re.compile(r'\d').search(i) else [i,'noNum_'+i] for i in mylist]
df = pd.DataFrame(test)  # DataFrame.append was removed in pandas 2.0; the constructor builds the same frame
Returns:
    0              1
0  a1  numPresent_a1
1   a        noNum_a
2  a2  numPresent_a2
3   b        noNum_b
4  b2  numPresent_b2
5  b3  numPresent_b3
6   c        noNum_c
7  c1  numPresent_c1
8  c2  numPresent_c2
Try this:
df = df.set_index(0).loc[mylist].reset_index()
prints:
    0              1
0  a1  numPresent_a1
1   a        noNum_a
2  a2  numPresent_a2
3   b        noNum_b
4  b2  numPresent_b2
5  b3  numPresent_b3
6   c        noNum_c
7  c1  numPresent_c1
8  c2  numPresent_c2
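For anyone on pandas 2.0 or later, where DataFrame.append is no longer available, here is a minimal end-to-end sketch of the same idea. It rebuilds the stacked frame with pd.concat and then looks the rows up by the original strings. The names has_digit, stacked and restored are made up for this illustration, and the lookup assumes every string in mylist is unique.

```python
import re
import pandas as pd

mylist = ['a1', 'a', 'a2', 'b', 'b2', 'b3', 'c', 'c1', 'c2']
has_digit = re.compile(r'\d')  # hypothetical helper name, compiled once

withNum = [[i, 'numPresent_' + i] for i in mylist if has_digit.search(i)]
noNum = [[i, 'noNum_' + i] for i in mylist if not has_digit.search(i)]

# pd.concat stands in for the chained DataFrame.append calls (removed in pandas 2.0).
stacked = pd.concat([pd.DataFrame(withNum), pd.DataFrame(noNum)], ignore_index=True)

# Use column 0 as the index, then select rows in the order given by mylist.
# Assumption: the strings are unique; a duplicated string would match several rows.
restored = stacked.set_index(0).loc[mylist].reset_index()
print(restored)
```

If the strings can repeat, this lookup is no longer a safe way to recover the order, and sorting on a saved position is the more robust choice.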
I would begin by changing the list to something that holds both the values and their order. It can be a dataframe, which will automatically add indexes, or it can be a list with the positions built into it.
import pandas as pd
mylist = ['a1','a','a2','b','b2','b3','c','c1','c2']
newlist = []
counter = 0
for i in mylist:
    # store each string together with its 1-based position
    counter += 1
    newlist.append((counter, i))
newlist
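The snippet above stops once the positions are attached. As a rough sketch of how they might be used afterwards, the example below carries the position through the split and sorts on it after recombining. The column names pos, original and processed are invented for this illustration, and enumerate(mylist, start=1) is used as a shorter equivalent of the counter loop.

```python
import re
import pandas as pd

mylist = ['a1', 'a', 'a2', 'b', 'b2', 'b3', 'c', 'c1', 'c2']
has_digit = re.compile(r'\d')  # hypothetical helper name

# Pair each string with its 1-based position, as in the counter loop.
newlist = list(enumerate(mylist, start=1))

# Split into the two sub-lists, keeping the position with every row.
with_num = [(pos, s, 'numPresent_' + s) for pos, s in newlist if has_digit.search(s)]
no_num = [(pos, s, 'noNum_' + s) for pos, s in newlist if not has_digit.search(s)]

# Recombine (stacked in the wrong order), then sort on the saved position.
combined = pd.DataFrame(with_num + no_num, columns=['pos', 'original', 'processed'])
df = combined.sort_values('pos').reset_index(drop=True)
print(df)
```

Because the sort key is the saved position rather than the string, this does not depend on the strings being unique or alphabetically ordered.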
Another way would be to use an else statement instead of two if statements. The code below is complete and works.
import re
import pandas as pd
mylist = ['a1','a','a2','b','b2','b3','c','c1','c2']
nums = []
for i in mylist:
    # one pass over the list, so rows stay in their original order
    if re.compile(r'\d').search(i):
        nums.append([i, 'numPresent_' + i])
    else:
        nums.append([i, 'noNum_' + i])
df = pd.DataFrame(nums)
df
Rather than splitting your list, make a function that returns the thing that you want to put in your dataframe. In your example, that's something like
def process(x):
    # choose the prefix based on whether x contains any digit
    prefix = 'numPresent_' if any(map(str.isdigit, x)) else 'noNum_'
    return [x, prefix + x]
Now you can make whatever you want in the list:
pd.DataFrame([process(x) for x in mylist])
Alternatively, you can use df.apply after creating a one-column dataframe from mylist. In this case, you can even mask off parts of the column to apply different types of processing faster.
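A minimal sketch of the masking idea, assuming a one-column frame built from mylist. The column names original and processed and the variable has_num are invented for this illustration. Each processing step is applied only to the rows selected by the mask, and since no rows are moved, the original order is kept.

```python
import pandas as pd

mylist = ['a1', 'a', 'a2', 'b', 'b2', 'b3', 'c', 'c1', 'c2']
df = pd.DataFrame({'original': mylist})

# Boolean mask: True where the string contains at least one digit.
has_num = df['original'].str.contains(r'\d')

# Start with an empty result column, then fill each masked part separately.
df['processed'] = ''
df.loc[has_num, 'processed'] = 'numPresent_' + df.loc[has_num, 'original']
df.loc[~has_num, 'processed'] = 'noNum_' + df.loc[~has_num, 'original']
print(df)
```

The two assignments stand in for the real preprocessing steps; anything that returns one value per selected row could be placed on the right-hand side instead.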