Python pandas dataframe: in an array column, remove the first item from the array if it contains a specific string

Question

I have a dataframe with a column like the one below, which contains arrays of different sizes:

column
["a_id","b","c","d"]
["d_ID","e","f"]
["h","i","j","k","l"]
["id_m","n","o","p"]
["ID_q","r","s"]

I want to remove the first item from the array in every row if that item contains "ID" or "id". The expected output looks like this:

column
["b","c","d"]
["e","f"]
["h","i","j","k","l"]
["n","o","p"]
["r","s"]

How do we check for this in a dataframe column whose values are arrays?

Answer 1

Edit: It seems I misread your question. This solution is meant to remove any element that has 'id' in it, not just the first.

Option 1
I believe the most straightforward solution is to use apply:

df

               col
0  [a_id, b, c, d]
1     [d_ID, e, f]
2  [h, i, j, k, l]
3  [id_m, n, o, p]
4     [ID_q, r, s]


df.col = df.col.apply(lambda y: (y[1:] if 'id' in y[0].lower() else y))

df
               col
0        [b, c, d]
1           [e, f]
2  [h, i, j, k, l]
3        [n, o, p]
4           [r, s]
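
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The construction of df is an assumption, since the question only shows the printed column; the column name col follows the answer above.

```python
import pandas as pd

# Assumed reconstruction of the sample data; the question only shows printed output.
df = pd.DataFrame({
    "col": [
        ["a_id", "b", "c", "d"],
        ["d_ID", "e", "f"],
        ["h", "i", "j", "k", "l"],
        ["id_m", "n", "o", "p"],
        ["ID_q", "r", "s"],
    ]
})

# Drop the first element only when it contains "id" in any letter case;
# otherwise return the list unchanged.
df["col"] = df["col"].apply(lambda y: y[1:] if "id" in y[0].lower() else y)

print(df["col"].tolist())
```

Lowercasing the first element before the test is what makes "ID", "id" and mixed-case variants all match.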

Option 2
Alternatively, use a list comprehension:

df.col = [(y[1:] if 'id' in y[0].lower() else y)  for y in df.col]  

df

               col
0        [b, c, d]
1           [e, f]
2  [h, i, j, k, l]
3        [n, o, p]
4           [r, s]
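
Both options index y[0] directly, so an empty list would raise IndexError and a non-string first element would raise AttributeError. If the real data may contain such rows, a guarded helper is one way to handle them. The name drop_leading_id is hypothetical, and leaving those rows unchanged is an assumption about the desired behavior.

```python
def drop_leading_id(items, needle="id"):
    """Return items without its first element if that element contains
    needle, compared case-insensitively."""
    # Assumed behavior: empty lists and non-string first elements are left as-is.
    if not items or not isinstance(items[0], str):
        return items
    return items[1:] if needle.lower() in items[0].lower() else items


# Plain lists stand in for the dataframe column here.
rows = [["a_id", "b"], [], [1, "id"], ["h", "i"], ["ID_q", "r", "s"]]
cleaned = [drop_leading_id(y) for y in rows]
print(cleaned)
```

The same function can be passed to apply on the column, or used in the list comprehension in place of the inline conditional.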

Timings

df = pd.concat([df] * 100000)

%%timeit
m = df['col'].str[0].str.contains('ID', case=False)
df['col'].mask(m, df['col'].str[1:])

1 loop, best of 3: 917 ms per loop

%timeit [(y[1:] if 'id' in y[0].lower() else y)  for y in df.col]  
1 loop, best of 3: 272 ms per loop

%timeit df.col.apply(lambda y: (y[1:] if 'id' in y[0].lower() else y))
1 loop, best of 3: 309 ms per loop

Answer 2

Use str[0] to select the first value of each list, then check it for ID with contains:

m = df['column'].str[0].str.contains('ID', case=False)
print (m)
0     True
1     True
2    False
3     True
4     True
Name: column, dtype: bool

Then remove it using mask with str[1:]:

df['column'] = df['column'].mask(m, df['column'].str[1:])
print (df)
            column
0        [b, c, d]
1           [e, f]
2  [h, i, j, k, l]
3        [n, o, p]
4           [r, s]
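
A self-contained sketch of the same mask approach follows. The dataframe construction and the extra empty-list row are assumptions added for illustration. Passing na=False and regex=False to contains is also my addition: the first keeps the boolean mask free of missing values when a list is empty, and the second treats the pattern as a literal substring.

```python
import pandas as pd

# Assumed sample data: the question's rows plus one empty list (illustrative).
df = pd.DataFrame({
    "column": [
        ["a_id", "b", "c", "d"],
        ["d_ID", "e", "f"],
        ["h", "i", "j", "k", "l"],
        ["id_m", "n", "o", "p"],
        ["ID_q", "r", "s"],
        [],
    ]
})

# First element of each list; an empty list yields a missing value.
first = df["column"].str[0]

# Case-insensitive literal match; missing values count as "no match".
m = first.str.contains("ID", case=False, na=False, regex=False)

# Where the mask is True, substitute the list without its first element.
df["column"] = df["column"].mask(m, df["column"].str[1:])

print(df["column"].tolist())
```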

2 answers

Answer 1: 4, 2017-11-08 06:38:01
Answer 2: 3, accepted, 2017-11-08 06:41:15
