Check whether two words at the same index position in two different lists are both in a string - python

Question

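I have 2 lists that come from a csv file.

column1 = ZIP codes
column2 = city names

I have a string that contains some random text, and possibly zip codes and city names.

For each i, I want to check whether column1[i] and column2[i] are both in the string.

I used this solution, but it only checks whether words from one list are in the string. It also does not return the position in the original list, so I cannot match against column2 afterwards.

I ended up using: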

for i in range(39000):
    if all(map(lambda w: w in text, (column1[i], column2[i]))):
        print(column1[i], column2[i])

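But with two lists of 39,000 words each, this takes around 0.30 seconds, and I have not found any way to make it go faster. That single-list solution is twice as fast (0.13 to 0.17 seconds), but it only searches for one word...

Any ideas? Thanks

Edit: a reproducible example.

CSV file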

import pandas as pd

column_names = ["code_commune_insee","nom_de_la_commune", "code_postal","ligne_5","libelle_d_acheminement","coordonnees_gps"]

df = pd.read_csv("laposte_hexasmal.csv", names=column_names, delimiter=';')

column1  = df.code_postal.to_list()
column2 = df.nom_de_la_commune.to_list()

column1_short_version_example = ['48283', '43288', '84389', '403294', '84384', '88439']
column2_short_version_example = ['PARIS', 'Paris', 'London', 'Amsterdam', 'Dublin', 'Manchester'] 
text = 'Hello, yes your order is indeed confirmed for the 14th in our hotel neer PARIS 12, the zip code is 43288 or 75012, if you want to book a night in London two weeks after, we still have room avaible, the postal code for the hotel address is 45 road street 84389'

for i in range(len(column1)):
    if all(map(lambda w: w in text, (column1[i], column2[i]))):
        print(column1[i], column2[i])

The desired result for the short-list version is:

43288 Paris
84389 London

The desired result for the version using the lists from the given CSV file is:

75012 PARIS 12
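
One detail in the short example is worth spelling out: 'Paris' occurs in the text only as 'PARIS', so a case-sensitive `in` test reports just `84389 London`. The sketch below assumes case-insensitive matching is what is wanted. It uses a shortened version of the sample text, and `find_pairs` is a hypothetical helper name, not something from the question.

```python
def find_pairs(zip_codes, cities, text):
    """Return (zip, city) pairs whose two members both occur in text, ignoring case."""
    folded = text.casefold()  # fold the text once, not once per row
    return [
        (zipcode, city)
        for zipcode, city in zip(zip_codes, cities)
        if zipcode in folded and city.casefold() in folded
    ]


column1 = ['48283', '43288', '84389', '403294', '84384', '88439']
column2 = ['PARIS', 'Paris', 'London', 'Amsterdam', 'Dublin', 'Manchester']
# Shortened sample text (assumption: same relevant tokens as the original)
text = ('our hotel near PARIS 12, the zip code is 43288 or 75012, '
        'a night in London, the address is 45 road street 84389')

print(find_pairs(column1, column2, text))
# [('43288', 'Paris'), ('84389', 'London')]
```

Folding the text once outside the loop keeps the per-row cost close to that of the case-sensitive check.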

Answer 1

You can iterate directly over the items instead of the indices, and use the built-in zip function to iterate over both lists at once:

def op(): # this is your solution
    collect = [] # I am collecting into a list instead of print for benchmark
    for i in range(len(column1)):
        if all(map(lambda w: w in text, (column1[i], column2[i]))):
            collect.append((column1[i], column2[i]))
    return collect

def zip_based(): # this is what I am proposing
    collect = [] # I am collecting into a list instead of print for benchmark
    for zipcode, city in zip(column1, column2):
        if zipcode in text and city in text:
            collect.append((zipcode, city))
    return collect

Output (YMMV, but I see a roughly 3x speedup):

%timeit op()
# 9.93 µs ± 618 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit zip_based()
# 3.01 µs ± 489 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
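
If the zip codes always appear in the text as standalone runs of digits (an assumption, not something the question states), a further option is to collect the digit runs once and test each zip code with a set lookup, scanning the text for the city only when the zip code has already matched. This is a sketch and has not been benchmarked. `digit_prefilter` is a hypothetical name, and unlike `zipcode in text` it will not match a zip code embedded in a longer number.

```python
import re


def digit_prefilter(zip_codes, cities, text):
    """Keep (zip, city) pairs where zip is a whole digit run in text and city is a substring."""
    digit_runs = set(re.findall(r"\d+", text))  # one pass over the text
    return [
        (zipcode, city)
        for zipcode, city in zip(zip_codes, cities)
        # cheap set lookup first, substring scan only for surviving rows
        if zipcode in digit_runs and city in text
    ]


column1 = ['48283', '43288', '84389', '403294', '84384', '88439']
column2 = ['PARIS', 'Paris', 'London', 'Amsterdam', 'Dublin', 'Manchester']
# Shortened sample text (assumption: same relevant tokens as the original)
text = ('our hotel near PARIS 12, the zip code is 43288 or 75012, '
        'a night in London, the address is 45 road street 84389')

print(digit_prefilter(column1, column2, text))
```

Most rows fail the zip code test, so the city substring scan runs for only a handful of rows.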

Answer 2

You should try iterating over the rows of the dataframe

# Suppose df is your dataframe with columns 'ZIP' and 'City'
# Suppose text is your string "that contains some random text, and possibly zip codes and cities names"
for index, row in df.iterrows():
    # Both values must be in the text; note that any(a, b) raises TypeError,
    # because any() takes a single iterable
    if row['ZIP'] in text and row['City'] in text:
        print(f"Row {index} : {row['ZIP']} and {row['City']}")
