Check if a string in a list is between two other strings in a list?
import pandas as pd
nameBank = ["John Doe", "Jane Doe", "Patrick Star", "Spongebob Squarepants"]
phoneList = []
nameList = []
list1 = ["1234567890", "John doe", "Not a NAME/USELESS FILLERINFO", "2345678901", "jane doe", "Not a NAME/USELESS FILLERINFO", "Not a NAME/USELESS FILLERINFO", "3456789012", "4567890123", "5678901234", "patrick star", "6789012345"]
df = pd.DataFrame({'Phone Number': phoneList, 'Name': nameList})
df.to_csv('results.csv', index=False, encoding='utf-8')
print(df)
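What I want to do is retrieve every phone number from list1 and put it into phoneList. From there, I want to check whether a name from nameBank appears after the current phone number and before the next phone number in the list. If there is a name after the phone number, I want to append it to nameList; if there is no name after the phone number, I want to append "No Name Found" to nameList. That way the two lists line up like columns in an Excel sheet.
For example, the phone number 1234567890 has the name John Doe corresponding to it across the two lists. The second phone number has the name Jane Doe attached, so when you build a table with pandas from the two lists, they will correspond. The third phone number, 3456789012, has no name between itself and the next phone number in the list, so I want the value appended to nameList to be "No Name Found".
Basically, the output should be a table with a Phone Number column and a matching Name column.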
So, you want to parse list1 into a Series:
import re
import pandas as pd

list1 = ["1234567890", "John doe", "Not a NAME/USELESS FILLERINFO", "2345678901", "jane doe", "Not a NAME/USELESS FILLERINFO", "Not a NAME/USELESS FILLERINFO", "3456789012", "4567890123", "5678901234", "patrick star", "6789012345"]

# An entry counts as a phone number if it starts with ten digits.
num = re.compile(r'\d{10}')
output = {}
i = 0
while i < len(list1):
    # Skip anything that is not a phone number.
    if not num.match(list1[i]):
        i += 1
        continue
    # Take the next entry as the name unless it is missing or is itself a phone number.
    output[list1[i]] = list1[i+1] if i+1 < len(list1) and not num.match(list1[i+1]) else 'not found'
    i += 1
series = pd.Series(output)
print(series)
Output:
1234567890        John doe
2345678901        jane doe
3456789012       not found
4567890123       not found
5678901234    patrick star
6789012345       not found
dtype: object
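One caveat with the pattern above: `re.match` only anchors at the start of the string, so `\d{10}` also accepts longer strings that merely begin with ten digits. A minimal sketch of a stricter check using `re.fullmatch` follows; the helper name `is_phone` and the sample strings are made up for illustration and are not part of the original answer.

```python
import re

PHONE = re.compile(r"\d{10}")

def is_phone(text):
    """Return True only if the whole string is exactly ten digits."""
    return PHONE.fullmatch(text) is not None

# Hypothetical sample strings, chosen to show where the two checks differ.
samples = ["1234567890", "1234567890 ext. 5", "12345678901", "John doe"]

# Prefix match: accepts the first three samples.
print([s for s in samples if PHONE.match(s)])
# Full match: accepts only the first sample.
print([s for s in samples if is_phone(s)])
```

For the data in the question both checks give the same result, so this only matters if the list can contain longer digit strings.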
import pandas as pd

nameBank = ["John Doe", "Jane Doe", "Patrick Star", "Spongebob Squarepants"]
list1 = ["1234567890", "John doe", "Not a NAME/USELESS FILLERINFO", "2345678901", "jane doe", "Not a NAME/USELESS FILLERINFO", "Not a NAME/USELESS FILLERINFO", "3456789012", "4567890123", "5678901234", "patrick star", "6789012345"]
data = []
for index, elem in enumerate(list1):
    # Only phone numbers start a new row.
    if elem.isnumeric():
        if (len(list1) - 1) > index:
            # Compare the next entry with nameBank, ignoring case.
            if list1[index+1].casefold() in map(str.casefold, nameBank):
                data.append([elem, list1[index+1].title()])
            else:
                data.append([elem, 'No Name Found'])
        else:
            # The phone number is the last entry, so nothing follows it.
            data.append([elem, 'No Name Found'])
df = pd.DataFrame(data, columns=['Phone Number', 'Name'])
# df.to_csv('results.csv', index=False, encoding='utf-8')
print(df)
Output:
   Phone Number           Name
0    1234567890       John Doe
1    2345678901       Jane Doe
2    3456789012  No Name Found
3    4567890123  No Name Found
4    5678901234   Patrick Star
5    6789012345  No Name Found
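Note that `.title()` rebuilds the capitalisation instead of returning the spelling stored in `nameBank`, which can differ for names such as "McDonald". A possible variant, sketched under the assumption that the `nameBank` spelling is what should end up in the table, builds a case-insensitive lookup once. The names `canonical` and `lookup_name` are illustrative, and "Ronald McDonald" is added here purely to show the difference.

```python
# "Ronald McDonald" is a made-up extra entry, not part of the question's data.
nameBank = ["John Doe", "Jane Doe", "Patrick Star", "Spongebob Squarepants", "Ronald McDonald"]

# Map each case-folded name to the spelling stored in nameBank.
canonical = {name.casefold(): name for name in nameBank}

def lookup_name(text, default="No Name Found"):
    """Return the nameBank spelling for text, ignoring case, or the default."""
    return canonical.get(text.casefold(), default)

print(lookup_name("john doe"))                        # John Doe
print(lookup_name("ronald mcdonald"))                 # Ronald McDonald
print("ronald mcdonald".title())                      # Ronald Mcdonald
print(lookup_name("Not a NAME/USELESS FILLERINFO"))   # No Name Found
```

The dictionary also avoids rebuilding `map(str.casefold, nameBank)` for every phone number in the loop.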
import re
import pandas as pd

list1 = ["1234567890", "John doe", "Not a NAME/USELESS FILLERINFO", "2345678901", "jane doe", "Not a NAME/USELESS FILLERINFO", "Not a NAME/USELESS FILLERINFO", "3456789012", "4567890123", "5678901234", "patrick star", "6789012345"]
nameBank = ["John Doe", "Jane Doe", "Patrick Star", "Spongebob Squarepants"]

def mapList(list1):
    output = []
    for index, item in enumerate(list1, start=0):
        if re.match(r"^\d{10}", item):
            # Use any one condition.
            # Note: the nameBank check below is case-sensitive, so "John doe" would not match "John Doe".
            # if index < len(list1) - 1 and list1[index + 1] in nameBank:
            if index < len(list1) - 1 and not re.match(r"^\d{10}", list1[index + 1]):
                output.append([list1[index], list1[index + 1]])
            else:
                output.append([list1[index], 'No Name Found'])
    return output

df = pd.DataFrame(mapList(list1), columns=['Phone Number', 'Name'])
print(df)
Output:
   Phone Number           Name
0    1234567890       John doe
1    2345678901       jane doe
2    3456789012  No Name Found
3    4567890123  No Name Found
4    5678901234   patrick star
5    6789012345  No Name Found
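All of the answers above only inspect the entry immediately after each phone number. The question asks for a name anywhere between one phone number and the next, so a sketch that scans the whole gap might look like the following. It assumes phone numbers are exactly ten digits and that names should match `nameBank` case-insensitively. `pair_phones_with_names` is a hypothetical helper name, and the shortened `list1` is made up so that one name does not directly follow its number.

```python
import re

def pair_phones_with_names(items, name_bank, default="No Name Found"):
    """Pair each ten-digit phone number with the first known name that
    appears before the next phone number, or with the default."""
    canonical = {name.casefold(): name for name in name_bank}
    rows = []
    for item in items:
        if re.fullmatch(r"\d{10}", item):
            # A new phone number opens a new row with no name yet.
            rows.append([item, default])
        elif rows and rows[-1][1] == default and item.casefold() in canonical:
            # First known name seen since the latest phone number.
            rows[-1][1] = canonical[item.casefold()]
    return rows

nameBank = ["John Doe", "Jane Doe", "Patrick Star", "Spongebob Squarepants"]
# Made-up sample: filler sits between the first number and its name.
list1 = ["1234567890", "Not a NAME/USELESS FILLERINFO", "John doe",
         "2345678901", "3456789012", "patrick star"]

for row in pair_phones_with_names(list1, nameBank):
    print(row)
# ['1234567890', 'John Doe']
# ['2345678901', 'No Name Found']
# ['3456789012', 'Patrick Star']
```

The returned rows have the same shape as in the answers above, so they can be passed to `pd.DataFrame(rows, columns=['Phone Number', 'Name'])` in the same way.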