How to match values from one column to a second column with multiple values
I have a dataframe:
Name Dept
abc Genteic|Biology|Chemical Engineering
def Physics|Chemical Engineering|Astrophysics
xyz Chemical Engineering|Astrophysics
klm Biology|Astrophysics
nop Chemical Engineering|Astrophysics
The first column contains the name, and the second column lists the departments each person is associated with. I want to know the number of people working in each department, e.g. how many people are associated with the Biology department.
The code I have so far is:
import pandas as pd
import json
import requests
from requests.exceptions import ConnectionError
from requests.exceptions import ReadTimeout
import csv

def author_name(term):
    response = get_url(term)
    return response

def get_url(term):
    print(term)
    # `resp` is never defined here: the request that fetches the
    # author record for `term` is missing from this snippet.
    response = resp.content
    data = json.loads(response)
    print(data)
    try:
        if data['author-retrieval-response']['subject-areas']['subject-area'] != 'null':
            myvar = data['author-retrieval-response']['subject-areas']['subject-area']['@abbrev']
            # Departments are pipe-separated, e.g. "Biology|Astrophysics"
            myvar = myvar.split('|')
        else:
            data['author-retrieval-response']['subject-areas']['subject-area'] = 'null'
            auth_empty = data['author-retrieval-response']['subject-areas']['subject-area']['@abbrev']
            print(auth_empty)
    except:
        pass

if __name__ == '__main__':
    out = open('out.csv', 'w', encoding='utf-8', newline="\n")
    csvwriter = csv.writer(out)
    header = ['Scopus ID', 'Title', 'Abstract', 'Affiliation', 'Authors',
              'Citation', 'Pub_Date']
    # usecols expects a list of column names
    dataframe = pd.read_csv('author.csv', usecols=['auth_name'])
    for i, row in dataframe.iterrows():
        term = str(row.iloc[0])
        # Pass the current author name, not the whole dataframe
        response = author_name(term)
        csvwriter.writerow(response)
Any help will be greatly appreciated. Thanks!!
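Since the data is already in a pandas DataFrame, the per-department count can also be sketched directly in pandas: split the Dept column on "|", explode the resulting lists into one row per person/department pair, and count the rows per department. This is a minimal sketch, assuming the columns are named Name and Dept as in the table above; the DataFrame is built inline from that table instead of being read from a file.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Name": ["abc", "def", "xyz", "klm", "nop"],
    "Dept": [
        "Genteic|Biology|Chemical Engineering",
        "Physics|Chemical Engineering|Astrophysics",
        "Chemical Engineering|Astrophysics",
        "Biology|Astrophysics",
        "Chemical Engineering|Astrophysics",
    ],
})

# Split each cell into a list of departments, give every department its own row,
# then count how many rows (people) fall under each department.
dept_counts = df["Dept"].str.split("|").explode().value_counts()

print(dept_counts)
print("People in Biology:", dept_counts["Biology"])
```

The result is a Series indexed by department name, so a single department can be looked up with dept_counts["Biology"], and departments never need to be listed in advance.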
I wrote you a very simple Python script that does what I think you want it to do. I ignored the fact that the input file is a CSV file and that libraries exist for parsing it.
The following is just a quick-and-dirty solution to point you in the right direction; I would recommend improving this snippet:
input.csv
abc Genteic|Biology|Chemical Engineering
def Physics|Chemical Engineering|Astrophysics
xyz Chemical Engineering|Astrophysics
klm Biology|Astrophysics
nop Chemical Engineering|Astrophysics
main.py
counters = {"Biology": 0, "Genteic": 0, "Chemical Engineering": 0, "Physics": 0, "Astrophysics": 0}

with open("input.csv", "r") as csv_file:
    for line in csv_file.read().splitlines():
        # Split on the first space only: "Chemical Engineering" itself contains a space
        arr = line.split(" ", 1)
        name = arr[0]
        professions = arr[1]
        # Each person may belong to several departments separated by "|"
        for subj in professions.split("|"):
            counters[subj] += 1

print("There are %s teachers working in Biology" % counters["Biology"])
print("There are %s teachers working in Genteic" % counters["Genteic"])
print("There are %s teachers working in Chemical Engineering" % counters["Chemical Engineering"])
print("There are %s teachers working in Physics" % counters["Physics"])
print("There are %s teachers working in Astrophysics" % counters["Astrophysics"])
Calling python3 main.py results in:
There are 2 teachers working in Biology
There are 1 teachers working in Genteic
There are 4 teachers working in Chemical Engineering
There are 1 teachers working in Physics
There are 4 teachers working in Astrophysics
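One possible improvement: the hard-coded counters dictionary only works for departments known in advance, and any other department name raises a KeyError. A variant using the standard library's collections.Counter builds the tally from whatever departments appear in the input. This is a sketch, assuming the same line format as input.csv (name, one space, then pipe-separated departments); count_departments is a hypothetical helper name, and it takes any iterable of lines so it can be run on a list as well as on an open file.

```python
from collections import Counter


def count_departments(lines):
    """Hypothetical helper: count people per department from lines like 'name Dept1|Dept2'."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1: department names such as "Chemical Engineering" contain spaces
        _name, depts = line.split(" ", 1)
        # set() so a person listed twice under the same department is counted once
        counts.update(set(depts.split("|")))
    return counts


sample = [
    "abc Genteic|Biology|Chemical Engineering",
    "def Physics|Chemical Engineering|Astrophysics",
    "xyz Chemical Engineering|Astrophysics",
    "klm Biology|Astrophysics",
    "nop Chemical Engineering|Astrophysics",
]

for dept, n in sorted(count_departments(sample).items()):
    print(f"There are {n} people working in {dept}")
```

To use it on the real file, pass the open file object instead of the sample list, e.g. inside "with open('input.csv') as f:" call count_departments(f).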