
How to count the distinct elements in a data frame column whose strings contain multiple elements separated by semicolons

I'm importing some data from Excel and trying to build a dashboard with Streamlit. Right now, I'm trying to count the number of distinct elements in one column of a data frame, 'Tags'. However, some rows contain several distinct values combined into a single semicolon-separated string rather than stored as separate strings.

With the first for loop, the data came out like this: "Python; C++" "Java; Python" "R; C; Java"

I want it to look like this instead: [Python, C++, Java, R, C]. With the second for loop, I'm attempting to get that result, but the program outputs nothing. What am I doing wrong?

cnt = 0
visited = []
# First loop: collect each distinct raw 'Tags' string and count them
for i in range(0, len(df1['Tags'])):
    if df1['Tags'][i] not in visited:
        visited.append(df1['Tags'][i])
        cnt += 1

# Second loop: split each stored string on ';'
u = []
for j in range(0, len(visited)):
    new = visited[j].split(';')
    for z in range(0, len(new)):
        if new not in u:
            u.append(new)
st.write(new)
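The second loop appears to have three problems: `if new not in u` checks whether the whole list `new` is already in `u` (and then appends that whole list) instead of checking the single element `new[z]`; splitting on ';' leaves a leading space on every tag after the first (" C++"); and `st.write(new)` displays only the last split list rather than the collected `u`. A minimal corrected sketch, written as a plain function so it runs without Streamlit (the name `unique_tags_loop` is hypothetical, and skipping non-string cells assumes empty Excel cells arrive as NaN), might look like this:

```python
def unique_tags_loop(tags):
    """Collect each distinct tag once, keeping first-seen order."""
    u = []
    for cell in tags:
        if not isinstance(cell, str):  # skip missing cells (e.g. NaN from empty Excel cells)
            continue
        for piece in cell.split(";"):
            tag = piece.strip()        # "Python; C++" splits into "Python" and " C++"
            if tag and tag not in u:   # test the single tag, not the whole split list
                u.append(tag)
    return u


sample = ["Python; C++", "Java; Python", "R; C; Java"]
u = unique_tags_loop(sample)
print(u)       # ['Python', 'C++', 'Java', 'R', 'C']
print(len(u))  # 5
```

In the dashboard, `u = unique_tags_loop(df1['Tags'])` could replace both loops, followed by `st.write(u)` and `st.write(len(u))`. The first loop's deduplication of raw strings is then unnecessary, because the inner membership check already removes repeated tags.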

Is this the result you want?

list(set([j.strip() for i in df1["Tags"] for j in i.split(';')]))
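Wrapping that expression in `len(...)` gives the distinct count itself. Note that it calls `i.split` on every cell, so it would raise an AttributeError if an empty Excel cell was read in as NaN. As an alternative sketch, the same result can be computed with pandas' vectorized string methods, assuming pandas 0.25 or later (for `Series.explode`); the `df1` below is made-up sample data standing in for the real spreadsheet:

```python
import pandas as pd

# Hypothetical sample data standing in for the 'Tags' column read from Excel
df1 = pd.DataFrame({"Tags": ["Python; C++", "Java; Python", "R; C; Java", None]})

tags = (
    df1["Tags"]
    .str.split(";")   # each cell becomes a list of pieces; missing cells stay NaN
    .explode()        # one row per piece
    .str.strip()      # drop the space that follows each ';'
)
tags = tags[tags != ""]  # ignore empty pieces, e.g. from a trailing ';'

unique_tags = tags.dropna().unique().tolist()  # first-seen order
n_unique = tags.nunique()                      # nunique() skips NaN by default

print(unique_tags)  # ['Python', 'C++', 'Java', 'R', 'C']
print(n_unique)     # 5
```

`n_unique` is the distinct count the question asks for, and `unique_tags` can be passed to `st.write` in the dashboard. Unlike `set`, `unique()` keeps the order in which tags first appear.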


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM