Python string split count

Counting how many words each line has in a text file with Python (using str.split)

I have two files: an input file, «our_input.txt» (in the same directory as the code file), which contains Oscar Wilde's Dorian Gray, and an output file. I need to open the original file, have Python count how many words each line has, and write the counts to the output file. I tried, but I got lost.

What did you try? Can you show the code that doesn't work? There's always a starting point, so keep trying!

3 Answers

You can try something like this.

First you read your input file:

with open('our_input.txt') as f:
    lines = f.readlines()

Then you count the number of words per line and write to the output file:

with open('our_output.txt', 'w') as f:
    for index, value in enumerate(lines):
        # split() with no argument splits on any run of whitespace
        number_of_words = len(value.split())
        f.write('Line number {} has {} words.\n'.format(index + 1, number_of_words))

Thanks! I appreciate it, though I think the intention was to use something a bit simpler. Is there something like that?

You should be using a list, not dicts, for this. Plus, you can drop the list entirely and just write to the file inside the loop.

You will need to iterate over each line of the input text file, which is done with a standard for loop. You can then split each line into words with split() and count the elements of the resulting list with len(). Write that count to the output file and you are done.
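A minimal sketch of that loop, assuming UTF-8 text and reusing the file names from the first answer; `report_lines` and `count_words_per_line` are hypothetical helper names. The report formatting is kept in its own generator so it can be checked without touching the disk, and the file is written line by line inside the loop, with no intermediate list.

```python
def report_lines(lines):
    """Yield one report line per input line, numbered from 1."""
    for number, line in enumerate(lines, start=1):
        # str.split() with no argument splits on runs of whitespace and
        # drops the trailing newline, so a blank line counts as 0 words
        yield f"Line number {number} has {len(line.split())} words.\n"


def count_words_per_line(in_path="our_input.txt", out_path="our_output.txt"):
    # Iterating over a file object reads one line at a time,
    # so the whole book is never held in a list
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for report in report_lines(src):
            dst.write(report)
```

Calling `count_words_per_line()` with no arguments reads `our_input.txt` and writes `our_output.txt` in the current directory.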


Yeah, well, I think that was the intention, but how do I do this using split and counting how many words are in each line?

A simple technique in any language for word counting in files is:

  1. Read file into a variable.
  2. Replace unnecessary characters such as carriage returns or line feeds with space character. Trim space characters from beginning and end of string.
  3. Replace multiple space characters with single.

We now have a string with words separated by single spaces.

  • Use the language’s split function with space as the delimiter, to produce an array. The number of words is the array length, adjusted for the lower bound of the array being zero or 1 in the language in use.
  • If the language has a count-character-of-specified-type function then use that to count the number of spaces in the string. Add 1. This is the number of words.
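Here is how the steps and both counting options above might look in Python; this is a sketch, and the helper names are made up for illustration. Python's len() already returns the element count, so no zero- or one-based adjustment is needed. Note that in practice text.split() with no argument performs steps 2 and 3 by itself.

```python
import re


def normalize(text):
    # Steps 2 and 3: treat CR, LF, tabs etc. as spaces, collapse each run
    # of whitespace into a single space, and trim both ends
    return re.sub(r"\s+", " ", text).strip()


def count_by_split(text):
    normalized = normalize(text)
    # "".split(" ") returns [""], so an empty string needs its own case
    return len(normalized.split(" ")) if normalized else 0


def count_by_spaces(text):
    normalized = normalize(text)
    # With exactly one space between words: words = spaces + 1
    return normalized.count(" ") + 1 if normalized else 0
```

Both functions guard against empty input; without the guard the space-counting version would report one word for an empty string.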

The size of the file being worked upon could make this a weighty job for the processor and performance will depend on how the language handles strings and arrays.

If you are working client-server, or the text is stored in a database, consider the high network cost of moving the string. It is better to run the count as close to where the data lives as possible. So if you are using an RDBMS, use a stored procedure: it is faster to count words in a 2 GB string and ship an int with the answer out to the client than to ship the 2 GB string and count in a web browser.

If you cannot read the entire file in one pass then you can read line-by-line and carry out the above techniques per line. However, due to string handling and loop-running overhead, performance will be faster if you can process the entire file as one string.
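A small Python sketch of the two options, assuming a text stream as input (the function names are hypothetical). Both give the same total, because newlines are whitespace and never fall inside a word; the sketch does not measure the speed difference described above.

```python
import io


def words_line_by_line(stream):
    # Reads one line at a time; memory use is bounded by the longest line
    return sum(len(line.split()) for line in stream)


def words_whole_file(stream):
    # One read() and one split(); the whole text must fit in memory
    return len(stream.read().split())


if __name__ == "__main__":
    # io.StringIO stands in for a real file returned by open(path)
    sample = "The Picture\nof Dorian Gray\n\nby Oscar Wilde\n"
    print(words_line_by_line(io.StringIO(sample)), words_whole_file(io.StringIO(sample)))
```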


.split python word count

I need to count the words in a sentence. For example, «I walk my dog.» would be 4 words, but «I walk my 3 dogs» would also be only 4 words, because numbers are not words. The code should only count alphabetic words. I understand how to count words by simply using the following:

but this doesn't account for numbers. Is there a simple way (for a beginner) to account for numbers, symbols, etc.? Thank you.

What about «I walk my Beagle-Harrier»? Is that a possible sentence of four words? (I ask because it will break some isalpha() approaches.)

What if I spelled 4 as four instead? The meaning of the sentence has not changed! 4 is a word too in your sentence.

5 Answers

totalWords = sum(1 for word in line.split() if word.isalpha()) 

You can use the split function to split the line on whitespace, then check whether each word consists only of letters with the isalpha function. Each word for which that is true contributes 1, and the sum of those 1s is the word count.

Is there any way of doing this without using .isalpha that you can think of? Somehow make a list of your string or something?

if not word.isdigit() would allow «don’t» and ‘back-to-back’ to be counted as words. Of course, ‘3rd’ and ‘3.145’ would count as words. Perhaps if not word[0].isdigit() would be better. But you would still have a problem with «‘3» in «‘3 cats are missing,’ she said.» Least problematic would be if any(c.isalpha() for c in word) .
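To make the trade-offs in that comment concrete, here is a sketch that runs each suggested test on the example sentences from this thread. The four function names are hypothetical; each one sums the booleans produced by one of the conditions discussed above.

```python
def count_isalpha(sentence):
    # Only words made entirely of letters
    return sum(word.isalpha() for word in sentence.split())


def count_not_digit(sentence):
    # Anything that is not purely digits ("3rd", "3.145" and "'3" all count)
    return sum(not word.isdigit() for word in sentence.split())


def count_first_not_digit(sentence):
    # split() never yields empty strings, so word[0] is always safe
    return sum(not word[0].isdigit() for word in sentence.split())


def count_any_alpha(sentence):
    # At least one letter somewhere in the word
    return sum(any(c.isalpha() for c in word) for word in sentence.split())
```

On "'3 cats are missing,' she said." these return 3, 6, 6 and 5 respectively: isalpha() rejects the words with attached punctuation, the digit checks accept "'3", and only the last version gets the count that a reader would expect. On "I walk my Beagle-Harrier", isalpha() gives 3 while the any-letter test gives 4.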

@StevenRumbalski Yup. I thought of all the cases for which this could fail. But it's not very clear what exactly he expects. 🙁

import re

lines = [
    'I walk by dog',
    'I walk my 3 dogs',
    'I walk my Beagle-Harrier',  # DSM's example
]
for line in lines:
    # Runs of letters and hyphens, case-insensitive
    words = re.findall('[a-z-]+', line, flags=re.I)
    print(line, '->', len(words), words)
# I walk by dog -> 4 ['I', 'walk', 'by', 'dog']
# I walk my 3 dogs -> 4 ['I', 'walk', 'my', 'dogs']
# I walk my Beagle-Harrier -> 4 ['I', 'walk', 'my', 'Beagle-Harrier']

You can use .isalpha() on strings.

len([word for word in sentence.split() if word.isalpha()]) 

If you don't want to use .isalpha:

sum(not word.isdigit() for word in line.split()) 

This yields True for each word that is not a number, and False for each word that is a number. The code takes advantage of the fact that in Python True == 1 and False == 0, so the sum is the number of non-number words.

If you are uncomfortable with relying on the int-ness of bools, you can make it explicit to the reader of your code by adding the int function (this is 100% unnecessary, but can make the code clearer if you like it that way):

sum(int(not word.isdigit()) for word in line.split()) 


Split and Count a Python String [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

stringvar = "one;two;three;four" 
stringcount = 4
string1 = "one"
string2 = "two"
string3 = "three"
string4 = "four"

Sometimes there will be more, sometimes fewer, and of course the values could be anything. I am looking to split a string at the ';' into separate variables and then have another variable that gives the number of variables. Thanks.

This question appears to be off-topic because it is about doing something genuinely wrong and evil in python.

5 Answers

Okay, now that we have that out of the way, here’s how you do that.

stringvar = "one;two;three;four"
lst = stringvar.split(";")
stringcount = len(lst)
for idx, value in enumerate(lst):
    globals()["string" + str(idx + 1)] = value
# This is the ugliest code I've ever had to write
# please never do this. Please never ever do this.

globals() returns a dictionary containing every variable in the global scope, with the variable names as string keys and their values as, well, values.

>>> foo = "bar"
>>> baz = "eggs"
>>> spam = "have fun stormin' the castle"
>>> globals()
{..., '__builtins__': <module ...>, 'foo': 'bar', '__doc__': None, '__package__': None, 'spam': "have fun stormin' the castle", '__name__': '__main__'}

You can use this dictionary to add new variables by string name (globals()['a'] = 'b' sets variable a equal to "b"), but this is generally a terrible thing to do. Think of how you could possibly USE this data! You'd have to bind the new variable name to ANOTHER variable, then use that inside globals()[NEW_VARIABLE] every time! Let's use a list instead, shall we?
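A sketch of the list-based version the answer recommends, keeping the question's names `stringvar` and `stringcount`; `strings` and `named` are illustrative names. The list holds any number of pieces, and its length is the count.

```python
stringvar = "one;two;three;four"

# One list holds every piece, however many there are
strings = stringvar.split(";")
stringcount = len(strings)

# strings[0] plays the role of string1, strings[1] of string2, and so on
for number, value in enumerate(strings, start=1):
    print(number, value)

# If you really want the names, keep them as dictionary keys
# rather than as global variables
named = {f"string{number}": value for number, value in enumerate(strings, start=1)}
```

With this, `named["string3"]` gives "three" without creating any new globals.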


Split string by count of characters

I can't figure out how to do this with string methods. In my file I have something like 1.012345e0070.123414e-004-0.1234567891.21423, which means there is no delimiter between the numbers. When I read a line from this file I get a string like the one above, which I want to split after, e.g., 12 characters. As far as I can see there is no way to do this with str.split() or any other string method, but maybe I'm overlooking something? Thanks.

8 Answers

Since you want to iterate in an unusual way, a generator is a good way to abstract that:

def chunks(s, n):
    """Produce `n`-character chunks from `s`."""
    for start in range(0, len(s), n):
        yield s[start:start+n]

nums = "1.012345e0070.123414e-004-0.1234567891.21423"
for chunk in chunks(nums, 12):
    print(chunk)
1.012345e007
0.123414e-00
4-0.12345678
91.21423

(which doesn’t look right, but those are the 12-char chunks)

>>> x = "1.012345e0070.123414e-004-0.1234567891.21423"
>>> x[2:10]
'012345e0'
line = "1.012345e0070.123414e-004-0.1234567891.21423"
firstNumber = line[:12]
restOfLine = line[12:]
print(firstNumber)
print(restOfLine)
1.012345e007
0.123414e-004-0.1234567891.21423
step = 12
for i in range(0, len(string), 12):  # `string` is the line read from the file
    chunk = string[i:step]  # one 12-character piece (the last may be shorter)
    step += 12

This way, each iteration gives you one slice of 12 characters (the last one may be shorter).

from itertools import zip_longest  # izip_longest in Python 2

def grouper(n, iterable, padvalue=None):
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)
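The grouper recipe yields tuples of single characters, not strings, and pads the last tuple with `padvalue`. A sketch of turning its output into 12-character strings, assuming Python 3 (`split_every` is a hypothetical helper name); padding with an empty string lets the short final group join cleanly.

```python
from itertools import zip_longest


def grouper(n, iterable, padvalue=None):
    # n references to the SAME iterator, so each tuple pulls the next n items
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)


def split_every(text, n):
    # Pad with "" so the last, shorter group joins without extra characters
    return ["".join(group) for group in grouper(n, text, "")]
```

For the string in the question, `split_every(nums, 12)` returns the same four pieces as the generator in the first answer.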

I stumbled on this while looking for a solution to a similar problem, but in my case I wanted to split a string into chunks of differing lengths. Eventually I solved it with a regular expression (re):

In [13]: import re
In [14]: random_val = '07eb8010e539e2621cb100e4f33a2ff9'
In [15]: dashmap = (8, 4, 4, 4, 12)
In [16]: re.findall(''.join(r'(\S{{{}}})'.format(l) for l in dashmap), random_val)
Out[16]: [('07eb8010', 'e539', 'e262', '1cb1', '00e4f33a2ff9')]
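The same split by a tuple of lengths can also be done without regular expressions, by slicing at cumulative offsets. This is a different technique from the answer's regex, sketched here for comparison (`split_by_lengths` is a hypothetical name). Unlike the regex, it does not check that the characters are non-whitespace, and a too-short string simply yields shorter or empty pieces instead of no match.

```python
from itertools import accumulate


def split_by_lengths(text, lengths):
    # Cumulative end offsets, e.g. (8, 4, 4, 4, 12) -> 8, 12, 16, 20, 32
    ends = list(accumulate(lengths))
    # Each piece starts where the previous one ended
    starts = [0] + ends[:-1]
    return [text[start:end] for start, end in zip(starts, ends)]
```

Joining the result with '-' produces the dashed ID format the next function builds.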

For those who may find it interesting: I was trying to create a pseudo-random ID following specific rules, so this code is actually part of the following function:

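import re, time, random

def random_id_from_time_hash(dashmap=(8, 4, 4, 4, 12)):
    random_val = ''
    while len(random_val) < sum(dashmap):
        # hex digits, like the sample ID above ('{:x}' format spec assumed)
        random_val += '{:x}'.format(hash(time.time() * random.randint(1, 1000)))
    # One (\S{n}) group per length in dashmap, then join the groups with dashes
    return '-'.join(re.findall(''.join(r'(\S{{{}}})'.format(l) for l in dashmap), random_val)[0])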

I always thought that since strings support addition with simple logic, maybe they should support division too: dividing by a number would split the string into pieces of that length. So maybe this is what you are looking for.

class MyString:
    def __init__(self, string):
        self.string = string

    def __truediv__(self, div):  # __div__ in Python 2
        l = []
        for i in range(0, len(self.string), div):
            l.append(self.string[i:i+div])
        return l

>>> s = 'abcbdbfbfbfb'
>>> m = MyString(s)
>>> m/3
['abc', 'bdb', 'fbf', 'bfb']
>>> m = MyString('abcd')
>>> m/3
['abc', 'd']

If you don't want to create an entirely new class, simply use this function, which re-wraps the core of the above code:

>>> def string_divide(string, div):
...     l = []
...     for i in range(0, len(string), div):
...         l.append(string[i:i+div])
...     return l
...
>>> string_divide('abcdefghijklmnopqrstuvwxyz', 15)
['abcdefghijklmno', 'pqrstuvwxyz']

